Signal Encoding and Digital Signal Processing in Continuous Time by Kurchuk, Mariya
Signal Encoding and Digital Signal
Processing in Continuous Time
Mariya Kurchuk
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy

This work investigates signal encoding in, and architectures of, digital signal processing
systems that function in continuous time (CT). Unlike conventional digital signal processors (DSPs),
which rely on a clock to dictate the sampling times of an analog-to-digital converter (ADC) and to
provide the tap delay timing, CT DSPs function entirely in continuous time, without a sampling or
a synchronizing clock. The samples of a CT DSP system are generated and processed only when
some measure of the input signal crosses a predetermined threshold. The effective sampling rate
and the dynamic power dissipation of a CT digital system automatically adapt to the activity of the
input signal. The properties of signals sampled in continuous time are investigated in this thesis.
A technique for reducing the effective sampling rate of a CT system is presented, in which the
digital signal encoding is varied by adjusting the resolution according to a property of the input. A
variable-resolution system leads to a decrease in the number of samples generated, a reduction in
the power dissipation and a reduction in the effective chip area of a CT DSP, all without sacrificing
in-band performance. The properties of several asynchronous signal-driven sampling techniques
are analyzed and compared.
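As a rough illustration of the threshold-crossing sampling described above, the following Python sketch emits a sample only when the input moves into a different quantization interval. It is not the implementation used in this work: the function names `level_crossing_samples` and `sine`, the uniform 3-bit quantizer over [-1, 1], and the dense time grid that stands in for continuous time are all assumptions made for illustration.

```python
import math


def level_crossing_samples(signal, duration, n_bits=3, dt=1e-4):
    """Return (time, level) pairs, one for each quantization-threshold crossing.

    Assumptions (illustrative only): the input range is [-1, 1], the quantizer
    is uniform with 2**n_bits levels, and a dense grid of step `dt`
    approximates continuous time.
    """
    n_levels = 2 ** n_bits
    step = 2.0 / n_levels

    def level(value):
        # Index of the quantization interval containing `value`, clamped to range.
        index = int(math.floor((value + 1.0) / step))
        return min(n_levels - 1, max(0, index))

    samples = []
    previous = level(signal(0.0))
    for i in range(1, int(round(duration / dt)) + 1):
        t = i * dt
        current = level(signal(t))
        if current != previous:
            # A sample is generated only when the level changes.
            samples.append((t, current))
            previous = current
    return samples


def sine(amplitude, freq_hz):
    """Hypothetical test input; the small phase offset avoids starting on a threshold."""
    return lambda t: amplitude * math.sin(2.0 * math.pi * freq_hz * t + 0.1)


if __name__ == "__main__":
    cases = [
        ("silent input", lambda t: 0.0),
        ("small 1 Hz sine", sine(0.3, 1.0)),
        ("large 1 Hz sine", sine(0.9, 1.0)),
        ("large 2 Hz sine", sine(0.9, 2.0)),
    ]
    for label, sig in cases:
        count = len(level_crossing_samples(sig, duration=1.0))
        print(f"{label}: {count} samples in 1 s")
```

In this sketch a silent input produces no samples, and the sample count grows with the amplitude and frequency of the input, which mirrors the activity-dependent effective sampling rate described in the text.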
The architecture and signal encoding of CT DSPs for signals in the lower gigahertz frequency
range are investigated, with consideration of speed and accuracy limitations in the context of
sub-micron CMOS technologies. A per-edge digital signal encoding technique is developed, which
bypasses the timing problems of processing high-speed digital signals; the properties of per-edge
encoded signals are discussed. The design considerations of a low-resolution per-edge-encoded
gigahertz-range CT DSP are discussed and an implementation for a possible application is
detailed. A prototype chip with a compact processor core area of 0.073 mm² has been fabricated
in ST 65 nm CMOS technology. The implemented CT digital processor achieves an SNDR of over
20 dB with 3 bits of resolution and a maximum usable −3 dB bandwidth of 0.8 GHz to 3.2 GHz.
The processor can be configured as a one-tap to six-tap CT FIR filter and has an active power
dissipation that varies from 0.27 mW to 9.5 mW, depending on the amplitude and frequency of the
input signal.
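The FIR filter configuration mentioned above can be illustrated with the standard frequency response of a tapped delay line with uniform tap delay τ, H(f) = Σ_k c_k exp(−j2πfkτ). The Python sketch below evaluates this expression; the function name `ct_fir_response`, the two-tap coefficients, and the 250 ps tap delay are illustrative assumptions and are not values taken from this work.

```python
import cmath
import math


def ct_fir_response(coeffs, tap_delay_s, freq_hz):
    """Complex frequency response of an FIR filter with uniform tap delay.

    Evaluates H(f) = sum_k c_k * exp(-j*2*pi*f*k*tau). No sampling clock
    appears in the expression; only the tap delay tau sets the frequency scale.
    """
    return sum(
        c * cmath.exp(-2j * math.pi * freq_hz * k * tap_delay_s)
        for k, c in enumerate(coeffs)
    )


if __name__ == "__main__":
    # Assumed example: two equal taps separated by 250 ps, which places
    # a notch at 1 / (2 * tau) = 2 GHz.
    coeffs = [1.0, 1.0]
    tap_delay_s = 250e-12
    for f_ghz in (0.0, 1.0, 2.0, 3.0):
        gain = abs(ct_fir_response(coeffs, tap_delay_s, f_ghz * 1e9))
        print(f"{f_ghz:.1f} GHz: |H| = {gain:.3f}")
```

Under these assumptions the response repeats every 1/τ = 4 GHz, so changing the tap delay moves the notch without any change to the coefficients.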
Contents
List of Figures iv
List of Tables xx
1 Introduction 1
1.1 Comparison of CT ADC/DSP/DAC and DT ADC/DSP/DAC . . . . . . . . . . . . 4
1.2 Timing granularity of CT digital processing . . . . . . . . . . . . . . . . . . . . . 8
1.3 Properties of signals quantized in continuous-time . . . . . . . . . . . . . . . . . . 11
1.3.1 Variations in in-band error with changes in input frequency and amplitude . 14
1.4 Comparison of signals sampled in continuous time, finely quantized time and
discrete time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2 Variable-Resolution Quantization 32
2.1 Potential variable-resolution quantizer schemes . . . . . . . . . . . . . . . . . . . 33
2.2 Proposed variable-resolution quantizer . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.1 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.3 Spectral properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.4 Setting the slope threshold . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Variable-resolution ADC in the context of a CT DSP system . . . . . . . . . . . . 41
2.3.1 Variable-resolution CT DSP . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.3.2 Power reduction of a variable-resolution CT DSP . . . . . . . . . . . . . . 44
2.3.3 Token rate estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.1 Properties of the in-band SNDR . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.2 Single-tone input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4.3 Two-tone test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.4.4 Voice signal example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.4.5 CT DSP example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3 Comparison of Signal-Driven Sampling Techniques 56
3.1 Signal-driven sampling schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1.1 Magnitude of error sampling criterion: Differences in CT quantization
techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.1.2 Sampling criterion based on the integral of absolute error . . . . . . . . . . 61
3.1.3 Sampling rate reduction techniques . . . . . . . . . . . . . . . . . . . . . 71
3.2 Comparison of sampling schemes using ZOH and higher-order reconstruction . . . 79
4 Digital Signal Processing of Wideband Gigahertz Signals 85
4.1 Problems with current processing approaches and alternative solutions . . . . . . . 88
4.2 Per-edge signal encoding for gigahertz signal processing . . . . . . . . . . . . . . 94
4.2.1 Non-idealities of per-edge encoding: The generation of half-harmonics . . 98
4.2.2 Effects of delay mismatch on the distortion . . . . . . . . . . . . . . . . . 115
4.2.3 Effects of delay jitter on the noise performance of the system . . . . . . . . 119
4.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5 Gigahertz-range CT DSP implementation 121
5.1 Gigahertz CT digital FIR filter architecture . . . . . . . . . . . . . . . . . . . . . . 129
5.2 Continuous-time delay-cell design . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.2.1 Delay-cell topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.2.2 Energy-efficient CT digital delay cell . . . . . . . . . . . . . . . . . . . . 142
5.2.3 Timing jitter of a delay cell . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.3 Charge-pump design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.3.1 Pulse generator design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.3.2 Bidirectional voltage-controlled current-source design . . . . . . . . . . . 190
5.3.3 Simulation results of the bi-directional charge pump . . . . . . . . . . . . 194
5.4 Variable adder-capacitor design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
5.5 DC-control-block design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
5.6 CT gigahertz digital processor implementation . . . . . . . . . . . . . . . . . . . . 204
5.6.1 Speed and resolution limitations of a gigahertz CT digital processing system 204
5.6.2 Simulated CT digital processor outputs . . . . . . . . . . . . . . . . . . . 208
5.7 Filter tuning techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.7.1 Delay-cell tuning scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.7.2 Filter-coefficient tuning scheme . . . . . . . . . . . . . . . . . . . . . . . 216
6 Measurement Results 219
6.1 Measurement setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
6.2 System characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.2.1 Transfer characteristic of the CT ADC . . . . . . . . . . . . . . . . . . . . 224
6.2.2 Feedthrough . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6.2.3 Effect of feedthrough on the measured frequency response of a one-tap CT
processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
6.2.4 High-frequency roll-off of the CT processor and the output buffer . . . . . 233
6.2.5 Deducing the gain of the input driver from feedthrough measurements . . . 235
6.3 Parasitic-coupling-induced increase in half-harmonics at the 0th filter tap . . . . . . 237
6.4 Delay cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
6.5 Frequency responses of the gigahertz digital FIR filter . . . . . . . . . . . . . . . . 243
6.6 Spectra of processed signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
6.7 Signal-dependent power dissipation . . . . . . . . . . . . . . . . . . . . . . . . . 256
6.8 Performance summary and a comparison to other work . . . . . . . . . . . . . . . 260
7 Conclusions and Suggestions for Future Work 263
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
7.2 Suggestions for future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
7.2.1 First-order DAC reconstruction for low- to moderate-speed CT applications 264
7.2.2 Improvement of gigahertz-range CT DSP . . . . . . . . . . . . . . . . . . 267
List of Figures
1.1 Clockless level-crossing sampling. . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Comparison of a continuous-time and discrete-time ADC: (a) block diagram, (b)
input signal, quantization thresholds (dashed), and reconstructed quantized input,
and (c) example digital representation. . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 CT Delta-encoded ADC block diagram and example signals. . . . . . . . . . . . . 6
1.4 Comparison of a CT and a DT DSP: (a) block diagram, (b) frequency response, (c)
input signal spectrum, (d) reconstructed DSP output spectrum. . . . . . . . . . . . 7
1.5 Granularity time between consecutive quantization-level crossings. . . . . . . . . . 8
1.6 CT tap delay comprised of granular delay cells. . . . . . . . . . . . . . . . . . . . 9
1.7 (a) Quantizer, and (b) equivalent quantizer comprised of a CT ADC and an ideal
DAC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8 Example waveforms of a three-bit CT ADC. Time-domain waveforms, normalized
to the input period, of (a) the input sinusoid, (b) the quantization error, (c) the
quantizer output, and their corresponding spectra, normalized to the input frequency,
in (d), (e), and (f), respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.9 (a) Quantization error for a full-scale input sinusoid quantized with three-bit
resolution, and (b) spectrum of quantization error for a sinusoidal input quantized with
six-bit resolution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.10 (a) Spectrum of a quantized sinusoid, (b) frequency representation of an impulse
train corresponding to ideal time sampling, (c) spectrum of a sinusoid quantized
and sampled in DT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.11 In-band SNDR (solid) and total SNDR (dashed) versus frequency for several
resolution settings and a full-scale sinusoidal input. . . . . . . . . . . . . . . . . . 16
1.12 Quantization error for a sinusoidal input for several input amplitudes within a range
of a single quantization step. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.13 Variation in in-band SNDR for a small variation in amplitude for a six-bit quantizer
and several input frequencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.14 Quantization error spectrum of an eight-bit quantizer for (a) a sinusoidal input with
an input frequency of fIN, and (b) a bandlimited Gaussian signal with a bandwidth
fBW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.15 Power-per-frequency function S(f). . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.16 Aliased quantization error of an eight-bit FQT ADC for a full-scale sinusoidal input
of fixed frequency (and fixed fSAW,max) versus sampling frequency, normalized to
the maximum sawtooth frequency for several values of bandwidth. . . . . . . . . . 29
1.17 Aliased quantization error of an FQT ADC for a full-scale sinusoidal input versus
sampling frequency, normalized to the input frequency for several resolution values,
with fBW/fIN = 200. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.18 (a) In-band (fBW/fIN = 50) and (b) total aliased error for sampling frequencies
above and below fSAW,max. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1 Generalized block diagram of the proposed variable-resolution CT ADC. . . . . . 33
2.2 Transfer characteristics of 2- and 3-bit variable-resolution (a) mid-tread, and (b)
mid-rise quantizers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 (a) Variable-resolution transfer characteristics of a skip-one-step quantizer, (b)
example output, and (c) quantization error. . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Proposed variable-resolution quantization, achieved by skipping two steps. (a) VR
transfer characteristic, (b) example output, and (c) resulting symmetric quantization
error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Example signals of a variable-resolution system with three resolution settings:
input signal, magnitude of the input slope, quantization step, normalized to the
minimum step, quantization error, normalized to the minimum ∆, and the reconstructed
quantized output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6 VR quantization error for a sinusoidal input, partitioned into segments
corresponding to high and lower resolutions. . . . . . . . . . . . . . . . . . . . . . . 39
2.7 Spectra of the quantization error in dB, referred to the signal (not shown), for
a 1 kHz full-scale sinusoid quantized with (a) a fixed 6-bit quantizer, and (b) a
variable-resolution quantizer, with a maximum resolution of 6 bits. . . . . . . . . . 40
2.8 Variable-resolution system block diagram using a fixed-resolution (FR) quantizer. . 42
2.9 (a) Slope detector block diagram with (b) example waveforms. . . . . . . . . . . . 44
2.10 Comparison of the quantization error, tokens, and delay cells for a fixed-resolution
(left) and a variable-resolution (right) system. . . . . . . . . . . . . . . . . . . . . 46
2.11 SNDR for a small range of amplitudes of a 400 Hz sinusoid for a maximum 8-bit
variable-resolution, an 8-bit fixed-resolution quantizer and a 7-bit fixed-resolution
quantizer. SNDR is calculated in a 3.6 kHz voice bandwidth, with quantizer slope
threshold selected to keep the sawtooth frequency a decade away from the band of
interest. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.12 SNDR in a 3.6 kHz bandwidth vs. amplitude vs. frequency for (a) a fixed-
resolution 8-bit ADC, and (b) variable-resolution ADC of maximum 8-bit resolution. 52
2.13 Number of tokens produced per second for (a) a fixed-resolution and (b) a
variable-resolution quantizer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.14 Spectrum of a quantized two-tone signal for (a-b) a fixed 6-bit quantizer, and
(c-d) variable-bit quantizer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.15 Example waveforms of a VR quantizer: (a) speech input, (b) magnitude of the
input slope, (c) quantization step, normalized to the minimum step, and (d)
reconstructed quantized output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.16 Speech signals, low-pass filtered by an FR CT DSP and a VR CT DSP. . . . . . . . 54
3.1 Generalized block diagram of a CT ADC. Sampling criterion realized by using an
error conditioner, f (), and threshold δ that can be adjusted according to properties
of the input x(t), a CT quantizer, a digital signal encoder and a reconstruction block. 57
3.2 Input signal, sampler output and quantization error of 3-bit (a) CT ADC or CT
Quantizer, (b) level-crossing or send-on-delta sampler and (c) hysteresis quantizer. 59
3.3 (a) Block diagram of an integral criterion sampler, (b) example input and sampled
output, (c) the error waveform and (d) integral of the absolute error for δ = 0.005
and an arbitrary input frequency of 1 Hz. . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Spectrum of a sinusoid sampled using the integral of error criterion with
δ = 0.02 fIN and NS = 10. Subharmonics exist, but are well below the level of the
harmonics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Simulated in-band and total SNDR and calculated total SNDR and in-band SFDR
for an integral criterion sampler with δ = 0.01 vs. frequency for no input
quantization and an arbitrary bandwidth of 1 Hz. . . . . . . . . . . . . . . . . . . . . 70
3.6 In-band and total SNDR for an integral criterion sampler with δ = 10^−5 vs.
frequency for no input quantization and finite quantization resolution ranging from
11 to 9 bits and an arbitrary bandwidth of 1 Hz. . . . . . . . . . . . . . . . . . . . . 71
3.7 (a) Block diagram of a CT Quantizer with linear prediction, (b) example input, and
sampled output and (c) the error waveform for N = 5 and NS = 14, normalized to ∆. . . 73
3.8 In-band (solid) and total (dashed) SNDR for a quantizer with linear prediction with
N=10, 8 and 6, NS N using simulation results. . . . . . . . . . . . . . . . . . . 77
3.9 Effective sampling rate and in-band SNDR of an 8-bit quantizer, an integral of
error criterion ADC, and a maximum 8-bit variable-resolution quantizer with 3
resolution settings and a quantizer, all using ZOH reconstruction. The results of an
ADC with linear prediction are also shown but using FOH reconstruction. . . . . . 80
3.10 In-band SNDR of an 8-bit quantizer, an integral of error criterion ADC, a
maximum 8-bit variable-resolution quantizer with 3 resolution settings, and a
quantizer with linear prediction. The outputs of all ADCs are reconstructed with
FOH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1 Illustration of the adaptability of a processor’s response in different signal and
environment scenarios in the context of a receiver: (a) strong signal, (b) signal and
single strong blocker, (c) signal corrupted by multiple undesirable components. . . 86
4.2 Level-crossing sampling of a 3-GHz signal, with binary and ∆-encoded signal
representation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3 Per-level representation of a quantized signal. . . . . . . . . . . . . . . . . . . . . 90
4.4 Propagation of a 55 ps pulse through a chain of FO2 inverters. The output signals
of even-numbered inverters are shown under the corresponding number. . . . . . . 92
4.5 (a) Example signal of per-edge representation and, (b) per-edge encoder. . . . . . . 95
4.6 Block diagram of a gigahertz CT per-edge-encoded DSP system. . . . . . . . . . . 97
4.7 Example per-level and per-edge signals for a sinusoidal input. . . . . . . . . . . . 99
4.8 Effect of delay mismatch on per-edge signals and an ideally-reconstructed per-
level signal for τ = 0.15TIN. The initial value of per-edge signals is of the same
polarity for (a), and of the opposite polarity for (b). The delayed input is shown to
align the level-crossing times with the transitions of the ideal signals (dashed). . . . 99
4.9 Reconstructed signal (a–b), quantization error (c–d), and the output spectra (e–f) of
a per-edge encoded quantized sinusoid with τ = 0.1TIN (distorted) and with τ = 0
(ideal) for initial values of the per-edge signals (a, c, e) of the same polarity and
(b, d, f) of opposite polarities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.10 (a) Input sinusoid x, reconstructed output x̂q, and distorted-sine signal xD, (b)
quantization error êq, (c) low-pass filtered quantization error, êq,lowpass and the
error, eD, of the distorted sine, xD. . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.11 Half-harmonic distortion power versus percent delay discrepancy using simulation
(dashed) and calculation (solid) results. . . . . . . . . . . . . . . . . . . . . . . . 106
4.12 Output power at the input frequency versus percent delay discrepancy using
simulation (black, dashed), eq. 4.10 (blue), and eq. 4.11 (red). . . . . . . . . . . . . 108
4.13 Half-harmonic distortion ratio HD1/2 versus percent delay discrepancy obtained
using simulation (dashed) and calculation (solid). . . . . . . . . . . . . . . . . . . . 109
4.14 Output power of a notch filter (a) at the fundamental frequency and (b) at the half-
harmonic component. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.15 Output power of lowpass, bandpass and bandstop filters (a) at the fundamental
frequency and (b) at the half-harmonic frequency. . . . . . . . . . . . . . . . . . . 114
4.16 Power of the fundamental component and several distortion components at the
output of a three-bit, single-tap filter with the standard deviation of delay mismatch
of (a) 5 ps (b) 10 ps, and (c) 20 ps for a range of input frequencies. . . . . . . . . . 117
4.17 Power of the fundamental component and several distortion components at the
output of a three-bit, two-tap notch filter with the standard deviation of delay
mismatch of (a) 5 ps (b) 10 ps, and (c) 20 ps for a range of input frequencies. . . . . 118
4.18 Power of the fundamental component and several distortion components at the
output of a three-bit, seven-tap bandpass filter with the standard deviation of delay
mismatch of (a) 5 ps (b) 10 ps, and (c) 20 ps for a range of input frequencies. . . . 119
4.19 Signal power and in-band error power at the output of a three-bit, 7-tap bandpass
filter for a standard deviation of delay jitter of (a) 0.5 ps (b) 2 ps, and (c) 8 ps. . . 120
5.1 (a) Frequency response of a CT bandpass filter realized with the coefficients of a DT
highpass filter and (b) frequency response of the DT highpass filter. . . . . . . . . 121
5.2 Block diagram of a per-edge-encoded CT filter. . . . . . . . . . . . . . . . . . . . 125
5.3 (a) Current-resistor-based (not used) and (b) charge-based analog adders. . . . . . 126
5.4 Block diagram of a three-bit gigahertz-range per-edge encoded digital processor,
composed of a joint FIR filter-DAC block. . . . . . . . . . . . . . . . . . . . . . . 130
5.5 (a) Ideal mid-rise quantizer transfer characteristic and (b) a mid-rise characteristic
with an LSB/2 DC offset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.6 Example signals of a CT DSP in Fig. 5.4 at the mth level and the kth tap. . . . . . . 132
5.7 Possible digital delay-cell schemes based on (a) CML gate, (b) inverter with
variable capacitive load, (c) current-starved inverter, (d) delay cell with fighting
transistor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.8 Thyristor-like-based digital delay cell for a rising edge input (reset on the falling
edge, reset circuit not shown) and associated signals. . . . . . . . . . . . . . . . . 141
5.9 The schematic diagram of the energy-efficient delay cell with positive feedback. . . 143
5.10 Example signals of the delay cell. . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.11 (a) Half-circuit of the current-starved inverter and the bias-generating circuit and
(b) a model of the delay mechanism. . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.12 Transient signals of the output voltage of the current-starved inverter with positive
feedback for a range of bias current values, IN . . . . . . . . . . . . . . . . . . . . . 152
5.13 Six-bit bias-generating circuit for nMOS and pMOS control-current sources. . . . . 152
5.14 Delay range versus control word value for several process corners and temperatures. 153
5.15 Percent error in delays for rising- and falling-edge inputs without calibration. . . . 154
5.16 Energy consumption per delay operation versus delay. . . . . . . . . . . . . . . . . 154
5.17 The average delay and the 1-σ boundaries of delay variation due to random local
mismatch for (a) rising-edge and (b) falling-edge inputs. . . . . . . . . . . . . . . 156
5.18 Dominant delay jitter mechanism composed of a noise current source and a dis-
charged capacitor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.19 Projected paths to the threshold voltage from several values of VERR; standard
deviation in path duration is negligible compared to the path duration. . . . . . . . . 162
5.20 Delay mechanism model with a finite output impedance and example signals. . . . 164
5.21 (a) Schematic diagram and (b) small-signal noise model for the delay cell’s half
circuit for a rising-edge input. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.22 A comparison of calculated and simulated values of the delay and the delay jitter
standard deviation for several bias-current values. . . . . . . . . . . . . . . . . . . 169
5.23 Simulated delay jitter vs. cell delay. . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.24 Block diagram of a bidirectional charge pump of the filter in Fig. 5.4. . . . . . . . 171
5.25 Output power of the fundamental component of a 3-bit quantized signal
reconstructed as a single filter tap by using the charge pump as a DAC for several
values of the current pulse duration. . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.26 Schematic diagram of the XOR-based pulse generator. . . . . . . . . . . . . . . . 174
5.27 Example signals of the XOR-based pulse generator. . . . . . . . . . . . . . . . . . 176
5.28 Schematic diagrams of a NOR-based and a NAND-based one-shot circuit and
example signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
5.29 Schematic diagram of the proposed one-shot circuit and example signals. . . . . . 182
5.30 Schematic diagram of a pulse generator based on a single-edge-sensitive one-shot
and a redirecting mux. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.31 Schematic diagram of a pulse generator with two pulse-width settings and
example signals illustrating the narrow and wide output pulse options. . . . . . . . . . 189
5.32 Binary-weighted negative current source. . . . . . . . . . . . . . . . . . . . . . . 191
5.33 Bi-directional current source and bias generating circuits for (b) the first chip and
(c) the second chip. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
5.34 (a) Output current and (b) change in output voltage of the bi-directional charge
pump of chip 1 for several values of reference bias current. . . . . . . . . . . . . . 194
5.35 (a,c) Output current and (b,d) change in output voltage of the bi-directional charge
pump of chip 2 for several values of reference bias current. (a,b) correspond to
long pulses (SLOW logic-level high) and (c,d) correspond to short pulses (SLOW
logic-level low). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.36 Variable adder capacitor implementation. . . . . . . . . . . . . . . . . . . . . . . 199
5.37 (a) Output current of a mismatched charge pump and (b) the drifting output voltage
of the charge pump for a 50% positive and negative current mismatch. . . . . . . . 200
5.38 DC-control-block implementation with a differential sensing input. . . . . . . . . . 200
5.39 (a) DC-control-block implementation with (b) a turn-on timing control circuit. . . . 202
5.40 Output voltage of a charge pump with 50% current mismatch. Output voltage drift
corrected by the DC control circuit. . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.41 Simulated filter output voltage for single-tone inputs processed through (a) a 7-tap
bandpass filter and (b) a 3-tap notch filter (with the middle coefficient set to equal
zero). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
5.42 Delay-cell tuning scheme block diagram. . . . . . . . . . . . . . . . . . . . . . . 211
5.43 Delay-cell duty-cycle tuning-scheme block diagram. . . . . . . . . . . . . . . . . 215
5.44 Coefficient-mismatch tuning-scheme block diagram for the 1st tap (k = 1). . . . . . 218
6.1 Chip micrograph. (a) Overall view, (b) detail of the CT digital processor. . . . . . . 220
6.2 PC board and FPGA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
6.3 Measurement setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
6.4 (a) Transfer characteristic of the system configured as a single-tap ADC-DAC and
(b) harmonic distortion ratio for a 1-GHz input signal. . . . . . . . . . . . . . . . . 224
6.5 Transfer characteristic of the system configured as an ADC-DAC for several input
frequencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
6.6 (a) Measured feedthrough signal with the processor disabled for an input power of
−41 dBm. (b) Group delay of the feedthrough transfer function. . . . . . . . . . . 227
6.7 (a) Measured frequency responses of a 1-tap processor for positive and negative
coefficient signs, and the feedthrough signal. (b) Frequency response with the
feedthrough signal subtracted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.8 Normalized frequency response of a single-tap filter configured at different taps
for positive (red) and negative (blue) coefficient signs. . . . . . . . . . . . . . . . . 231
6.9 Normalized frequency response of a single-tap filter configured at different taps
with feedthrough signals subtracted. . . . . . . . . . . . . . . . . . . . . . . . . . 232
6.10 Model of the feedthrough path and the single-tap processor path for signal
reconstruction at the kth tap. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6.11 Frequency responses for a single-tap processor with the signal reconstructed at
the 0th, 3rd and 5th taps, positive (red) and negative (blue) coefficients from (a)
measurement results (b) simulation results based on the feedthrough model. . . . . 233
6.12 Measured frequency response, with feedthrough subtracted, for a single-tap filter
for increasing input powers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
6.13 Measured feedthrough power for increasing input frequency. The CT processor is
disabled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
6.14 Deduced input driver gain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
6.15 Frequency response of the system for a single-tap processor configuration for (a) a
positive coefficient, (b) a negative coefficient, and (c) with feedthrough subtracted.
Results shown for measured responses (solid) and responses derived from the
deduced input driver response and the deduced processor and output buffer response
(dashed). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
6.16 Measured output power of components at fIN and ½ fIN for signal reconstruction at
the 0th tap, with coefficients at all other taps disabled but the delay cells at other
taps enabled. The half-harmonic power is shown for the cases of the 6th-tap delay
cells enabled and disabled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
6.17 A simplified diagram of the processor layout and example signals to illustrate the
generation of the strong half-harmonic distortion. . . . . . . . . . . . . . . . . . . 241
6.18 Tap-delay values obtained through simulation (dashed) and estimated from
measurements using a ring-oscillator-based tuning scheme (solid) for (a) the full tuning
range and (b) the intended delay range. . . . . . . . . . . . . . . . . . . . . . . . . 242
6.19 Simulated tap delays (dashed) and calculated average tap delays based on a
measured notch frequency (solid) versus reference bias current for a control-word value
of 32. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.20 Measured frequency response of a notch filter and a gain-adjusted response of an
ideal notch filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
6.21 Measured frequency response of several notch filters (see text). . . . . . . . . . . . 245
6.22 Measured frequency response of a bandpass filter, and ideal gain-adjusted response. 247
6.23 Measured frequency response of a lowpass filter, and ideal gain-adjusted response. 247
6.24 Measured frequency response of an amplitude equalizer, and ideal gain-adjusted
response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
6.25 Measured frequency response of a bandstop filter, and ideal gain-adjusted response. 248
6.26 Measured frequency response of a bandstop filter, and ideal gain-adjusted response. 248
6.27 Measured frequency response of a lowpass filter, and ideal gain-adjusted response
(see text). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
6.28 Measured output spectrum for a 1 GHz full-scale input processed through a single-
tap filter. The frequency response of the filter is shown as a dashed line. . . . . . . 250
6.29 Measured output spectrum for a two-tone input processed through a single-tap
filter with each tone -6 dB relative to full-scale power. The frequency response of
the filter is shown as a dashed line. . . . . . . . . . . . . . . . . . . . . . . . . . 251
6.30 Measured output spectra of a notch filter for (a) a single-tone full-scale input at 0.8
GHz, (b) a two-tone equal-amplitude input of 1.9 GHz and 3.2 GHz with each tone
-6 dB relative to full-scale power, and (c) a two-tone equal-amplitude input of 1
GHz and 2.8 GHz with each tone -6 dB relative to full-scale power. The frequency
response of the filter is shown as a dashed line. . . . . . . . . . . . . . . . . . . . 252
6.31 Measured output spectra for full-scale single-tone inputs (overlaid) for a processor
configured as an amplitude equalizer. The frequency response of the filter is shown
as a dashed line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
6.32 Measured output spectra for two-tone equal-amplitude inputs (with each tone -6
dB relative to full-scale power) at (a) 1 GHz and 3 GHz and (b) 0.8 GHz and 2
GHz, processed through a bandstop filter. The frequency response of the filter is
shown as a dashed line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
6.33 Measured output spectrum for a two-tone equal-amplitude input (with each tone
-6 dB relative to full-scale power) with in-band tones at 1.2 GHz and 1.3 GHz,
processed through a lowpass filter. The frequency response of the filter is shown
as a dashed line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
6.34 Measured output spectrum for a two-tone equal-amplitude input (with each tone
-6 dB relative to full-scale power) processed through a bandpass filter for (a) two
in-band tones at 2.9 GHz and 3 GHz, (b) one in-band tone at 3 GHz and one out-
of-band tone at 4 GHz, and (c) two out-of-band tones at 3.75 GHz and 4.5 GHz.
The frequency response of the filter is shown as a dashed line. . . . . . . . . . . . 257
6.35 Power dissipation of the CT digital processor versus input frequency for several
values of the input power, relative to the full-scale input power. . . . . . . . . . . 259
6.36 Power dissipation of the CT digital processor versus input frequency for a full-
scale input. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
6.37 Filter’s energy consumption per input cycle. . . . . . . . . . . . . . . . . . . . . . 260
7.1 Example implementation of a first-order reconstructing CT DAC comprised of a
ZOH DAC and a linear extrapolator. . . . . . . . . . . . . . . . . . . . . . . . . . 266
List of Tables
5.1 Delay-cell transistor widths. All transistors are of the minimum length of 65 nm. . . . . . 144
5.2 Transistor sizes of the XOR-based pulse generator. All transistors are of the mini-
mum length of 65 nm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.3 Pulse-generator propagation delay, output pulse width and input-polarity-dependent
discrepancy in the delay and in the pulse width for several process corners and tem-
peratures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.4 Transistor sizes of the proposed one-shot circuit. All transistors are of the mini-
mum length of 65 nm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.5 Propagation delay and output pulse width (average values and difference in values
for rising- and falling-edge inputs) for several process corners and temperatures. . 190
5.6 Transistor sizes of the bi-directional current source. . . . . . . . . . . . . . . . . . 193
6.1 Performance summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
6.2 Comparison of the presented work to CT DSPs in prior art and a state-of-the-art
DT DSP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Acknowledgments
I would like to thank Prof. Tsividis, my advisor, for his guidance in research, for invaluable
lessons in intuitive thinking and clear writing, and for his patience. I am grateful for his contagious
excitement about elegant circuits. He inspires me to want to learn another language, to play another
instrument, and to dive into another book. He has made an impression.
I thank the National Science Foundation for funding me throughout my research. I am grateful
to Dominique Morche and Pierre Vincent from Minatec/LETI for providing an interesting appli-
cation for this work, making the project possible, and for their guidance throughout the project. I
would like to thank Boris Kurchuk, my father, for his creative solutions to making my board work
and ridding my measurements of ripples. Thanks to Prof. Jelenkovic and Prof. Kalet for their
helpful comments.
I would like to thank my doctoral committee members–Prof. Toby Cumberbatch, Prof. Harish
Krishnaswamy, Dr. Dominique Morche, Prof. Yannis Tsividis, and Prof. Charles Zukowski–
for their comments and the time they dedicated to reading my thesis.
I would like to thank my colleagues from LETI, particularly Erkan Isa, Mykhailo Zarudniev,
and Frederic Hameau, for their support, hospitality, and morning bises. They have made my time in
France unforgettable. I am grateful to my CISL labmates, especially Karthik Tripurari, Anuranjan
Jha, and Colin Weltin-Wu, for their help, their friendship, and their happy hour initiatives. I thank
them for making the lab an amusing place to be.
I would like to thank Prof. Hamid Ahmad for making me fall in love with circuits.
I am grateful to my Thursday night gang–Tommy Pagano, Devin Plantamura, Val Spies, Rob
Nadramia–for being my dear friends and excellent companions in dorky discussions. I would like
to thank Jen Drieves for always cheering me up, for sharing some of my neuroses (and making
them seem normal), and for always trying to help. I thank Jen for our walks, when I feel most at
peace, for our excitement about the silliest things, and for our dreams, our refusal to believe that
life is prosaic. I would like to thank Jason Ira Neufeld for his affection and warmth, thermal and
otherwise. His constant support (unless baseball scores are on) has helped to break my overly-self-
critical infinite loops and his humor and wit have made life more amusing. I am grateful for his
serenades, his puppy sniffs, and his excellent sous-chef abilities.
Words (particularly in English) seem not enough to express my gratitude to my family. I thank
them for their love, for their support, and for always being one. I thank Alexandre Dumas for
his words, which he must have written just for us: “one for all, all for one.” I thank my brother,
Aleksey, for always being there for me, for standing up for me, and for loving me (despite dropping
me in infancy and withholding his toys). I thank him for being the one I always look up to. I would
like to thank my sister-in-law, Tatiana, for her constant encouragement and for her cheer. I am
grateful to my amazing niece, Natalya, for her smile. Her existence has made gloomy days seem
not so cloudy. I could not be without my mom’s love, her advice (even though I may not follow it
right away) and her encouragement. She has helped to sort the puzzles in my head and to believe
everything will somehow turn out right. I am grateful for her cheer and for her beauty (in so many
ways). I thank my dad for loving me so much, for his wisdom and his support (and a push when it
is needed). I thank him for arguing with me (a task few can withstand) and for understanding me. I
thank him for setting an ideal. I dedicate my thesis to my parents, Galina and Boris Kurchuk, who




Digital signal processors (DSPs1) are an integral part of systems for a variety of applications
because they offer the benefits of wide programmability and noise immunity. The analog-to-digital
converters (ADCs) preceding a DSP approximate an input signal by quantizing its amplitude to
a finite range of discrete values. Conventional discrete-time (DT) ADCs approximate the input
further by capturing changes in the input and updating the signal representation only at discrete
predetermined times, thereby causing signal quantization also in the time domain. This work
deals with digital signal processing systems that operate entirely in continuous time [1]. These
processors still use digital representation of the signal amplitude, but they do not quantize the
time. Continuous-time DSPs constantly track the input signal and continuously update its digital
representation without waiting for discrete update times, as is the case with DT DSP, where the
update times are dictated by a clock. As a result, CT digital representation is, qualitatively, more
accurate than DT digital representation for the same amplitude precision.
1“DSP” in this thesis will be used as an acronym for both “digital signal processing” and “digital signal processor”.
Figure 1.1: Clockless level-crossing sampling.
In discrete-time processors, the update times, referred to as sampling times, are set by a clock
signal whose frequency is independent of the instantaneous properties of the input. The minimum
time resolution that is needed for adequate signal representation is chosen to accommodate
a worst-case signal, for example a signal of the maximum expected input frequency. For sparse or
sporadically changing signals, which can have segments of high activity and intervals of silence,
a DT system with a fixed sampling rate, which is set to keep up with the fastest input signal, is
inefficient. A DT system generates samples even when there is little change in the input, which leads
to unnecessarily high power dissipation. Several approaches have been proposed to improve
signal-tracking properties of DT DSPs [2], for example, by reducing the sampling and operating
frequency depending on the input frequency content [3, 4]. However, realizing a variable sampling
rate can significantly increase the complexity of the DSP and lead to hardware and power
overhead.
An effective nonuniform sampling approach that has found use in control and sensor applications,
to name a few, is based on level-crossing sampling (LCS), where a sample is produced whenever
an input crosses one of several regularly spaced quantization levels [5–7]. This is equivalent to
updating the signal representation whenever the input has diverged from the previous sample by
an error whose magnitude is equal to the spacing between quantization levels ∆; this is also called
asynchronous delta modulation [8–10]. DT level-crossing sampling can be realized by introducing
fine time quantization [11, 12]. If samples are instead generated as soon as the error threshold is reached, without
waiting for a clock edge, as shown by the red dots in Fig. 1.1, the result is clockless level-crossing sampling, which
is equivalent to ideal clockless quantization [1, 13] and has been used for digital signal representation
in CT DSPs for voiceband applications in prior art [14, 15]. The rate at which CT samples are
generated depends on how quickly the signal traverses consecutive quantization levels and is there-
fore dependent on the activity level of the input. A CT DSP dissipates dynamic power only when
there are samples to process; therefore, its power dissipation automatically adapts to the signal. In
applications with sporadically changing signals, as in hearing aids and sensors, activity-adaptable
sampling, inherent to CT DSPs, leads to a reduction in the average number of generated samples
in comparison to the case of regular sampling because no samples are produced during input inter-
vals of silence. Many applications can benefit from the activity-dependent sampling rate and the
resulting activity-dependent power dissipation, such as implantable biomedical devices, DC-DC
converters [16], sensors and control systems. The in-band error of a CT DSP system is lower than
that of a DT system of the same resolution because there is no in-band aliasing of the quantization
error. The elimination of a clock from the system offers the additional benefits of a reduction in
substrate noise and electromagnetic interference emissions [17].
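The activity-dependent token rate described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `level_crossing_tokens`, the test signal (a five-cycle burst followed by silence), and the three-bit step of 0.25 are assumptions chosen for illustration, and crossing times are interpolated on a dense time grid, whereas a true CT system detects them with comparators.

```python
import numpy as np

def level_crossing_tokens(x, t, delta):
    """Return (time, direction) tokens, one per quantization-level crossing.

    Levels sit at integer multiples of `delta`. The waveform is only known on
    the dense grid `t`, so each crossing time is linearly interpolated.
    """
    idx = np.floor(x / delta).astype(int)   # index of the level just below x
    tokens = []
    for n in range(1, len(x)):
        if idx[n] == idx[n - 1]:
            continue                        # no level crossed in this interval
        if idx[n] > idx[n - 1]:             # rising: levels idx[n-1]+1 .. idx[n]
            direction, levels = 1, range(idx[n - 1] + 1, idx[n] + 1)
        else:                               # falling: levels idx[n-1] .. idx[n]+1
            direction, levels = -1, range(idx[n - 1], idx[n], -1)
        for k in levels:
            frac = (k * delta - x[n - 1]) / (x[n] - x[n - 1])
            tokens.append((t[n - 1] + frac * (t[n] - t[n - 1]), direction))
    return tokens

# Hypothetical input: five cycles of a sinusoid, then silence (constant value).
delta = 0.25
t = np.linspace(0.0, 1.0, 20_000, endpoint=False)
x = 0.1 + np.where(t < 0.5, 0.8 * np.sin(2 * np.pi * 10 * t), 0.0)

tokens = level_crossing_tokens(x, t, delta)
print(len(tokens), "tokens, versus", len(t), "uniform grid points")
```

In this sketch all tokens fall inside the active half of the record; the silent half produces none, which is the behavior that makes the dynamic power of a CT system track the input activity.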
This work focuses on signal encoding of level-crossing-sampled signals for CT digital proces-
sors and CT DSP implementations for processing gigahertz-range signals. A CT sample will also
be referred to as a token within this work. The rate at which CT samples are generated will also be
referred to as a token rate or an effective sampling rate.
1.1 Comparison of CT ADC/DSP/DAC and DT ADC/DSP/DAC
Figure 1.2: Comparison of a continuous-time and discrete-time ADC: (a) block diagram, (b) input
signal, quantization thresholds (dashed), and reconstructed quantized input, and (c) example digital
representation.
A CT ADC/DSP/DAC system has the same structure as a conventional DT ADC/DSP/DAC
and is comprised of an analog-to-digital converter (ADC), a digital processor and a digital-to-analog
converter (DAC), all of which operate without a sampling or a synchronizing clock. A
CT ADC is juxtaposed to a discrete-time ADC in Fig. 1.2. A continuous-time ADC is equivalent
to a clockless quantizer with a digital output, whereas a DT ADC is equivalent to a digitizing
quantizer that is followed by a sample-and-hold2. The order of the quantizer and the sample-and-hold
is interchangeable because both configurations result in identical output signals.
2This representation is conceptual, and is appropriate for this discussion. In an actual DT ADC, the sampler
precedes the quantizer.
In the DT case, sampling instants, shown as dots in the right portion of Fig. 1.2b, occur at regularly spaced
intervals which do not typically coincide with level crossing times. Therefore, for fast portions
of the input not all level crossings are captured, and for slow input portions multiple samples are
generated with a repeated value. In the CT case, in contrast, tokens are generated only at level
crossings, as seen in the left portion of Fig. 1.2b. During fast-changing input portions, the CT ADC
generates more samples than the DT ADC, while during slow-changing intervals, the CT ADC
generates fewer and is therefore more efficient. The CT digital output can be represented as a CT binary word, with each toggle
of the least significant bit (LSB) indicating that a quantization level has been crossed (Fig. 1.2b).
Another example is a CT delta-encoded representation (Fig. 1.2c), where one bit, CH, indicates that
a quantization level has been crossed and a second bit, DIR, indicates the direction of the change.
The timing information is retained in the transitions of the digital signals, for example, the toggles
of the LSB or of the change signal CH. The binary-encoded signals contain redundant information
because consecutive samples differ by only one quantization step. Delta-encoded representation
is preferred in low- to moderate-speed applications such as sensors, controls, and voice-signal
processing [11,12,14,15] because it is more efficient. (Other digital signal representations are also
possible; an encoding suitable for CT digital signals in the gigahertz frequency range is presented
in Ch. 4.)
Since a delta-encoded representation is often used for asynchronous sampling, a delta-encoded
CT ADC implementation is explained and is illustrated in Fig. 1.3 [7,9–11,14,15]. The input signal
is always tracked by maintaining it between reference signals VP and VN, which are offset from the
reconstructed output by ±∆/2. Latchless comparators are used to indicate when the input crosses
one of the two reference signals. Digital control logic is used to generate the delta-encoded digital
ADC output. The logic is also used to update an internal CT DAC that provides the reconstructed
analog signal for generation of VP and VN.
Figure 1.3: CT delta-encoded ADC block diagram and example signals.
The reconstructed output of a CT ADC has a value that
is offset from the actual sampled value by ±∆/2, depending on the direction of the input, in order to
maintain the quantization error centered about zero. Since level-crossing times are known exactly,
without quantizing the time, there is effectively no sampling in time and, therefore, no aliasing.
Alternatively, a delta-encoded digital output can be realized by following the output of any CT
ADC with an LSB-change detector.
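The tracking loop just described can be sketched behaviorally in Python. This is an illustrative model under stated assumptions, not the circuit of Fig. 1.3: the function name `delta_encoded_adc`, the start value, and the test sinusoid are hypothetical, the comparators are modeled as ideal, and the loop runs on a dense time grid, so token times are only as fine as that grid.

```python
import numpy as np

def delta_encoded_adc(x, t, delta, start=0.0):
    """Behavioral model of a delta-encoded level-crossing ADC.

    `recon` models the internal DAC output. The input is kept between
    VP = recon + delta/2 and VN = recon - delta/2; crossing a reference
    emits a token (CH = 1) with DIR = +1 or -1 and moves `recon` one step.
    Returns the token list and the staircase `recon` on the grid `t`.
    """
    recon = start
    tokens, trace = [], np.empty(len(x))
    for n, (tn, xn) in enumerate(zip(t, x)):
        while xn > recon + delta / 2:      # input above VP: step up
            recon += delta
            tokens.append((tn, +1))
        while xn < recon - delta / 2:      # input below VN: step down
            recon -= delta
            tokens.append((tn, -1))
        trace[n] = recon
    return tokens, trace

delta = 0.25                               # assumed three-bit step over +/-1
t = np.linspace(0.0, 1.0, 10_000, endpoint=False)
x = 0.9 * np.sin(2 * np.pi * t)            # one period of a hypothetical input

tokens, trace = delta_encoded_adc(x, t, delta)
# The DIR bits alone are enough to rebuild the staircase values.
rebuilt = delta * np.cumsum([d for _, d in tokens])
```

In this model the difference between the input and the staircase never exceeds half a quantization step, which is the sense in which the quantization error stays centered about zero.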
Continuous-time digital systems are similar in structure and flow to classical discrete-time
DSPs, as shown in Fig. 1.4a, with the major exception that operations are performed in continuous
time, according to the timing of incoming tokens, and without any clock. Instead of registers to
implement tap delays, continuous-time digital delay blocks are used, as will be explained in the
following section. The frequency responses of the two systems are identical. Consider the case
in which both the CT and DT DSPs process signals generated by a DT ADC. If the tap delay, TD, of
the CT DSP is set equal to the inverse of the clock frequency of the DT DSP, 1/fS, the outputs of both
DSPs are the same (save for the different quantization error), as are the frequency responses.
Figure 1.4: Comparison of a CT and a DT DSP: (a) block diagram, (b) frequency response, (c)
Conventional digital filter design tools, for example Matlab’s® Filter Design Toolbox, can be
used to design a CT digital filter by using the same coefficients but setting the tap delay equal to
1/fS. Unlike the case of the DT processor, input frequencies processed by a CT DSP are not limited
to the Nyquist bandwidth of fS/2 since there is no clock. When a CT digital signal is processed by
the CT DSP, the frequency response is unchanged compared to that of the DT DSP, except there
is no aliasing since there is no sampling in time; this is illustrated at the bottom of Fig. 1.4 for
TD = 1/fS. Both CT and DT systems are supplied with the same analog input signal, which contains
components at frequencies above (shown as rectangles in Fig. 1.4) and below (shown as triangles)
the Nyquist bandwidth of fS/2. In the DT system, the output signal is distorted due to in-band
aliasing of the component outside the Nyquist bandwidth, while the signal at the output of the CT
system remains undistorted.
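The statement that a CT filter reuses the DT coefficients can be checked numerically with a short Python sketch. It evaluates the standard tapped-delay-line response H(f) = sum over k of h[k] exp(-j 2 pi f k TD), which is defined for every input frequency and repeats with period 1/TD. The coefficients, the 40 kHz clock rate, and the function name `ct_fir_response` are hypothetical values chosen for illustration.

```python
import numpy as np

def ct_fir_response(h, tap_delay, f):
    """Frequency response of a tapped delay line with tap delay `tap_delay`."""
    k = np.arange(len(h))
    phase = np.exp(-2j * np.pi * np.outer(f, k) * tap_delay)
    return phase @ np.asarray(h, dtype=complex)

f_s = 40e3                        # assumed clock rate of the reference DT DSP, Hz
h = [0.25, 0.5, 0.25]             # assumed lowpass coefficients
f = np.array([1e3, 5e3, 15e3])    # frequencies below f_s / 2

H_base = ct_fir_response(h, 1 / f_s, f)
H_shift = ct_fir_response(h, 1 / f_s, f + f_s)   # same points, one period higher
```

An input component above fS/2 is therefore shaped by the periodic extension of the response at its own frequency; in the CT system it is not folded into the baseband.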
1.2 Timing granularity of CT digital processing
Figure 1.5: Granularity time between consecutive quantization-level crossings.
For the discussions in this work it is assumed that the input signal is bandlimited from fMIN
to fMAX and its amplitude range is normalized to be within ±AMAX, unless otherwise stated. The
quantization step of an N-bit system is ∆ = 2AMAX/2^N. To correctly represent the input signal, the
timing between quantization-level crossings must be accurately preserved. The ADC, DSP and
DAC blocks must have processing times less than the minimum token spacing in order to avoid
missing a token or distorting the timing.
Figure 1.6: CT tap delay comprised of granular delay cells.
This minimum granular time between tokens, TGRAN, is
the time it takes the fastest input to cross a quantization level, as shown in Fig. 1.5. For a sinusoid
of amplitude AMAX and frequency fMAX,
$$T_{GRAN} = \frac{\Delta}{2\pi A_{MAX} f_{MAX}} = \frac{1}{2^N \pi f_{MAX}} \qquad (1.1)$$
where 2πAMAX fMAX is the maximum input slope. The granularity time gets smaller and more
stringent as resolution or maximum input frequency is increased. Since the tap delay, TD, inside
the DSP can be longer than the time between tokens, the delay block must be able to process several
tokens simultaneously. The CT tap delay must then be comprised of TD/TGRAN granular CT delay cells,
each of which has a delay of TGRAN in order to keep track of all the tokens, as shown in Fig. 1.6 [15].
For the example of an eight-bit system, for signals in a 20 kHz bandwidth and a typical tap delay
of 25 µs, the granularity time is 62 ns, which requires the tap delay to be comprised of over 400
granular cells. An integrated multi-tap CT DSP for such a system is dominated by the chip area
and power dissipation of these delay cells [15].
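The arithmetic of this example can be reproduced with a few lines of Python. The sketch assumes that the granularity time is the quantization step divided by the maximum input slope 2πAMAX fMAX, and that the tap delay is 25 µs (the inverse of a 40 kHz rate), which is the value consistent with the figure of over 400 cells. The function names are hypothetical.

```python
import math

def granularity_time(n_bits, f_max):
    """T_GRAN = delta / (2*pi*A_MAX*f_MAX), with delta = 2*A_MAX / 2**n_bits."""
    return 1.0 / (2 ** n_bits * math.pi * f_max)

def cells_per_tap(tap_delay, n_bits, f_max):
    """Granular delay cells needed so that each cell holds at most one token."""
    return math.ceil(tap_delay / granularity_time(n_bits, f_max))

t_gran = granularity_time(8, 20e3)          # eight bits, 20 kHz bandwidth
n_cells = cells_per_tap(25e-6, 8, 20e3)     # assumed 25 us tap delay
print(f"T_GRAN = {t_gran * 1e9:.1f} ns, cells per tap = {n_cells}")
```

Because TGRAN halves with each added bit, the cell count per tap doubles with each added bit in this model, which is why the delay line dominates area and power at higher resolutions.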
Figure 1.7: (a) Quantizer, and (b) equivalent quantizer comprised of a CT ADC and an ideal DAC.
Figure 1.8: Example waveforms of a three-bit CT ADC. Time-domain waveforms, normalized to
the input period, of (a) the input sinusoid, (b) the quantization error, (c) the quantizer output, and
their corresponding spectra, normalized to the input frequency, in (d), (e), and (f), respectively.
1.3 Properties of signals quantized in continuous time
A continuous-time ADC, followed by an ideal DAC (Fig. 1.7b), is a clockless quantizer (Fig. 1.7a)
which updates its output, xq(t), each time the input, x(t), crosses a quantization level. As discussed
earlier in this chapter, this conversion is equivalent to level-crossing sampling. To illustrate the
principles involved, consider the quantizer signals shown in Fig. 1.8 for a sinusoidal input. The
effect of quantization is merely that of a hard nonlinearity; for a periodic input, the quantizer output
is also periodic, and can be expressed as a Fourier series with components at the input frequency
fIN and its harmonics, where the latter are caused by the quantization error e(t) = x(t)−xq(t), illus-
trated in Fig. 1.8b. The spectrum of a quantized sinusoid thus contains only the signal and its har-
monics, as shown in Fig. 1.8f. The transfer characteristic of an ideal quantizer is odd-symmetric,
which causes quantized signals to have odd symmetry, and therefore only odd-numbered harmonics
are present. It has been shown in [1, 13] that, since there is no sampling in time, there is no
aliased quantization noise floor. Related properties hold for the case of signals that are more complicated
than sinusoids, except that the Fourier series is replaced by a Fourier transform. The total
quantization error of a CT ADC is the same as the in-band error of a conventional Nyquist-rate
ADC, as will be explained in the following section, but the in-band error in the CT case is smaller,
as has been confirmed experimentally in [15].
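The absence of even-order harmonics can be checked numerically. The following Python sketch quantizes one period of a sinusoid with an odd-symmetric quantizer and approximates the Fourier-series coefficients with an FFT over a dense grid. The mid-rise characteristic, the three-bit resolution, the 0.98 amplitude (chosen to avoid overload), and all names are assumptions made for illustration; the dense grid only approximates continuous time.

```python
import numpy as np

def quantize_midrise(x, delta):
    """Odd-symmetric (mid-rise) quantizer: outputs are odd multiples of delta/2."""
    return delta * (np.floor(x / delta) + 0.5)

n_bits, a_max = 3, 1.0
delta = 2 * a_max / 2 ** n_bits
n = 1 << 14                                     # grid points over exactly one period
theta = 2 * np.pi * (np.arange(n) + 0.5) / n    # half-sample offset avoids x == 0
x = 0.98 * a_max * np.sin(theta)
xq = quantize_midrise(x, delta)

# Bin k of the FFT over one period corresponds to the harmonic at k * f_IN.
c = np.abs(np.fft.rfft(xq)) / n
even = c[2:200:2]
odd = c[3:200:2]
print("largest even harmonic:", even.max())
print("largest odd harmonic: ", odd.max())
```

The even-order bins come out at the level of floating-point rounding, while the odd-order bins carry the quantization distortion.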
The quantization error waveform, shown in Fig. 1.9a, can be partitioned into a bell-shaped
waveform and a sawtooth-like waveform [18]. The former occurs due to slowly varying portions of
the input, which, for the case of a sinusoidal input, are at its peaks. The sawtooth-like error occurs
during fast portions of the input when quantization levels are traversed quickly. The two error
segments contribute differently to the spectrum of a quantized sinusoid, as shown in Fig. 1.9b.
Figure 1.9: (a) Quantization error for a full-scale input sinusoid quantized with three-bit resolution,
and (b) spectrum of quantization error for a sinusoidal input quantized with six-bit resolution.
The slowly varying bell-shaped error contributes primarily to the low-frequency distortion, whereas
the fast-varying sawtooth-like error contributes high-frequency harmonics, which lie outside the
baseband. Consider the sawtooth-like portion of the waveform. The minimum tooth duration is
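the time it takes a sinusoidal input of the form x(t) = AIN sin(2π fIN t) to cross an interval equal
to ∆ at its maximum slope:
$$T_{SAW,min} = \frac{\Delta}{2\pi A_{IN} f_{IN}} \qquad (1.2)$$
When AIN = AMAX and fIN = fMAX, this can be seen to be equal to the granularity time defined
in eq. 1.1.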
The maximum sawtooth frequency, fSAW,max, is defined as the inverse of the minimum sawtooth
duration. The frequency of the largest distortion spur in Fig. 1.9b can be approximated as the
inverse of the minimum tooth duration (an intuitive result, but one which requires a lengthy
proof) [18]. Other sawtooth frequencies, below fSAW,max, correspond to the varying-width sawtooth
segments of non-minimum duration. Frequencies above fSAW,max are due to the harmonics
of the maximum and lower sawtooth frequencies. For any bandlimited input signal, the maximum
sawtooth frequency is
$$f_{SAW,max} = \frac{2\pi A_{MAX} f_{MAX}}{\Delta} = 2^N \pi f_{MAX}$$
where N is the number of bits. The majority of the quantization error power occurs in the frequency
range up to fSAW,max. As the resolution is increased, the largest spur shifts away from the baseband;
from the results in [18], the amplitude of the spur decreases by 9 dB for each additional bit of
resolution. It is explained intuitively in [18] that as the resolution is increased by a single bit, the amplitude
of the quantization error is halved, which accounts for a 6 dB decrease in the error power. The ratio
of the error power up to the sawtooth frequency to the total error power is the same regardless of
the resolution. Since the sawtooth frequency is doubled because the quantization step is halved, the
error power must be spread over a doubled frequency range. The level of the harmonics, therefore,
must decrease additionally by 3 dB to account for the doubled fSAW,max.
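The two contributions to the 9 dB figure can be written out as a short worked calculation. This is only a restatement of the intuitive argument above in decibel arithmetic, not the proof given in [18].

```latex
\begin{align}
\Delta \to \tfrac{\Delta}{2}
  &\;\Rightarrow\; 20\log_{10} 2 \approx 6.02~\text{dB lower total error power},\\
f_{SAW,max} \to 2 f_{SAW,max}
  &\;\Rightarrow\; 10\log_{10} 2 \approx 3.01~\text{dB lower level per harmonic},\\
\text{total}
  &\;\approx\; 6.02 + 3.01 \approx 9~\text{dB per additional bit}.
\end{align}
```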
In-band error is due primarily to the bell-shaped error, which contributes in-band low-frequency
distortion. For moderate- and high-resolution systems, the sawtooth error power lies mostly out-
side the baseband, and thus contributes insignificantly to the in-band SNDR. In-band SNDR is
defined as the ratio of the signal power to the power of quantization error and noise that falls in
the band of interest. An ideal quantizer is noiseless; therefore, the in-band error power is only
due to quantization distortion. One can consider taking advantage of the spectral properties of the
quantization error to reduce power dissipation by decreasing the average resolution of the system,
but without increasing the in-band error. This technique is proposed in Ch. 2.
1.3.1 Variations in in-band error with changes in input frequency and amplitude
In an ADC with a fixed sampling rate, the in-band error power is independent of the input frequency
and is equal to the total quantization error power of a clockless ADC of the same resolution. This has
been shown in [13] by using the following reasoning. A DT ADC can be represented as a CT
ADC followed by a sample-and-hold block. The order of operations is interchangeable because
the output signal is the same regardless of whether it is first quantized or sampled in time. The
spectrum of a quantized bandlimited signal contains the spectrum of the original input signal plus
the quantization error spectrum, which is harmonically related to the input. For a sinusoidal input,
the output contains tones at the input frequency and its harmonics, as shown in Fig. 1.10a. When
a quantized signal is sampled in time, the spectrum of the quantized input is convolved with a pe-
riodic impulse train, which contains impulses at multiples of the sampling frequency, as shown in
Fig. 1.10b. All of the higher-frequency quantization error harmonics are aliased in-band by mixing
with the replicas of the sampling frequency (Fig. 1.10c), as was first suggested by Bennett [19].
When the ratio of the sampling frequency to the input frequency is irrational, aliased harmonics
form a quantization error floor, which has been likened to a noise floor. Since, after sampling, all of the
quantization error of a CT-quantized signal falls into the frequency band up to fS/2, the in-band error power
of a DT-quantized signal is equal to the total error power of a CT-quantized signal (assuming no aliased
components fold on top of other components) and is independent of the input frequency as long as
the latter is confined to fS/2.
Figure 1.10: (a) Spectrum of a quantized sinusoid, (b) frequency representation of an impulse
train corresponding to ideal time sampling, (c) spectrum of a sinusoid quantized and sampled in
DT.
If the sampling frequency is chosen to be an integer multiple of the input
frequency in simulations, the high-frequency harmonics will alias on top of lower frequency har-
monics instead of forming a quantization error floor, and the output spectrum will contain discrete
harmonics.
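The folding of quantization harmonics after time sampling can be illustrated with a small Python sketch that maps each odd harmonic of the input to the frequency at which it appears in the band from 0 to fS/2. The frequencies are normalized to fIN = 1, and the two sampling rates (16 and 16.37) and the function names are hypothetical; the second rate is merely a non-integer ratio standing in for the irrational case.

```python
import math

def alias_frequency(f, f_s):
    """Frequency in [0, f_s/2] at which a tone at f appears after sampling at f_s."""
    r = math.fmod(f, f_s)
    return min(r, f_s - r)

def aliased_harmonics(f_in, f_s, orders):
    """Map each harmonic order to its aliased frequency."""
    return {k: alias_frequency(k * f_in, f_s) for k in orders}

odd_orders = range(3, 40, 2)
coherent = aliased_harmonics(1.0, 16.0, odd_orders)    # f_s / f_IN is an integer
spread = aliased_harmonics(1.0, 16.37, odd_orders)     # non-integer ratio

print(sorted(set(coherent.values())))
print(sorted(round(v, 2) for v in spread.values()))
```

With the integer ratio, every aliased harmonic lands on top of a low-order harmonic, so the spectrum stays discrete; with the non-integer ratio, the same harmonics land on distinct frequencies spread across the band, which is how a floor-like error spectrum builds up as more harmonics are included.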
For a CT ADC, the in-band error power depends on the number of harmonics that fall in band.
Figure 1.11: In-band SNDR (solid) and total SNDR (dashed) versus frequency for several resolution
settings and a full-scale sinusoidal input.
Consider a CT quantizer followed by an ideal brickwall lowpass filter with a cut-off frequency at
fBW and a sinusoidal input signal of frequency fIN. If fBW < 3 fIN there are no in-band harmonics
(assuming the quantizer is odd-symmetric and thus there are no even-order harmonics), and the
in-band error power is zero. In a real implementation, the minimum error power is limited by
circuit noise. As the input frequency is decreased, causing more harmonics to fall in-band, the
error power increases and approaches the total mean squared error power, which is equal to the
quantization error power of a Nyquist-rate ADC [19], given by the following equation relative to
the power of a full-scale signal:
$$\overline{e^2} = -6.02N - 1.76 \ \text{(dBFS)} \qquad (1.5)$$
where dBFS stands for “dB referenced to the power of a full-scale signal”. This is illustrated
in Fig. 1.11, which was obtained with Matlab® simulations. The sharp transitions in the in-band
SNDR for high input frequencies occur because, when only a few harmonics are in-band,
pushing a single harmonic out of band by increasing fIN results in a non-negligible
decrease in error power. As the input frequency is decreased relative to the bandwidth, the SNDR
approaches the SNR of a Nyquist-rate ADC with the same resolution, shown as a dashed line.
For a sufficiently low input frequency, the sawtooth frequency (see Fig. 1.9b) falls in-band. The
frequencies for which fSAW,max = fBW are indicated by arrows in the figure. Since the majority of
the error power is contained in the frequency range up to fSAW,max, harmonics beyond fSAW,max
contain relatively little power, and, as a result, for low input frequencies the in-band SNDR is near
its minimum value and varies less with frequency.
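The dependence of the in-band SNDR on the ratio fBW/fIN can be explored with a rough Python sketch. It is not the Matlab simulation behind Fig. 1.11: it assumes a mid-rise quantizer with clipping at full scale, approximates the Fourier series with an FFT over one period on a dense grid, counts only odd harmonics as error (the even ones vanish for an odd-symmetric quantizer), and takes the fundamental bin as the signal. The function name `inband_sndr_db` and its parameters are hypothetical.

```python
import numpy as np

def inband_sndr_db(n_bits, bw_over_fin, n=1 << 16, amp=1.0):
    """In-band SNDR of a CT-quantized sinusoid behind an ideal brickwall filter."""
    delta = 2.0 / 2 ** n_bits
    theta = 2 * np.pi * (np.arange(n) + 0.5) / n
    x = amp * np.sin(theta)
    codes = np.clip(np.floor(x / delta), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    xq = delta * (codes + 0.5)
    power = 2 * (np.abs(np.fft.rfft(xq)) / n) ** 2   # one-sided power per harmonic
    k_max = int(np.floor(bw_over_fin))               # highest in-band harmonic
    err = power[3:k_max + 1:2].sum()                 # odd harmonics 3, 5, ...
    return np.inf if err == 0 else 10 * np.log10(power[1] / err)

sndr_no_harmonics = inband_sndr_db(6, 2.5)     # f_BW < 3 f_IN
sndr_some = inband_sndr_db(6, 39)
sndr_many = inband_sndr_db(6, 5000)            # nearly all error power in band
ideal_db = 6.02 * 6 + 1.76
```

In this sketch the SNDR is unbounded when no harmonic is in band, drops as harmonics enter the band, and settles near the 6.02N + 1.76 dB value once the band extends well beyond fSAW,max.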
Low-frequency distortion of a quantized signal is sensitive to small variations in amplitude
because the shape of the bell-shaped quantization error (Fig. 1.9) changes significantly if the input
amplitude is varied within the range of a quantization step, as shown in Fig. 1.12. Due to this
amplitude sensitivity, the in-band SNDR is also sensitive to small variations in amplitude when
only a few low-frequency harmonics are in-band. Fig. 1.13 shows the variation of in-band SNDR
for near-full-scale sinusoidal inputs over the input range corresponding to 2∆ for a six-bit quantizer.
The range of SNDR variation can be over 40 dB. The peaks in the in-band SNDR correspond to
cases where the bell-shaped error contains very little power in the third harmonic and only a couple
of harmonics are in-band. For the case where many harmonics fall in-band, the variation in SNDR
is within a few dB, as seen in Fig. 1.13 for the case of fBW/fIN = 39.
Figure 1.12: Quantization error for a sinusoidal input for several input amplitudes within a range
of a single quantization step.
Figure 1.13: Variation in in-band SNDR for a small variation in amplitude for a six-bit quantizer
and several input frequencies.
1.4 Comparison of signals sampled in continuous time, finely
quantized time and discrete time
Level-crossing sampling in continuous time involves generating samples at the exact level-crossing
times without quantizing these times. The spacing of the tokens is an accurate representation of the
intervals between level crossings. Continuous-time operation can be approximated by using fine
time steps. In this case every level-crossing is tracked but the level-crossing times are not exactly
known because they are quantized with a fine timestep, which must be smaller than the granularity
time, TGRAN (eq. 1.1) [11, 12]. Using a timestep that is smaller than the granularity time, which
corresponds to the minimum sawtooth duration in eq. 1.2, is equivalent to using a high-speed
sampling clock whose frequency is above fSAW,max of a maximum amplitude, maximum frequency
input. This approach will be referred to as finely quantized time (FQT). For the case of a sinusoidal
input at fIN, the following list of sampling techniques is given in order of increasing error: (1)
continuous-time sampling (CT), (2) finely quantized-time sampling with fS > fSAW,max (FQT), (3)
oversampling (OS) with fS < fSAW,max (Nyquist-rate sampling is a special case of oversampling,





In oversampled ADCs, if the sampling frequency is doubled, the level of the quantization error,
which is modeled as a quantization noise floor, decreases by 3 dB because the total quantization
error is spread over twice the bandwidth. The in-band SNR therefore improves by 3 dB [19]. It is
tempting to call finely quantized-time sampling oversampling, but this is not appropriate. When
every level crossing is captured at the output of a FQT sampler, consecutive samples are highly
correlated to each other and to the input; therefore, it is not valid to model the quantization er-
ror as a noise floor, uncorrelated to the input, and the above reasoning for oversampling does not
hold. The 3 dB improvement per doubling of the sampling frequency of an oversampling ADC would
suggest that the in-band error reduces to zero as the sampling frequency approaches infinity,
which is not correct. In reality, as fS is increased beyond the maximum sawtooth
frequency, an oversampled ADC approaches the behavior of a FQT ADC. As the sampling fre-
quency approaches infinity, a finely quantized time ADC approaches CT operation. Therefore, the
in-band SNR approaches the in-band SNDR of a CT quantizer [17].
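The relationship between CT and FQT sampling can be illustrated numerically. The following is a minimal sketch, not the simulation used in this work: it assumes a full-scale sinusoid, uniformly spaced thresholds at (k + 0.5)∆, and an FQT model in which each crossing is reported at the next clock edge. The function names are hypothetical.

```python
import math

def level_crossing_times(n_bits, f_in, amplitude=1.0):
    """Exact level-crossing instants of A*sin(2*pi*f_in*t) over one period.

    Thresholds are assumed to sit at (k + 0.5)*delta (illustrative choice).
    """
    delta = 2.0 * amplitude / 2 ** n_bits
    period = 1.0 / f_in
    times = []
    for k in range(-2 ** (n_bits - 1), 2 ** (n_bits - 1)):
        level = (k + 0.5) * delta
        t_rise = math.asin(level / amplitude) / (2.0 * math.pi * f_in)
        times.append(t_rise % period)                    # rising crossing
        times.append((period / 2.0 - t_rise) % period)   # falling crossing
    return sorted(times)

def quantize_times(times, f_s):
    """Assumed FQT model: report each crossing at the next edge of a clock at f_s."""
    return [math.ceil(t * f_s) / f_s for t in times]

n_bits, f_in = 6, 1.0e3
f_saw_max = math.pi * f_in * 2 ** n_bits          # maximum sawtooth frequency
ct_times = level_crossing_times(n_bits, f_in)
min_spacing = min(b - a for a, b in zip(ct_times, ct_times[1:]))
fqt_times = quantize_times(ct_times, 2.0 * f_saw_max)
max_time_error = max(q - t for q, t in zip(fqt_times, ct_times))
```

With the clock set to twice the maximum sawtooth frequency, every crossing falls on its own clock edge and the timing error stays below one clock period; raising the clock frequency shrinks this error toward the CT case.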
The FQT and oversampling ADCs can be represented as a CT quantizer followed by a sample-and-hold
block. The relationship between CT, FQT and oversampling can be explained by considering
the spectrum of a CT quantized signal; the output spectrum of a time-sampling ADC, Xq,S(f),
is the convolution of the spectrum of a CT quantizer output, Xq(f), with a
periodic frequency impulse train:
$$ X_{q,S}(f) = X_q(f) * \left[ \sum_{n=-\infty}^{\infty} \delta(f - n f_S) \right] $$ (1.7)
As a result, the in-band error power is the sum of the error power in a bandwidth of ± fBW around
each harmonic of the sampling frequency fS.
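The folding described above can be sketched in a few lines. This is an illustrative helper only; the function names and the example numbers are assumptions, and the error harmonics are taken to lie at integer multiples of the input frequency.

```python
def alias_frequency(f, f_s):
    """Frequency to which a tone at f folds after sampling at f_s
    (its distance to the nearest integer multiple of f_s)."""
    return abs(f - round(f / f_s) * f_s)

def harmonics_aliased_in_band(f_in, f_s, f_bw, max_harmonic):
    """Harmonics m*f_in that lie above f_bw before sampling
    but land within f_bw after sampling."""
    return [m for m in range(1, max_harmonic + 1)
            if m * f_in > f_bw and alias_frequency(m * f_in, f_s) <= f_bw]

# Example: f_in = 7, f_s = 100, f_bw = 10 (arbitrary units)
aliased = harmonics_aliased_in_band(7, 100, 10, 30)
```

Only the harmonics within ±f_bw of a multiple of f_s are returned, which is the set whose power adds to the in-band error.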
The quantization error spectra of an eight-bit CT quantizer are shown in Fig. 1.14 for the
case of a sinusoidal input with a frequency fIN and a bandlimited signal with a Gaussian distri-
bution and a bandwidth of fBW. These results have been obtained using Matlab® simulation.
In the case of a sinusoidal input, most of the error power is contained in the frequency range
Figure 1.14: Quantization error spectrum of an eight-bit quantizer for (a) a sinusoidal input with
an input frequency of fIN, and (b) a bandlimited Gaussian signal with a bandwidth fBW.
bounded by fSAW,max. In the case of a Gaussian input, the majority of the error power is observed
to be contained in the frequency range up to the average sawtooth frequency, which is defined
as fSAW,max = 2πARMS fIN/∆ = πARMS fBW/∆, where ARMS is the root mean square value of the
Gaussian signal and is equal to the standard deviation of the input’s distribution. For frequencies
below the maximum sawtooth and average sawtooth frequencies, the quantization error spectra are
nearly flat, as shown in (a) and (b), respectively. For frequencies that are several times larger than
the sawtooth frequencies, the level of the output spectrum harmonics appears to drop off with a
slope of −20 dB per decade. To explain the characteristics of a CT quantization error spectrum,
the case of the sinusoidal input is considered, but it can be extended to more complicated signals.
The high-frequency portion of the spectrum can be explained by approximating the fast-switching
error as a periodic sawtooth-like signal with a variable tooth duration. The high-frequency error
spectrum contains components at the maximum sawtooth frequency, spurs near but below that
frequency due to the variable tooth duration, and harmonics of fSAW,max and sawtooth spur fre-
quencies. Due to the variable tooth duration, the sawtooth-like signal has numerous fundamental
sawtooth frequencies with values up to fSAW,max. A periodic sawtooth signal can be expressed as a
Fourier series with components at even and odd multiples of the sawtooth frequencies. The Fourier
coefficients decrease as $1/n$ as the harmonic number n is increased. The power of the harmonics
of each of the fundamental sawtooth frequencies therefore decreases as $1/n^2$, which accounts for the
−20 dB per decade slope in the quantization error spectrum. While the harmonics of lower sawtooth
frequencies can be below fSAW,max, the spectrum begins to drop off at a -20 dB per decade slope
only after the frequency of the highest sawtooth component.
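The 1/n behaviour of the Fourier coefficients can be checked numerically. The sketch below assumes an idealized sawtooth with a fixed tooth duration (the actual error has a variable tooth duration), ramping from −∆/2 to +∆/2 over a unit period; the function names are hypothetical.

```python
import math

def sawtooth_harmonic_amplitude(n, delta=1.0, points=20000):
    """Estimate the n-th Fourier sine coefficient of a fixed-period sawtooth
    ramping from -delta/2 to +delta/2 (midpoint-rule integration)."""
    total = 0.0
    for i in range(points):
        t = (i + 0.5) / points
        total += delta * (t - 0.5) * math.sin(2.0 * math.pi * n * t)
    return 2.0 * total / points

def harmonic_power_db(n):
    """Power of the n-th harmonic relative to the first harmonic, in dB."""
    a1 = sawtooth_harmonic_amplitude(1)
    an = sawtooth_harmonic_amplitude(n)
    return 10.0 * math.log10((an / a1) ** 2)
```

The coefficient magnitude follows ∆/(πn), so the tenth harmonic sits 20 dB below the first, consistent with the −20 dB per decade roll-off.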
Although the quantization error is comprised of discrete tones, an average power-per-frequency
function, S( f ), which is a modified notion of a power spectral density, can be used to describe it.
For frequencies up to fSAW,max, the empirical S( f ) is flat and corresponds to the flat spectrum of the
quantization error, and for frequencies above fSAW,max, S( f ) drops off with a −20 dB per decade
slope, which corresponds to the −20 dB per decade roll-off of the quantization error spectrum.
Refer to Fig. 1.15. Instead of the power being measured in a standard 1 Hz bandwidth, as in the
case of a standard spectral density, the level of the power density in the flat portion is defined as the
quantization error power of error harmonics up to fSAW,max divided by the measurement bandwidth
of fSAW,max. The results from [18] (Sec. 1.3) indicate that the level of the harmonics decreases
by 9 dB with each additional bit of resolution, therefore the average power of each harmonic is
proportional to $2^{-3N}$. The error power up to fSAW,max is then proportional to the average power of
each harmonic and the approximate number of harmonics up to fSAW,max, i.e. it is proportional to
$2^{-3N} f_{SAW,max}/f_{IN}$. After dividing this error power by the sawtooth frequency, the power-per-frequency
function in the flat portion of the spectrum is then:
$$ S(f) = \frac{c_1 2^{-3N}}{f_{IN}}, \quad f \le f_{SAW,max} $$ (1.8)
where c1 is a proportionality constant that is independent of the input frequency, resolution or
bandwidth. The level of the power density decreases by 9 dB if the resolution is increased by a
single bit. If the input frequency is doubled, the density is halved, but the measurement bandwidth
of the error power up to the maximum sawtooth frequency also doubles; the error power in the band
up to fSAW,max, therefore, remains unchanged. The power-per-frequency function in the roll-off
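portion of the error spectrum can be described by the following empirical expression:
$$ S(f) = \frac{c_1 2^{-3N}}{f_{IN}} \left( \frac{f_{SAW,max}}{f} \right)^2, \quad f > f_{SAW,max} $$ (1.9)
The $(f_{SAW,max}/f)^2$ term is due to the fact that the power of the harmonics decreases at −20 dB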
per decade. This expression matches eq. 1.8 at f = fSAW,max.
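The piecewise power-per-frequency function can be written as a short sketch. The flat level follows from the text (error power proportional to 2^(−3N) fSAW,max/fIN, divided by fSAW,max); the value c1 = 1.0 is a placeholder, since the constant is not specified, and the function names are hypothetical.

```python
import math

def f_saw_max(n_bits, f_in):
    """Maximum sawtooth frequency of a full-scale sinusoid: pi * f_in * 2^N."""
    return math.pi * f_in * 2 ** n_bits

def power_per_frequency(f, n_bits, f_in, c1=1.0):
    """Empirical S(f): flat up to f_saw_max, then -20 dB per decade.

    c1 is an unspecified proportionality constant (placeholder value).
    """
    flat_level = c1 * 2.0 ** (-3 * n_bits) / f_in
    f_knee = f_saw_max(n_bits, f_in)
    if f <= f_knee:
        return flat_level
    return flat_level * (f_knee / f) ** 2
```

Evaluating the function one decade above the knee gives a level 20 dB below the flat portion, and adding one bit lowers the flat level by about 9 dB.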
The following analysis is used to describe the aliased quantization error in FQT and oversam-
pling ADCs. The exact values of the error are not derived; rather, the variation of aliased error
power with sampling frequency and resolution is determined.
Aliased error power of an oversampled ADC
The sampling frequency of an oversampling ADC is typically restricted to be well below the maximum
sawtooth frequency. This is, in fact, required for the approximation of quantization error as
noise that is uncorrelated to the input to be valid. The quantization noise approximation is used in
the literature to derive the well-known expression for the SNR of an oversampling ADC [19], with
in-band error given by the following expression:
$$ e^2_{oversampling,dB} = -6.02N - 10\log_{10}(OSR) - 1.76 \;\text{(dBFS)} $$ (1.10)
where the oversampling ratio, OSR, is defined in eq. 1.6.
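As a quick numerical illustration of eq. 1.10, the following sketch evaluates the in-band error for a given resolution and oversampling ratio. The function name is hypothetical, and OSR is passed in directly rather than computed.

```python
import math

def oversampling_error_dbfs(n_bits, osr):
    """In-band quantization error of an oversampling ADC (eq. 1.10), in dBFS."""
    return -6.02 * n_bits - 10.0 * math.log10(osr) - 1.76
```

For example, an eight-bit converter at OSR = 1 gives about −49.9 dBFS, and each doubling of OSR lowers the error by a further 3 dB.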
The aliased in-band error of an oversampling ADC is equivalent to the error harmonics of a
CT ADC aliased by mixing with the sampling frequency and its harmonics. Most of the
aliased error power that falls in-band is aliased by the sampling frequency harmonics
that are in the flat portion of the CT quantization error spectrum of Fig. 1.14, where the error power
is high. The aliased power is given primarily by the sum of the CT quantization error power in
a bandwidth ±fBW centered around the harmonics of fS in the flat portion of the error spectrum,
$$ e^2_{aliased,OS} \approx \sum_{n=1}^{\lfloor f_{SAW,max}/f_S \rfloor} \int_{n f_S - f_{BW}}^{n f_S + f_{BW}} S(f)\,df = \left\lfloor \frac{f_{SAW,max}}{f_S} \right\rfloor \frac{2 f_{BW}\, c_1 2^{-3N}}{f_{IN}} $$
where $f_{SAW,max} = \pi f_{IN} 2^N$ for a full-scale sinusoidal input (see eq. 1.4). After substituting this expression
into the above equation, using the definition of OSR from eq. 1.6, and approximating $\lfloor f_{SAW,max}/f_S \rfloor$ as
$f_{SAW,max}/f_S$, the aliased error power becomes
$$ e^2_{aliased,OS} \approx \frac{\pi c_1 2^{-2N}}{OSR} $$
or, in decibels relative to a full-scale sinusoid,
$$ e^2_{aliased,OS,dB} = 10\log_{10}(\pi c_1) - 6N - 10\log_{10}(OSR) + 3 \;\text{(dBFS)} $$ (1.14)
The additional 3 dB term comes from the fact that the power of a full-scale signal is 0.5. The
result in eq. 1.14 is consistent with the expression in eq. 1.10 from the literature because the error
power decreases by 6 dB each time the resolution is increased by one bit and by 3 dB each time
the sampling frequency is doubled.
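The effect of replacing the integer number of sampling harmonics below fSAW,max by the ratio fSAW,max/fS can be checked with a short sketch. It assumes the flat level c1·2^(−3N)/fIN, a placeholder c1 = 1.0, and the usual definition OSR = fS/(2 fBW), which is an assumption here since eq. 1.6 is not reproduced in this section. Function names are hypothetical.

```python
import math

def aliased_error_oversampled(n_bits, f_in, f_s, f_bw, c1=1.0):
    """Flat-portion error power summed over +/- f_bw around each harmonic
    of f_s that lies below the maximum sawtooth frequency."""
    f_saw = math.pi * f_in * 2 ** n_bits
    flat_level = c1 * 2.0 ** (-3 * n_bits) / f_in
    n_harmonics = math.floor(f_saw / f_s)
    return n_harmonics * 2.0 * f_bw * flat_level

def aliased_error_closed_form(n_bits, f_s, f_bw, c1=1.0):
    """pi * c1 * 2^(-2N) / OSR, with OSR assumed to be f_s / (2 * f_bw)."""
    osr = f_s / (2.0 * f_bw)
    return math.pi * c1 * 2.0 ** (-2 * n_bits) / osr

def to_db(x):
    return 10.0 * math.log10(x)
```

When many sampling harmonics lie below fSAW,max, the two results differ by a small fraction of a dB; the discrepancy grows as fS approaches fSAW,max and the floor operation removes a larger share of the sum.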
For a fixed signal bandwidth, when the sampling frequency is increased and approaches the
maximum sawtooth frequency, of all the error power that is aliased, very little power falls in band.
As a result, instead of a flat noise-like quantization error floor, the spectrum of an oversampled
sinusoid contains prominent harmonics. The oversampling ADC SNR equation is then not valid.
Aliased error power for a finely-quantized time step
The sampling frequency of an FQT ADC must exceed the maximum sawtooth frequency in order
to capture every level crossing of the input; fS is set to accommodate the limiting case of the
fastest input (fMAX, AMAX). The in-band error power corresponds to quantization error harmonics
in the roll-off portion of the spectrum in Fig. 1.14 that are aliased by mixing with harmonics of
the sampling frequency because the sampling frequency is higher than fSAW,max for the particular
input signal. The expressions derived in this section are valid when the bandwidth is much smaller
than the sampling frequency such that none of the error from the flat portion of the CT quantization
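error spectrum is aliased in-band. The aliased error power can be expressed as the infinite sum of
the CT quantization error power in a bandwidth of ±fBW centered about the harmonics of fS in
the roll-off portion of the spectrum:
$$ e^2_{aliased,FQT} = \sum_{n=1}^{\infty} \int_{n f_S - f_{BW}}^{n f_S + f_{BW}} S(f)\,df = \frac{c_1 2^{-3N} f_{SAW,max}^2}{f_{IN}} \sum_{n=1}^{\infty} \left[ \frac{1}{n f_S - f_{BW}} - \frac{1}{n f_S + f_{BW}} \right] $$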
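If the bandwidth is significantly smaller than the sampling frequency, as is generally the case, the
term in the brackets can be approximated as $2 f_{BW}/(n f_S)^2$, so that
$$ e^2_{aliased,FQT} \approx \frac{c_1 2^{-3N} f_{SAW,max}^2}{f_{IN}} \, \frac{2 f_{BW}}{f_S^2} \sum_{n=1}^{\infty} \frac{1}{n^2} $$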
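By using the definition of OSR from eq. 1.6 and using a well-known result for $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, the in-band
aliased error power for a full-scale sinusoidal input, for which $f_{SAW,max} = \pi f_{IN} 2^N$, becomes
$$ e^2_{aliased,FQT} \approx \frac{\pi^2}{6} \, \frac{c_1 2^{-3N} f_{SAW,max}^2}{f_{IN} f_S \, OSR} = \frac{\pi^4}{6} \, \frac{c_1 2^{-N} f_{IN}}{f_S \, OSR} $$ (1.20)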
This expression indicates that the aliased in-band error of a FQT ADC decreases by 3 dB with each
additional bit of resolution. When the sampling frequency is doubled and the rest of the parameters
are unchanged, the in-band error power decreases by 6 dB. The extra 3 dB decrease in the error in
this case, relative to the error in an oversampling ADC, is due to the fact that the CT quantization
error is not flat but drops off with a -20 dB per decade slope.
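The two trends stated above (3 dB per bit, 6 dB per doubling of fS) can be reproduced by summing the roll-off power directly. The sketch assumes a roll-off of the form c1·2^(−3N)/fIN·(fSAW,max/f)^2 above fSAW,max, a placeholder c1 = 1.0, and a truncated sum; function names are hypothetical.

```python
import math

def aliased_error_fqt(n_bits, f_in, f_s, f_bw, c1=1.0, terms=100000):
    """Roll-off error power integrated over +/- f_bw around each harmonic of f_s.

    The integral of 1/f^2 is evaluated in closed form for each harmonic;
    the infinite sum is truncated after `terms` harmonics.
    """
    f_saw = math.pi * f_in * 2 ** n_bits
    scale = c1 * 2.0 ** (-3 * n_bits) * f_saw ** 2 / f_in
    total = 0.0
    for n in range(1, terms + 1):
        total += 1.0 / (n * f_s - f_bw) - 1.0 / (n * f_s + f_bw)
    return scale * total

def aliased_error_fqt_approx(n_bits, f_in, f_s, f_bw, c1=1.0):
    """Small-bandwidth approximation using sum(1/n^2) = pi^2 / 6."""
    f_saw = math.pi * f_in * 2 ** n_bits
    scale = c1 * 2.0 ** (-3 * n_bits) * f_saw ** 2 / f_in
    return scale * (2.0 * f_bw / f_s ** 2) * math.pi ** 2 / 6.0

def to_db(x):
    return 10.0 * math.log10(x)
```

For fBW much smaller than fS the truncated sum and the approximation agree to within a small fraction of a dB, and the direct sum shows the same dependence on resolution and sampling frequency as the closed-form expression.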
Ref. [11] derives an expression for the total error due to fine time quantization (i.e. aliased
error in a bandwidth fS/2). The derivation is repeated here. The level-crossing times are quantized
with a high-speed sampling clock with a frequency of fS. The time error, δt, in each sample is
uniformly distributed in the range [0, 1/fS] and $E\{\delta t^2\} = 1/(3 f_S^2)$. Ref. [11] suggests that the error in
the amplitude of each sample, ε, can be estimated from the timing error by using the slope of the
signal: $\varepsilon = \frac{dx}{dt}\,\delta t$. Since the input signal slope and the timing error are independent processes, the
total error power in the sample can be expressed as $E\{\varepsilon^2\} = E\{(dx/dt)^2\}\,E\{\delta t^2\}$. Since the slope
depends only on the statistics of the input signal, using the above expression for $E\{\delta t^2\}$ the total
error power is
$$ E\{\varepsilon^2\} = \frac{c_2}{f_S^2} $$
where c2 is a proportionality constant which depends only on the statistics of the input signal. This expression
indicates that the total aliased power of an FQT ADC decreases with a −20 dB per decade
slope as the sampling frequency is increased. However, according to the expression in eq. 1.20,
if $f_{BW} = \frac{1}{2} f_S$ to include all the aliased error and fS > 2 fSAW,max such that only error in the roll-off
portion of the spectrum is aliased, the total aliased error decreases with a −10 dB per decade
slope, which disagrees with the results in [11]. Simulation results, shown in the next section, are
Figure 1.16: Aliased quantization error of an eight-bit FQT ADC for a full-scale sinusoidal input
of fixed frequency (and fixed fSAW,max) versus sampling frequency, normalized to the maximum
sawtooth frequency for several values of bandwidth.
in agreement with eq. 1.20 and show that the derivation in [11] is inaccurate.
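The difference between the two predicted slopes can be made concrete with a small sketch. It compares the total roll-off error collected over the full band (fBW = fS/2), summed term by term and stated up to a constant factor that does not depend on fS, with the 1/fS² model obtained from the slope-times-timing-error argument. Function names are hypothetical, and the sum is truncated.

```python
import math

def total_aliased_error(f_s, terms=100000):
    """Roll-off error summed over the full band f_bw = f_s / 2,
    up to a constant factor independent of f_s."""
    f_bw = f_s / 2.0
    return sum(1.0 / (n * f_s - f_bw) - 1.0 / (n * f_s + f_bw)
               for n in range(1, terms + 1))

def time_quantization_model(f_s):
    """Slope-times-timing-error model: error power proportional to 1 / f_s^2."""
    return 1.0 / (3.0 * f_s ** 2)

def slope_db_per_decade(error_fn, f_s):
    """Change in error power, in dB, when f_s is increased tenfold."""
    return 10.0 * math.log10(error_fn(10.0 * f_s) / error_fn(f_s))
```

With fBW = fS/2 the sum telescopes to approximately 2/fS, so the first function falls at −10 dB per decade, whereas the timing-error model falls at −20 dB per decade.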
Simulation of the error of FQT and oversampling ADCs
Simulation results using Matlab® are used to verify the implications of eq. 1.20. Since the spec-
trum is not uniform and is instead comprised of discrete tones, when the bandwidth is increased
to include an extra term, the aliased power will increase in a small step rather than exhibiting a
smooth transition. This effect is less prominent when the bandwidth includes numerous harmon-
ics, since the addition of a single harmonic does not significantly vary the in-band error power.
It is also important to note that the aliased error power does not include the error power caused
by non-aliased distortion that is due to CT quantization. Therefore, in the presented results, the
power of the in-band harmonics of the input is removed. In a FQT ADC, the total in-band error
is typically dominated by these harmonics and the aliased error is not significant. However, if
advanced reconstruction is used, the quantization harmonics will be significantly reduced and the
aliased error will be modified but will remain.
Figure 1.17: Aliased quantization error of an FQT ADC for a full-scale sinusoidal input versus
sampling frequency, normalized to the input frequency for several resolution values, with fBW/fIN = 200.
The aliased in-band error power for a finely quantized-time ADC versus the sampling frequency,
normalized to the maximum sawtooth frequency, is shown in Fig. 1.16. As predicted
by the oversampling ratio term in eq. 1.20, the in-band error power decreases with a -20 dB per
decade slope for sampling frequencies beyond fSAW,max with the rest of the parameters fixed. Each
time the bandwidth is increased by a factor of four, so that OSR is decreased by a factor of
four, the vertical distance between the curves in Fig. 1.16 indicates that the in-band error power
increases by 6 dB, as predicted by eq. 1.20. The variation in in-band aliased power with resolution
is shown in Fig. 1.17; each time the resolution is increased by a single bit, the in-band error power
decreases by 3 dB, as expected.
Fig. 1.18 shows the in-band (bandwidth fBW) and total (bandwidth fS/2) aliased error powers
of a synchronous ADC for sampling frequencies in the range below fSAW,max, corresponding to
an oversampling ADC, and above fSAW,max, corresponding to a finely quantized-time ADC. The
input frequency and bandwidth are fixed; therefore, the variation in error power with sampling frequency is
equivalent to the variation of error power with the oversampling ratio, which is proportional to fS
Figure 1.18: (a) In-band ( fBW/ fIN = 50) and (b) total aliased error for sampling frequencies above
and below fSAW,max.
(eq. 1.6). For frequencies below the maximum sawtooth frequency, which is indicated by an arrow
for each resolution setting, the in-band error varies with a slope of -10 dB per decade. Simula-
tion results therefore agree with the behavior predicted by eq. 1.14 and the classical derivation of
quantization error of an oversampled ADC (eq. 1.10), which indicate a -10 dB per decade slope
with respect to OSR. The total error in this sampling frequency range is flat, as seen in Fig. 1.18b.
For a sampling frequency range well above the maximum sawtooth frequency, a slope of -20 dB
per decade is observed in the in-band error power (Fig. 1.18a), which agrees with eq. 1.20, also
expressed in terms of OSR. The total error in this frequency range decreases with a −10 dB per
decade slope.



Continuous-time level-crossing sampling and subsequent digital signal processing discussed in
Ch. 1 result in a large number of samples generated per second. While input-activity-dependent
sampling is efficient during low-activity inputs, for high-activity portions of the input the sampling
rate can be significantly higher than the Nyquist rate. This chapter presents an advance toward
reducing the number of samples generated per unit time, by adjusting resolution according to the
input activity [20, 21]. Slowly varying signals are processed with maximum precision to best
track the signal, whereas fast deviations in the input are efficiently tracked with larger steps,
thereby reducing the number of tokens produced when appropriate, as suggested in [17]. Without
sacrificing the quality of the output signal, a reduction in power dissipation is achieved.
Figure 2.1: Generalized block diagram of the proposed variable-resolution CT ADC.
As was discussed in Sec. 1.3, the quantization error of signals quantized in continuous time
can be partitioned into fast-switching sawtooth-like error segments and slowly-varying bell-shaped
error segments, as illustrated in Fig. 1.9. The fast-switching error contains power primarily at high-frequency
harmonics of the input frequency, which are outside the band of interest for moderate
to high resolution. Slowly varying error, on the other hand, causes primarily low-frequency harmonics
and is therefore primarily responsible for the in-band error. It is possible to take advantage
of the spectral properties of CT quantization error by increasing the power of the out-of-band harmonics
while keeping the low-frequency in-band harmonics fixed. This approach is detailed below. Fast
portions of the input, which contribute out-of-band sawtooth error, can be quantized with less resolution
without degrading the in-band performance; while the total mean square error of the signal
increases due to larger high-frequency error, the in-band SNDR does not suffer. In the proposed
variable resolution (VR) quantizer, illustrated in Fig. 2.1, the input activity can be gauged by
measuring the magnitude of the input slope; when the slope surpasses a selected slope threshold,
Sthreshold, which will be discussed in a later section, the quantizer resolution is decreased.
2.1 Potential variable-resolution quantizer schemes
There are several requirements that must be fulfilled to implement a variable-resolution (VR) quan-
tizer. It is desirable to realize the VR scheme without much additional hardware, to avoid offsetting
the benefits of the proposed scheme. Also, in order to prevent an increase in low-frequency dis-
tortion, a symmetric VR transfer characteristic is sought. The transfer characteristics of potential
mid-tread and mid-rise VR quantizers are illustrated in Fig. 2.2a and b, respectively. In the case
of a mid-tread quantizer, additional thresholds, which are indicated by triangular markers, are re-
quired to reduce the resolution by one bit. In the case of a mid-rise quantizer, there are extra output
Figure 2.2: Transfer characteristics of 2- and 3- bit variable-resolution (a) mid-tread, and (b) mid-
rise quantizers.
Figure 2.3: (a) Variable-resolution transfer characteristics of a skip-one-step quantizer, (b) exam-
ple output, and (c) quantization error.
levels that the ADC and subsequent system block must be able to represent, which increases the
word length by 1 bit. For both cases, the precision of the ADC must increase by 1 bit compared to
a fixed resolution ADC; the resulting increase in hardware overhead outweighs the benefits of the
VR ADC.
An alternative way to decrease ADC resolution by one bit is to skip every other step before
producing a digital output, thereby doubling the quantization step. An example transfer characteristic
is illustrated in Fig. 2.3a; the resulting lower-resolution transfer characteristic has an upward
DC shift of $\frac{1}{2}\Delta$. In regions of low resolution, this causes variations in the local mean of the error,
as shown in Fig. 2.3c, where the mean is taken over some fractional interval of the input period
and varies at twice the frequency of the input. This effect causes an increase in the low frequency
even distortion components, and significantly degrades the performance of the system.
As seen, the investigated techniques for varying resolution in a binary manner result in a
significant increase in either the hardware overhead or the in-band distortion. An alternative solution,
which overcomes these problems, is presented in Sec. 2.2.4.
2.2 Proposed variable-resolution quantizer
2.2.1 Principle
A symmetric reduced-resolution transfer characteristic can be achieved by skipping an even number
of steps before producing an output token. A VR quantizer characteristic, which reduces
resolution by skipping two consecutive levels to achieve a larger effective quantization step, is presented
in Fig. 2.4. The ADC requires no additional reference thresholds and has no new output
values. This solution adds little to the hardware overhead, as will be discussed in Sec. 2.2.2. Due
to the symmetry of the transfer characteristic, the quantized output for a sinusoidal signal varies
symmetrically about the input, as is illustrated in Fig. 2.4b. This results in a zero-mean quantiza-
tion error, shown in Fig. 2.4c, which does not increase the in-band distortion. Skipping an even
number of steps, thereby increasing the quantization step by an odd multiple, causes an effective
non-integer drop in resolution. For the case of skipping two steps, ∆ is tripled, and the
resolution is decreased by log2(3) ≈ 1.6 bits. Several resolution settings can be used, each for an
appropriate range of input activity. Varying resolution according to input activity, as suggested
in [17] and [12], is done by comparing the magnitude of the input slope to optimized thresholds,
Figure 2.4: Proposed variable-resolution quantization, achieved by skipping two steps. (a) VR
transfer characteristic, (b) example output, and (c) resulting symmetric quantization error.
the proper selection of which is detailed in a later section.
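The reason an odd step multiple preserves symmetry without new thresholds can be checked with a short sketch. It assumes a mid-tread base quantizer whose thresholds sit at (m + 0.5)∆, which is an illustrative assumption rather than a description of the implemented converter; the function names are hypothetical.

```python
import math

def symmetric_thresholds(k, count):
    """Positive thresholds, in units of delta, of a symmetric mid-tread
    quantizer with step k*delta."""
    return [(j + 0.5) * k for j in range(count)]

def reuses_base_thresholds(k, count=8):
    """True if every symmetric step-k threshold coincides with a base
    threshold of the form (m + 0.5)*delta, m an integer."""
    return all((t - 0.5).is_integer() for t in symmetric_thresholds(k, count))

def resolution_drop_bits(k):
    """Effective resolution lost when the step is multiplied by k."""
    return math.log2(k)
```

Under this assumption, k = 3 and k = 5 reuse existing thresholds, while k = 2 would require thresholds at integer multiples of ∆ that the base quantizer does not have; using the existing ones instead shifts the characteristic by ∆/2.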
This variable step size approach is similar to those used in Adaptive Differential PCM and
Adaptive ∆-Modulation [22]. The motivation there is to lower the sampling frequency, while
avoiding slope overload by increasing ∆ during fast-changing inputs using prediction algorithms.
The goal of the proposed VR ADC is to reduce the power dissipation and ease hardware timing
requirements of the system by aggressively reducing the resolution for fast inputs, according to the
input slope. This is done without significantly affecting the in-band error since higher-frequency
low-resolution error does not alias into the baseband, in contrast to what would have been the case
with discrete-time adaptive-resolution systems. The rate at which samples are produced likewise
decreases, varying with the signal, similar to the variable sampling frequency of the discrete-time
techniques in [4] and [23]. The result is a system that varies its resolution and sampling rates with
signal activity, without in-band error degradation. The adaptive-threshold oversampled flash ADC
in [24] can likewise lead to a reduction in power dissipation by using fewer levels to represent
the signals. However, instead of adapting to the speed of the signal, the levels are periodically
adapted according to predicted probabilities of the signal levels. Signal statistics are also used to
adjust ADC thresholds in [25]. Other techniques of achieving high resolution with low-resolution
ADCs use dithering signals and digital interpolation [26] (at the expense of power dissipation) or
non-real-time reconstruction processing [27]. The proposed VR CT ADC in conjunction with CT
DSP offers not only a significant reduction in processor power dissipation, but also a reduction in
the processor size, as discussed in Sec. 2.2, achieving high in-band SNDR with lower effective
resolution.
2.2.2 Operation
An example in Fig. 2.5 illustrates the operation of the variable-resolution quantizer. Three reso-
lution settings are used, which triple the current quantization step to reduce resolution. The slope
thresholds for this case are chosen for illustrative purposes. Slowly-varying portions of the input,
the slope of which is below any thresholds, are quantized with maximum resolution to best track
the signal. When the slope surpasses a threshold, indicated by the lower dashed line, the step size
and the maximum magnitude of the error triple. The reconstructed quantized output has noticeably
larger steps for fast inputs and finer steps during segments of low activity. During fast portions
of the input, the variable-resolution ADC takes fewer, larger steps to track the input, whereas the
fixed-resolution ADC takes many more, finer steps. The benefits of the proposed ADC are twofold.
First, the proposed ADC generates fewer digital outputs, which leads to significant dynamic power
savings in the follow-up signal processor. Second, because of the larger quantization step, the
quantization levels are traversed less frequently, thereby relaxing the hardware requirements for
keeping up with the input. When the quantization step is increased, the minimum time between
quantization level crossings increases and, hence, the hardware processing time requirement, ap-
proximated using eq. 1.2, is increased.
Figure 2.5: Example signals of a variable resolution system with three resolution settings: input
signal, magnitude of the input slope, quantization step, normalized to the minimum step, quantiza-
tion error, normalized to the minimum ∆, and the reconstructed quantized output.
2.2.3 Spectral properties
Consider the proposed variable-resolution quantizer, which increases ∆ by a factor k to reduce the
resolution when the input changes fast. The error caused by the quantizer, illustrated in Fig. 2.6,
can be partitioned into two segments: e∆, corresponding to high resolution, and ek∆, corresponding
to low resolution. The amplitude and minimum tooth interval of the low-resolution sawtooth-like
error increase by a factor of k, when compared to the case of the maximum-resolution quantizer,
where $k = \Delta_{max}/\Delta_{min}$. The low-resolution error can be approximated by a sawtooth-like signal with a
variable tooth duration with a minimum width of TSAW,k∆,min and a maximum width of TSAW,k∆,max,
as shown in the figure. The shortest sawtooth duration, TSAW,k∆,min, corresponds to the input seg-
ments of the maximum absolute slope, which causes consecutive quantization levels to be traversed
at the fastest rate, and can be determined from eq. 1.2 using the low-resolution step, k∆, instead
of ∆. The longest low-resolution tooth duration, TSAW,k∆,max corresponds to the input segments
Figure 2.6: VR quantization error for a sinusoidal input, partitioned into segments corresponding
to high and lower resolutions.
whose slope magnitude is approximately equal to a threshold slope, STH. For input slopes less than
the threshold, high resolution is used.
The effects of the VR quantizer can be noted in the spectrum of the quantization error for a
sinusoidal input, quantized with a VR quantizer with a maximum resolution of 6 bits and k = 3, shown in Fig. 2.7b,
and compared to the error spectrum for a sinusoid, quantized with fixed 6-bit resolution, shown in
Fig. 2.7a. Since the tooth duration of e3∆ has increased (refer to eq. 1.2), the largest distortion spur,
occurring approximately at the frequency $1/T_{SAW,k\Delta,min}$, moves towards the baseband. Due to the increase
in the sawtooth error amplitude, the power of the spur increases by approximately $9\log_2 k \approx 14$ dB,
as can be deduced by using the results from [18], and this is confirmed by the spectra in
Fig. 2.7. Most power of the low-resolution error is found to be distributed in a frequency range
between the minimum and maximum low-resolution sawtooth frequencies, $1/T_{SAW,k\Delta,max}$ and $1/T_{SAW,k\Delta,min}$, and in ranges
bound by integer multiples of those frequencies. For a sinusoidal input x = AIN sin(2π fIN t), the
minimum and maximum sawtooth frequencies are given by the following:
$$ f_{k\Delta,max} = \frac{1}{T_{SAW,k\Delta,min}} = \frac{2\pi f_{IN} A_{IN}}{k\Delta} $$ (2.1)
$$ f_{k\Delta,min} = \frac{1}{T_{SAW,k\Delta,max}} = \frac{S_{TH}}{k\Delta} $$ (2.2)
Figure 2.7: Spectra of the quantization error in dB, referred to the signal (not shown), for a 1kHz
full-scale sinusoid quantized with (a) a fixed 6-bit quantizer, and (b) a variable-resolution quantizer,
For input slopes below a set threshold, high resolution quantization is used to ensure that low-
frequency error, which falls in the band of interest, remains small. Inputs with slopes above the
threshold are quantized with low resolution and have an increased error power in the band roughly
between $1/T_{SAW,k\Delta,max}$ and $1/T_{SAW,k\Delta,min}$, as shown in Fig. 2.7b. The increase in the high-frequency
error and the fact that the low-power low-frequency error is unaffected by variable resolution can
be observed in the figure. Since the increase in the error due to low resolution occurs primarily
outside the band of interest, it does not degrade the in-band SNDR.
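The band occupied by the low-resolution error and the growth of its largest spur can be estimated with a short sketch. It takes each sawtooth frequency as the input slope divided by the low-resolution step k∆ (peak slope for the maximum, threshold slope for the minimum) and uses the 9 dB per bit result quoted from [18]. The function names and example values are hypothetical.

```python
import math

def low_res_sawtooth_band(f_in, a_in, delta, k, s_th):
    """(f_min, f_max) of the low-resolution sawtooth error for
    x = a_in * sin(2*pi*f_in*t) quantized with step k*delta."""
    f_max = 2.0 * math.pi * f_in * a_in / (k * delta)  # peak slope / step
    f_min = s_th / (k * delta)                         # threshold slope / step
    return f_min, f_max

def spur_increase_db(k):
    """Approximate growth of the largest spur: 9 dB per bit of lost resolution."""
    return 9.0 * math.log2(k)

# Example: 1 kHz full-scale sinusoid, 6-bit base step, k = 3,
# threshold at half the peak slope (illustrative choice).
delta = 2.0 / 2 ** 6
peak_slope = 2.0 * math.pi * 1.0e3
f_min, f_max = low_res_sawtooth_band(1.0e3, 1.0, delta, 3, 0.5 * peak_slope)
```

For this example the low-resolution error is concentrated roughly between 33.5 kHz and 67 kHz, and the largest spur grows by about 14 dB.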
2.2.4 Setting the slope threshold
The proper slope threshold can be determined by solving eq. 2.2, with the minimum sawtooth
frequency, $f_{3\Delta,min} = 10^m f_{BW}$, set to be m decades outside the highest frequency, fBW, of the band
of interest. The slope threshold is then:
$$ S_{TH} = (10^m f_{BW})(k\Delta) $$ (2.3)
Multiple resolution settings can be used for various slope ranges, the thresholds of which can be
determined by using eq. 2.3. The factor k, indicating the ratio of the low-resolution to the high-
resolution quantization steps, should be set to odd integer values.
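Eq. 2.3 is simple enough to evaluate directly. The sketch below computes the slope threshold and, as a derived quantity, the time an input at exactly that slope takes to traverse one fine step ∆. The function names and the numerical values are hypothetical.

```python
def slope_threshold(f_bw, delta, k, m):
    """Slope threshold of eq. 2.3: S_TH = (10^m * f_bw) * (k * delta)."""
    return (10.0 ** m) * f_bw * k * delta

def crossing_time_at_threshold(delta, s_th):
    """Time for an input with slope s_th to traverse one fine step delta."""
    return delta / s_th

# Example: 4 kHz band, 6-bit step for a unit-amplitude input, k = 3, one decade margin.
delta = 2.0 / 2 ** 6
s_th = slope_threshold(4.0e3, delta, 3, 1)
t_cross = crossing_time_at_threshold(delta, s_th)
```

For multiple resolution settings the same function can be called once per odd value of k to obtain the corresponding set of thresholds.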
2.3 Variable-resolution ADC in the context of a CT DSP system
Continuous-time delay blocks, used in CT DSPs as tap delays instead of clocked registers, can be
implemented, for example, by using a cascade of current-starved inverters. Delay cell architectures
for CT applications are discussed in Sec. 5.2.1. The following discussion is presented in the context
of a delta-encoded CT ADC, but can be extended to any CT ADC architecture. An equivalent ∆-
encoded CT ADC can be implemented using any CT converter, such as a flash ADC, followed
by an LSB change detector. In a delta-encoded signal, one bit, termed change, indicates
that a quantization level has been crossed, and the second bit indicates the direction of the change.
The input signal is reconstructed by adding or subtracting an LSB, as is indicated by the direction
signal, on every pulse of the change signal.
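The reconstruction rule for a delta-encoded signal can be sketched as a simple accumulator. The token layout, the polarity of the direction bit (1 for up, 0 for down), and the function name are assumptions made for illustration.

```python
def reconstruct_delta_encoded(tokens, lsb=1.0, initial=0.0):
    """Rebuild the quantized signal from (change, direction) bit pairs.

    direction = 1 is assumed to mean 'up' and 0 'down'; the value moves by
    one LSB only when the change bit is set.
    """
    value = initial
    output = []
    for change, direction in tokens:
        if change:
            value += lsb if direction else -lsb
        output.append(value)
    return output

levels = reconstruct_delta_encoded([(1, 1), (1, 1), (0, 0), (1, 0)])
```

In a CT system a token exists only when a level is crossed, so in practice every token carries an asserted change bit; the idle entry is included here only to show that the value is held between crossings.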
Figure 2.8: Variable-resolution system block diagram using a fixed-resolution (FR) quantizer.
2.3.1 Variable-resolution CT DSP
Fig. 2.8 presents a variable-resolution system in the context of a ∆-modulation converter archi-
tecture; however, a VR system is not limited to a ∆-encoded implementation. The VR quantizer
consists of an unaltered fixed-resolution (FR) CT ADC of the maximum desired resolution. A
slope detector determines whether the magnitude of the input slope is larger than a threshold to
indicate whether the resolution should be changed.
The 2-bit ∆-encoded FR ADC output and the slope detector outputs are combined to generate
the VR output token signal, which has two additional ∆-encoded bits to indicate the step size.
The first step size bit, termed step change, indicates that a change in resolution has occurred; the
second step size bit, step direction, indicates the direction of that change. While the subsequent
system blocks need to process a 4-bit token instead of a 2-bit signal, the DSP and DAC easily
accommodate this change and are, thus, primarily unchanged.
To vary resolution, the actual value of the slope is not needed; rather, an indication of whether
the slope is above a threshold is sufficient. The slope detection, therefore, can be realized without
the use of a differentiator; instead, the time between consecutive tokens can be compared to a
controlled delay, TD, to determine if the slope threshold is surpassed. The tunable delay, TD, is
set to be equal to the time it takes an input, with a slope equal to the threshold slope, to cross
a quantization level; thus, $T_D = \Delta / S_{TH}$. A simple realization of the slope detector is presented in
Fig. 2.9a, with example signals shown in Fig. 2.9b. The detector is comprised of basic logic gates,
D-type flip-flops (DFFs), and a tunable CT delay cell, which is a replica of the cells in the DSP.
Since the DSP delays are calibrated, TD can be set, too. The slope detector can also be easily made
to have hysteresis by varying the delay. A DFF, configured as a T flip-flop (TFF), converts a
pulse-encoded token signal to a TFF-encoded signal. When the TFF output toggles high, as a result of an
incoming token, it triggers a one-shot circuit, which produces a test pulse with a pulse width of TD.
When the next token arrives, the output of the one-shot is latched by the second DFF. If the token
arrives after the self-timed delay, TD, the magnitude of the input slope is less than the threshold,
and a low value is latched. Alternatively, if the token arrives before the falling edge of the one-shot
output, a high value is latched to indicate the slope is above the threshold. The one-shot is then
reset before its test pulse is complete. The slope is thus detected for every other token. There is
no need to measure the slope at every token because the slope of the input is not expected to change
drastically between tokens of a bandlimited signal. Even if the exact moment the slope threshold is
crossed is missed, the performance is not degraded because the thresholds are not strictly selected
and can be varied.
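The timing-based slope detection described above can be modeled behaviorally. The Python sketch below is an assumption-laden illustration, not a description of the circuit in Fig. 2.9: token arrival times are given as a list, the one-shot is treated as ideal, and the function name slope_flags is hypothetical.

```python
def slope_flags(token_times, t_d):
    """Flag tokens whose predecessor arrived less than t_d earlier.

    Behavioral model only: an even-indexed token starts a test pulse of
    width t_d, and the following token latches whether that pulse is still
    high. The slope is therefore evaluated on every other token.
    Returns a list of (latch_time, slope_above_threshold) pairs.
    """
    flags = []
    for i in range(1, len(token_times), 2):
        interval = token_times[i] - token_times[i - 1]
        flags.append((token_times[i], interval < t_d))
    return flags


# Hypothetical token arrival times in seconds and a delay T_D of 0.5 s.
times = [0.0, 1.0, 5.0, 5.2, 9.0, 12.0]
print(slope_flags(times, t_d=0.5))
```

Only the comparison of an interval against the delay is needed, which is why no differentiator appears in the model.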
The power dissipation and area of the several digital gates that comprise the slope detector
are negligible compared to those of the CT DSP, which is comprised of several thousand such
gates, as in the case of [15]; therefore, the slope detector adds insignificantly to the overhead.
Since the slope threshold is chosen conservatively to keep low-resolution distortion well outside
the baseband, the performance of the VR ADC is insensitive to the error of the slope detector.
For a 20% error in the
threshold slope, the in-band SNDR of a signal with several harmonics in-band varies by less than
0.5 dB, because high-power error is still maintained far outside the baseband.
Figure 2.9: (a) Slope detector block diagram with (b) example waveforms.
2.3.2 Power reduction of a variable-resolution CT DSP
The power dissipation of a CT DSP system differs from that of a conventional digital system in
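that the former dissipates signal-activity-dependent dynamic power. From [15], the static power is
$P_{STATIC} = \frac{T_{TAP}}{T_{GRAN}} N_{TAP} P_{leak,Dcell} + P_{bias,ADC} + P_{leak,digital}$ (2.4)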
The first term in the sum is due to the DC leakage power of the DSP, which is proportional to the
number of delay cells in a tap delay block, $T_{TAP}/T_{GRAN}$, the number of filter taps, $N_{TAP}$, and the leakage
power of a delay cell, $P_{leak,Dcell}$. The second term is due to the DC bias power of the CT ADC,
which is always on to track the input. The last term is due to the leakage of the remaining digital
blocks of the system. The dynamic power depends on the rate at which tokens are generated,
$R_{TOKEN}$, and the number of delay cells through which they have to be processed, and is therefore
$P_{DYNAMIC} = R_{TOKEN} \frac{T_{TAP}}{T_{GRAN}} N_{TAP} E_{Dcell}$ (2.5)
where EDcell is the energy dissipation of a delay cell per token [15].
The DC leakage power is not negligible; for a full-scale input at a quarter of the maximum
frequency, the ADC bias power makes up only 9%, while the leakage power makes up 30% of
the total power dissipation of the system in [15]. The static power dissipation of a CT system is
offset by the power savings of the CT DSP, when compared with a conventional system, which
keeps sampling when the input is inactive. The focus of this work is a comparison of the proposed
variable-resolution CT DSP system to a conventional fixed-resolution CT DSP system.
The proposed VR ADC is presented in the context of a CT digital system. The power and
area reductions, when compared to an FR system, are due to the DSP, since the ADC is primarily
unchanged, except for the negligible contribution of the slope detector and a few logic gates.
Since the DSP dominates the power consumption and area of a CT system, the VR scheme offers
significant reductions in both.
Consider the comparison of FR and VR quantizers in Fig. 2.10. By increasing ∆ for fast
inputs, the VR ADC generates fewer tokens, and the minimum temporal spacing between tokens
is also increased, thus easing the speed requirements of the hardware. When ∆ is multiplied by k
during fast portions of the input, the granularity time of a VR system, $T_{GRAN,VR}$, increases by the
same factor. As a result, one delay cell with a longer delay $T_{GRAN,VR} = kT_{GRAN,FR}$ can replace
k more granular delay cells inside a tap delay block (refer to Fig. 1.6). Thus, the total number of
delay cells that comprise the VR DSP can be decreased by a factor of $1/k$, which results in a decrease
Figure 2.10: Comparison of the quantization error, tokens, and delay cells for fixed-resolution
(left) and variable-resolution (right) systems.




The decrease in the number of delay cells and tokens is reflected in a decrease in the power
dissipation when compared to a FR CT DSP. While the bias power of the CT ADC and the leakage
power of the digital blocks are primarily unchanged, using eq. 2.4, the leakage power of the DSP,
the main contributor, decreases by a factor $P_{LEAKAGE,VR}/P_{LEAKAGE,FR} = 2/k$. The reason for a reduction of the
dynamic power dissipation is twofold. First, because a larger ∆ is used to track fast portions of
the input, fewer tokens are generated. Second, these fewer tokens are processed through fewer
delay cells. There are further dynamic power savings because the additional token bits dissipate
dynamic power only when the resolution is changed, which occurs much less frequently than the
input crosses quantization levels. Using eq. 2.5 and expressing the relaxed granularity constraint











where RTOK,FR and RTOK,VR are the rates of token production of the fixed-resolution and variable-
resolution quantizers, respectively. Eq. 2.6 gives, therefore, a conservative upper bound of the
power reduction achieved by the variable-resolution system.
2.3.3 Token rate estimation
The rate at which the variable-resolution CT ADC produces new digital outputs, or token rate
RTOKEN, is an indication of the dynamic power dissipation of the ADC, and subsequent system
blocks. An estimate of this token rate is now developed to help predict the power dissipation.
Consider a fixed-resolution quantizer with a quantization step ∆, and a sinusoidal input
$x(t) = A_{IN}\sin(2\pi f_{IN} t)$. The number of tokens produced in each cycle is twice the number of quantization
steps that fit into the input signal range, or $4A_{IN}/\Delta$. The number of tokens produced each second is
simply:
$R_{TOKEN} = \frac{4 A_{IN} f_{IN}}{\Delta}$ (2.7)
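For general waveforms, the arithmetic mean of the magnitude of the input slope can be used
to estimate the average token rate. The average time it takes to cross a quantization step can be
approximated as the step divided by the mean slope magnitude, so that the token rate is
$R_{TOKEN} \approx \frac{1}{\Delta}\,\overline{\left|\frac{dx}{dt}\right|}$ (2.8)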
For the case of a sinusoidal input $x(t) = A_{IN}\sin(2\pi f_{IN} t)$, the mean of the magnitude of the slope is
$4A_{IN} f_{IN}$. Using this slope in (2.8), the result in (2.7) is obtained; the approximation of the token rate
in (2.8) is, therefore, exact for a sinusoidal input.
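The agreement between the two token-rate estimates can be checked numerically. The following Python sketch counts level crossings of a densely sampled sinusoid and compares the count with the closed form $4A_{IN}f_{IN}/\Delta$ of eq. 2.7 and with the mean-slope estimate of eq. 2.8. The amplitude, frequency, step size and grid density are example assumptions, and count_tokens is a hypothetical helper.

```python
import numpy as np


def count_tokens(x, delta):
    """Count quantization-level crossings of a densely sampled waveform."""
    levels = np.floor(x / delta)
    return int(np.sum(np.abs(np.diff(levels))))


# Hypothetical example values: 0.9 amplitude, 400 Hz input, 8-bit step.
a_in, f_in, delta = 0.9, 400.0, 2.0 / 2 ** 8
t = np.linspace(0.0, 1.0, 1_000_001)           # one second of signal
x = a_in * np.sin(2 * np.pi * f_in * t)

simulated = count_tokens(x, delta)              # tokens counted in 1 s
closed_form = 4 * a_in * f_in / delta           # eq. 2.7
mean_slope = np.mean(np.abs(np.gradient(x, t)))
from_slope = mean_slope / delta                 # eq. 2.8

print(simulated, closed_form, from_slope)
```

The simulated count differs slightly from the closed form because the number of levels spanned by the input is an integer, while $4A_{IN}/\Delta$ generally is not.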
For a variable-resolution quantizer with M resolution settings, there is a mean absolute slope,
sm, corresponding to the mth setting, which is weighted by the fraction of time spent in that setting.
The VR token rate can be approximated as the sum of the token rates at each resolution setting m








The token rate can also be determined for any signal whose statistics are known. Expected
values of each sm, determined from the probability density function of the magnitude of
slope for each resolution level, can be used in eq. 2.9 to approximate the token rate. For example,
consider a bandlimited input signal with a Gaussian distribution and a spectral density Sx( f ). Since
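the input signal has a Gaussian distribution, the slope of the signal also has a Gaussian distribution,
$\phi_{dx/dt}$, with a variance obtained by integrating the spectral density of the slope over frequency:
$\sigma_{dx/dt}^2 = \int (2\pi f)^2 S_x(f)\,df$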
The distribution of the magnitude of the slope, φ|dx/dt|, has a value of zero for negative values
and is equal to 2φdx/dt for positive values of the slope. One of the basic properties of a Gaussian
distribution is that values within the range of ±σ occur 68% of the time. Values in the range ±S,
where S is the average magnitude of the signal, occur about 57% of the time. One can expect, then, that
the average value of the magnitude of the slope will be less than but proportional to the standard
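deviation of the slope. For a variable y with a standard normal distribution $\phi_{0,1}(y)$ with a mean
of 0 and a standard deviation of 1, the average magnitude is
$\overline{|y|} = 2\int_0^{\infty} y\,\phi_{0,1}(y)\,dy = 0.798$ (2.10)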
For a Gaussian distribution, a standard deviation other than 1 causes the bell-shaped distribution
to become narrower or broader, but the characteristics are unchanged, just scaled. 68% of the
sample values of the variable y are still within ±σ of the mean and about 57% of the values are within
±0.798σ of the mean. The average value of the slope magnitude is therefore $\overline{|dx/dt|} = 0.798\,\sigma_{dx/dt}$.
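The constant 0.798 is the mean absolute value of a standard normal variable, $\sqrt{2/\pi}$. The short Python check below evaluates it, together with the probability mass within ±σ and within ±0.798σ, using only the standard library; the variable names are illustrative.

```python
import math

# Mean absolute value of y ~ N(0, 1): E|y| = sqrt(2/pi).
mean_abs = math.sqrt(2.0 / math.pi)

# Probability that |y| is below a bound b is erf(b / sqrt(2)).
within_sigma = math.erf(1.0 / math.sqrt(2.0))
within_mean_abs = math.erf(mean_abs / math.sqrt(2.0))

print(f"E|y| = {mean_abs:.3f}")
print(f"P(|y| < 1)    = {within_sigma:.3f}")
print(f"P(|y| < E|y|) = {within_mean_abs:.3f}")
```

Because the distribution only scales with σ, the same fractions hold for the slope once it is normalized by $\sigma_{dx/dt}$.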
The average magnitude of the slope for each resolution setting m which is enabled for input slopes





where the limits are normalized by the standard deviation of the slope so that the standard
normal distribution φ(y) can be used. The average slopes are then used in eq. 2.9. For the lowest
resolution setting, $S_{TH,m-1} = 0$.
2.4 Results
To evaluate the advantages of a variable-resolution ADC in the context of a CT processor, its
performance was compared, using Matlab®, to that of a fixed-resolution CT quantizer, in terms of
in-band SNDR and the number of tokens produced per second. The latter is an indication of the
dynamic power dissipation of a CT DSP system and is used to illustrate the improved efficiency
of the proposed scheme. The power saving due to fewer delay cells, in the case of a CT DSP
Figure 2.11: SNDR for a small range of amplitudes of a 400 Hz sinusoid for a maximum 8-bit
variable-resolution quantizer, an 8-bit fixed-resolution quantizer and a 7-bit fixed-resolution quantizer.
SNDR is calculated in a 3.6 kHz voice bandwidth, with quantizer slope threshold selected to keep
the sawtooth frequency a decade away from the band of interest.
application, is discussed in Sec. 2.4.5.
2.4.1 Properties of the in-band SNDR
A full range of frequencies for which harmonics fall in-band must be tested because the SNDR of
a CT DSP system varies with frequency. For a high input frequency with only a few harmonics in-
band, a very high SNDR is achievable. With many harmonics in-band, as for a low-frequency input,
the SNDR is lower. The SNDR can change significantly for very small variations in amplitude,
less than an LSB, as is shown in Fig. 2.11, because when few harmonics are in-band, the in-band
quantization error is due to the error at the input peaks. This is explained in Sec. 1.3. If the
input amplitude is slightly changed, the size of the bell-shaped error drastically changes, which
is reflected in the SNDR swing. This is not a useful feature because while high SNDR values
are possible, they are accompanied by lower SNDRs for very small amplitude variations. It is
thus acceptable to reduce the peaks as long as the valleys are maintained. This is precisely what is
accomplished by using a variable-bit quantizer, as seen in the variable-bit curve in Fig. 2.11. While
the peaks are not as high, the valleys of the VR quantizer are seen to be practically the same as
in the fixed 8-bit case, and better than those for a fixed 7-bit case. Without sacrificing
SNDR, the VR ADC generates 57% fewer tokens than the fixed 8-bit ADC.
2.4.2 Single-tone input
The VR quantizer was compared to an FR counterpart using a sinusoidal input in the voice band,
with the comparison performed over a full range of amplitudes and frequencies which contain
harmonics in band. The reader is referred to Fig. 2.12. Three resolution settings, with quantization
steps of ∆, 3∆, and 9∆, are used; the slope thresholds are chosen such that the high-power distortion
lies at least a decade away from the audible baseband edge at 20 kHz. In the case of a
fixed-resolution quantizer (Fig. 2.12a), for high-frequency inputs with few harmonics in-band, the SNDR
can vary by as much as 50 dB. When fin is lowered, more harmonics are pushed in-band, and the
SNDR variation range shrinks. For the VR system, shown in Fig. 2.12b, there are more kinks in
the SNDR, but the range of variation is smaller. While the highs are not reached, the envelope of
the minima is practically the same as in the FR case, indicating that SNDR is virtually the same
for all practical purposes.
The benefits of variable resolution are realized in terms of the average number of data tokens
produced per unit time. The number of tokens produced is drastically lower in the case of the
variable-resolution quantizer, as seen in Fig. 2.13; token savings of over 80% are possible.
Further power savings are possible due to a decrease in the number of delay cells used to realize a
Figure 2.12: SNDR in a 3.6 kHz bandwidth vs. amplitude vs. frequency for (a) a fixed-resolution
8-bit ADC, and (b) variable-resolution ADC of maximum 8-bit resolution.
Figure 2.13: Number of tokens produced per second for (a) fixed-resolution and (b) variable-resolution
quantizers.
CT DSP, as explained in Sec. 2.3.2.
2.4.3 Two-tone test
Fig. 2.14 shows the spectra of the outputs of fixed- and variable-bit quantizers for a two-tone input
of 400 Hz and 405 Hz. The spectrum of the VR system shows an increase of high-frequency distortion
due to the low-resolution error. The high-power error, however, does not cause an increase
in the intermodulation products that fall in-band. There is, therefore, little difference between the
in-band SNDR of the two ADCs, yet the VR quantizer produces 25% fewer tokens.
Figure 2.14: Spectrum of a quantized two-tone signal for (a-b) a fixed 6-bit quantizer, and (c-d)
a variable-bit quantizer.
2.4.4 Voice signal example
A variable-resolution CT system is optimal for burst-like signals, such as speech, an example of
which is illustrated in Fig. 2.15. During input nulls, maximum 6-bit resolution is used. As the
signal slope increases, the quantization step increases. Because the signal includes
periods of low activity, the average token rate for the segment shown was only 4 kTokens/s, which
is about half what would have been required if uniform sampling had been used. Simulations show
that a speech signal quantized with variable resolution has an in-band SNDR degradation of less
than 0.5 dB while using 35% fewer tokens than a fixed-resolution quantizer.
2.4.5 CT DSP example
The performance of an example VR CT DSP can be compared to that of a conventional FR CT DSP,
with the comparison based on the parameters of the system in [15]. A speech signal is processed
through the two DSPs, configured as 16-tap low-pass filters as an example. High-frequency
content of the speech signal is noticeably attenuated by both systems, as shown in Fig. 2.16. The
output of both systems appears essentially the same, with no increase of in-band error due to
Figure 2.15: Example waveforms of a VR quantizer: (a) speech input, (b) magnitude of the input
slope, (c) quantization step, normalized to the minimum step, and (d) reconstructed quantized
output.
Figure 2.16: Speech signals, low-pass filtered by an FR CT DSP and a VR CT DSP.
variable resolution. The token rate of the FR system is 170 kTokens/s, while the token rate of the
VR system is 70 kTokens/s. The number of operations per second, therefore, decreases by 60%.
The static power dissipation of the ADCs and digital blocks of both systems is primarily the same;
the leakage power of the VR DSP decreases by a factor of 2/9 due to fewer delay cells in the VR
CT DSP. The static power of the overall VR system is 35% lower than that of the FR system.
Using eq. 2.5, the dynamic power dissipation of the proposed system is decreased by 90% due to
a decrease in the number of tokens that have to be processed through fewer delay cells. The total
power dissipation of the VR CT system is 45% lower than that of a conventional CT system, while
the performance of the system, in terms of in-band SNDR, is unaffected. The design of hardware
is also made easier because the maximum allowed response time of system blocks is increased by
a factor of 9. The area of the DSP is also expected to decrease by approximately a factor of 2/9 because fewer
delay cells are needed.
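The percentages quoted in this example can be cross-checked with simple arithmetic. The Python sketch below uses the token rates stated above and reads the DSP leakage ratio as 2/k with k = 9. It further assumes that dynamic power scales as the product of the token rate and the number of delay cells; that scaling is an assumption of this sketch, adopted because it is consistent with the reported 90% reduction.

```python
r_tok_fr = 170e3   # FR token rate quoted in this section, tokens/s
r_tok_vr = 70e3    # VR token rate quoted in this section, tokens/s
k = 9              # largest step ratio (9*delta) used in the comparison

# Reduction in the number of operations per second.
token_reduction = 1 - r_tok_vr / r_tok_fr

# DSP leakage (and area) ratio, read as 2/k.
leakage_ratio = 2 / k

# Assumption: dynamic power ~ (token rate) x (number of delay cells).
dynamic_ratio = (r_tok_vr / r_tok_fr) * leakage_ratio

print(f"token reduction:         {100 * token_reduction:.0f}%")
print(f"dynamic power reduction: {100 * (1 - dynamic_ratio):.0f}%")
```

Under these assumptions the token reduction and the dynamic power reduction round to the 60% and 90% figures given in the text.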
Chapter 3
Comparison of Signal-Driven Sampling
Techniques
Signal-driven sampling techniques have been demonstrated to be advantageous for a class of signals
which are burst-like or sporadically changing in behavior. This chapter compares several sampling
techniques for asynchronous ADCs, the outputs of which can be processed, reconstructed,
transmitted or even stored. In some systems, a receiver or post-processor can afford to have
high-order reconstruction because a lot of power can be available for computations; in many real-time
applications, however, the power dissipation and processor complexity for such reconstruction
are prohibitive and zero-order-hold (ZOH) is used instead. Sec. 3.1 discusses several asynchronous
sampling techniques, which, for a fair comparison and compatibility to real-time applications, are
assumed to be reconstructed using ZOH, unless otherwise specified. Special cases of higher-order
reconstruction are considered in Sec. 3.2, where the sampling techniques are compared. While
some asynchronous ADCs function entirely in continuous time (CT), others can use a clock with a
very fine time-step to quantize the time between samples in order to approximate CT behavior as
Figure 3.1: Generalized block diagram of a CT ADC. Sampling criterion realized by using an error
conditioner, f (), and threshold δ that can be adjusted according to properties of the input x(t), a
CT quantizer, a digital signal encoder and a reconstruction block.
is discussed in Sec. 3.2.
3.1 Signal-driven sampling schemes
A continuous-time sampling system receives an input signal, x(t), with a normalized amplitude
range of ±AMAX, and produces a sampled digital output, yD(t), which has an equivalent analog
representation y(t). CT samples are produced when some measure of the error, e(t) = x(t)− y(t),
surpasses a threshold δ; samples are generated according to the properties of the signal as opposed
to according to the timing of a clock. The sampling criteria considered in this work are based on
the magnitude and the integral of the absolute error; however other criteria are possible, such as
the energy of the error [28].
A plethora of choices is available to asynchronous ADCs in terms of the sampling criteria,
adaptable sampling thresholds and predictive algorithms. A generalized block diagram of a CT
sampler, illustrated in Fig. 3.1, is comprised of an error conditioner and an adjustable thresholder
that dictates when the signal should be sampled by a CT quantizer. The output of the quantizer is
encoded to a binary, delta or any other digital representation for transmission to the next block. A
DAC reconstructs y(t) from the digital output by using ZOH or higher-order reconstruction, with
the possibility of using predictive algorithms. The most basic exemplary combinations from the
aforementioned categories are discussed in this work to highlight the properties of each possibility.
With the basic properties in mind, an appropriate combination can be selected to accommodate
an application's requirements, such as power budget, implementation complexity, sampling rate and
SNDR. The important metrics for comparing the various signal-driven sampling techniques are
SNDR and the average effective sampling rate.
The magnitude of error sampling criterion is equivalent to CT quantization because a new
sample is produced whenever the input changes by a quantization step. The variable-resolution
quantization described in Ch. 2 is a case of the magnitude of error sampling criterion, where the
error threshold, which is the quantization step, is varied according to the slope of the input. This
technique is included in the comparison. A variable error threshold approach can also be applied
to other sampling criteria. The signal can be approximated with a ZOH representation or can
be tracked more accurately by using prediction; the latter enables the output to be updated less
frequently. Linear prediction is demonstrated for an ADC using the magnitude sampling criterion,
but this approach can be extended to other sampling criteria and to higher-order prediction. The
order of the reconstruction at the final decoder, be it a real-time processor or an offline post-
processor, can be greater than or equal to the reconstruction order used inside the ADC.
Figure 3.2: Input signal, sampler output and quantization error of 3-bit (a) CT ADC or CT Quan-
tizer, (b) level-crossing or send-on-delta sampler and (c) hysteresis quantizer.
3.1.1 Magnitude of error sampling criterion: Differences in CT quantization techniques
The properties of signals sampled using the magnitude of error sampling criterion, which is equivalent
to CT quantization, are described in Sec. 1.3 and the token rate for this sampler is discussed
in Sec. 2.3.3. This section compares the various realizations of CT quantizers. Over the years, quantization
in continuous time (or in finely-quantized time, FQT) has been referred to as CT A-to-D
conversion [13], level-crossing sampling (LCS) [5, 7, 8, 11, 12], send-on-delta sampling [29], and
quantized state sampling [30]. The subtle differences between the techniques are illustrated in
Fig. 3.2. In a CT ADC, which is equivalent to an ideal CT quantizer, when the threshold k∆ is
crossed, the sampled signal is given the value of $k\Delta \pm \frac{1}{2}\Delta$ until the next sampling instant; the polarity
of the sign depends on the direction in which the threshold is crossed. This ensures that the
amplitude of the quantization error is within $\pm\frac{1}{2}\Delta$, symmetrically about zero.
In level-crossing sampling and send-on-delta sampling with ZOH reconstruction, which are
variations on CT quantization, a sampled output assumes the value of the most-recently crossed
threshold until the next threshold is crossed [7, 11, 12]. This results in a hysteretic quantizer that
has a transfer characteristic with a $-\frac{1}{2}\Delta$ DC shift and a $+\frac{1}{2}\Delta$ DC shift for increasing and decreasing
inputs, respectively. The quantization error, correspondingly, has a non-zero local mean of $-\frac{1}{2}\Delta$
and $+\frac{1}{2}\Delta$ for increasing and decreasing portions of the input, respectively, where the local mean
is defined for some fraction of the input period. This hysteretic behavior leads to a significant
increase in low-frequency distortion. For a sinusoidal input, the error can be approximated as the
sum of the error resulting from hysteresis-free quantization and a square wave, which varies with
the frequency of the input and has an amplitude of $\pm\frac{1}{2}\Delta$. The square wave can be represented as
a Fourier series, with non-zero coefficients $\frac{2\Delta}{\pi n}$ for odd harmonics, where n is the index of the harmonic.
The amplitude of the harmonics caused by the hysteretic behavior is given by the Fourier
coefficients because the power of the square wave dominates at low frequencies. The largest distortion
spur occurs at the third harmonic, which falls in-band; the third harmonic has a power of
$10\log_{10}\left(\frac{4\Delta^2}{9\pi^2}\right) = -7.45 - 6N$ dB relative to the power of a full-scale fundamental component of the
quantized signal, where $N = \log_2(2/\Delta)$. The resulting spurious-free dynamic range (SFDR) is several
dB lower than the SFDR achieved by a hysteresis-free quantizer. The SFDR
is defined as the ratio of the power at the fundamental component for a full-scale
input to the power of the highest error spur that falls in-band. Since the third harmonic is the
dominant low-frequency harmonic and is typically in band, it limits the SFDR.
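The level of the hysteresis spur can be evaluated numerically. The Python sketch below computes the power of the n-th harmonic of a square wave of amplitude ∆/2 relative to a full-scale fundamental of unit amplitude, and checks the Fourier coefficient 2∆/(πn) against a discrete Fourier transform of a sampled square wave. The function names and the choice of N = 8 bits are assumptions made for the example.

```python
import numpy as np


def hysteresis_spur_db(n_bits, harmonic=3):
    """Power of a square-wave harmonic relative to a full-scale fundamental.

    The square wave has amplitude delta/2 with delta = 2 / 2**n_bits, so its
    odd harmonics have amplitude 2*delta/(pi*harmonic); full scale is 1.
    """
    delta = 2.0 / 2 ** n_bits
    return 20 * np.log10(2 * delta / (np.pi * harmonic))


def square_wave_harmonic(delta, harmonic, n=1 << 14):
    """Numerically estimate a harmonic amplitude of the +/-delta/2 square wave."""
    t = (np.arange(n) + 0.5) / n                      # one period, no zeros hit
    square = 0.5 * delta * np.sign(np.sin(2 * np.pi * t))
    return 2 * np.abs(np.fft.rfft(square)[harmonic]) / n


delta = 2.0 / 2 ** 8
print(hysteresis_spur_db(8))                          # third-harmonic level in dB
print(square_wave_harmonic(delta, 3), 2 * delta / (3 * np.pi))
```

The even harmonics of the square wave vanish, which is why the third harmonic is the largest low-frequency spur.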
Quantized state systems [30] sample the input by offsetting the thresholds for increasing inputs
from the thresholds used for decreasing inputs by a value ε, which is typically close to but is slightly
less than ∆. This also produces a hysteretic transfer characteristic, with a quantization error that
is the sum of hysteresis-free quantization error and a square wave that varies with the frequency
of the input and has an amplitude $\pm\frac{1}{2}\varepsilon$. The maximum distortion spur, which again occurs at the
third harmonic, has a power of $10\log_{10}\left(\frac{4\varepsilon^2}{9\pi^2}\right) = -13.5 + 20\log_{10}(\varepsilon)$ dB relative to the power of the
fundamental component of the quantized signal.
Real implementations of a CT ADC also offset the increasing- and decreasing-input thresholds
but by a small fraction of ∆ to prevent multiple samples from being generated when an input
bounces near a threshold due to noise [15]. The decrease in the SFDR and SNDR due to this small
offset, which is typically a fraction of the quantization step (also smaller than ε), is significantly
less than that due to the hysteresis of a level-crossing sampler and a quantized-state sampler. It
should be noted that an offset only in the threshold that has most recently been crossed is needed
to prevent false triggering. The other threshold, above the signal for an increasing input and
below the signal for a decreasing input, can be equal exactly to the quantization threshold without
an offset. As a result, the increase in low-frequency distortion can be made virtually negligible
while the false level crossings are still avoided. For applications where high-order reconstruction
is not viable or digital processing is performed prior to high-order interpolation, the CT ADC
achieves better performance than the hysteretic techniques.
3.1.2 Sampling criterion based on the integral of absolute error
Instead of sampling when the peak of the error surpasses a threshold, y(t) can be updated at $t = t_{i+1}$, when
$\int_{t_i}^{t_{i+1}} |x(t) - x(t_i)|\,dt = \delta$ (3.1)
where ti is the time x(t) was last sampled (refer to Fig. 3.3). The sampled input is assumed to
be known with high accuracy by quantizing it with adequately-high resolution of NS bits; thus
y(ti) ≈ x(ti). This sampling technique puts emphasis on the accumulated error rather than the
peak error; as a result, the error, eδ(t), shown in Fig. 3.3c, is comprised of sawtooth-like segments
Figure 3.3: (a) Block diagram of an integral criterion sampler, (b) example input and sampled
output, (c) the error waveform and (d) integral of the absolute error for δ= 0.005 and an arbitrary
input frequency of 1 Hz.
with varying peak amplitudes, yet the peak integrated error for each sample is constant. The next
sample can be of any value, which makes a predictive correction of $\pm\frac{1}{2}\Delta$, as in the case of a CT
quantizer, not possible; the value of the sample is held until the next sample. The local mean of
the error changes depending on the polarity of the input, similarly to a hysteretic quantizer, and
causes higher-power low-frequency distortion, which limits the SFDR as will be discussed in a
later section.
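The integral-of-absolute-error criterion can be illustrated with a behavioral simulation. The Python sketch below samples a 1 Hz unit-amplitude sinusoid with δ = 0.005 on a dense time grid. It assumes an ideal integrator, a rectangle-rule approximation of the integral and an infinitely precise held value, y(ti) = x(ti); the function name integral_criterion_sample is hypothetical.

```python
import numpy as np


def integral_criterion_sample(x, t, delta_thr):
    """Sample x(t) whenever the integral of |x(t) - x(t_i)| reaches delta_thr.

    x and t are densely sampled arrays; the integral is accumulated with a
    rectangle rule, so sampling instants are resolved to one grid step.
    Returns the indices of the sampling instants (the first is index 0).
    """
    dt = t[1] - t[0]
    idx = [0]
    held = x[0]
    acc = 0.0
    for n in range(1, len(x)):
        acc += abs(x[n] - held) * dt
        if acc >= delta_thr:
            idx.append(n)
            held = x[n]
            acc = 0.0
    return np.array(idx)


# Example: 1 Hz unit sinusoid with an error threshold of 0.005.
t = np.linspace(0.0, 1.0, 200_001)
x = np.sin(2 * np.pi * t)
samples = integral_criterion_sample(x, t, 0.005)
intervals = np.diff(t[samples])

print(len(samples), intervals.min(), intervals.max())
```

The intervals are shortest where the input slope is largest and stretch out near the peaks, where the input changes slowly and the integrated error accumulates more gradually.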
Since the integral of the error can reach the threshold at any time, the sampling instants do not
correspond to quantization-level-crossing times. An implementation of this sampling scheme is
more complicated than that of a CT quantizer because it requires a high-resolution quantizer and an
integrator. The NS-bit quantizer does not need to track the signal but just needs to produce a digital
estimate at each sampling instant. Any CT or synchronous ADC, including flash, pipelined, SAR,
etc., can be used as long as the clock is replaced by the asynchronous sampling-indicator signal.
In the case of a pipelined ADC, self-timed logic, triggered by the asynchronous sampling-instant
Figure 3.4: Spectrum of a sinusoid sampled using the integral of error criterion with δ= 0.02 fIN
and NS=10. Subharmonics exist, but are well below the level of the harmonics.
indicator, can produce the necessary timing signals.
Spectral properties
Since the sampling instants are dictated asynchronously by the signal as opposed to by a clock,
there is no aliasing. The alias-free spectrum of a sampled sinusoid, shown in Fig. 3.4, contains not
only tones at the input frequency and harmonics, but also subharmonics. These subharmonics are
created for two reasons. Firstly, the sampling instants typically occur at different signal amplitudes
from cycle to cycle because sampling times depend not on the immediate amplitude of the signal
but on an accumulated property. Secondly, the sampling instants typically do not coincide with
the times the input has theoretically crossed a level of the NS-bit quantizer. The timing difference
between the level crossing and the actual sampling time varies from cycle to cycle. The minimum
subharmonic can, therefore, approach the zero frequency, which causes subharmonics to appear
as a very low noise floor, which is well below the level of the harmonics. The error is dominated
by the harmonics of the input as in the case of a CT quantizer. These properties are exhibited by
a signal sampled asynchronously according to any signal-driven sampling criterion other than the
magnitude-of-error, such as integral or energy of the error or the extrapolated error.
Properties of signals sampled according to the integral criterion
The majority of the error power is contained in the frequency range up to the sawtooth frequency,
fSAW,δ, as shown in Fig. 3.4, which corresponds to the minimum-duration maximum-amplitude
sawtooth segment occurring at the fastest portions of an input. Assuming the slope of the input between
consecutive sampling instants does not change significantly, a truncated Taylor series of the
input, including a constant and a linear term, can be used to approximate the input signal from time $t_i$
until the next sample:
$\hat{x}(t) = x(t_i) + \dot{x}(t_i)(t - t_i)$ (3.2)
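This expression can be used to evaluate the integral in eq. 3.1, $\frac{1}{2}|\dot{x}(t_i)|\Delta t_i^2 = \delta$. The time between
consecutive samples is therefore
$\Delta t_i = \sqrt{\frac{2\delta}{|\dot{x}(t_i)|}}$ (3.3)
The sawtooth frequency corresponds to the shortest such interval, which occurs at the maximum input slope.
For a sinusoidal input, $f_{SAW,\delta} = \sqrt{\pi A_{IN} f_{IN}/\delta}$, where $A_{IN}$ is the amplitude and $f_{IN}$ is the frequency
of the sinusoid.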
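The relation between the inter-sample interval and the sawtooth frequency can be verified numerically. The Python sketch below evaluates the interval from the linearized integral, ½|ẋ|∆t² = δ, and takes the sawtooth frequency of a sinusoid as the inverse of that interval at the peak slope 2πfINAIN. The function names and the example values are assumptions.

```python
import math


def sample_interval(delta_thr, slope):
    """Interval from the linearized integral: dt = sqrt(2*delta/|slope|)."""
    return math.sqrt(2.0 * delta_thr / abs(slope))


def sawtooth_frequency_sine(a_in, f_in, delta_thr):
    """Sawtooth frequency of a sinusoid: inverse interval at the peak slope."""
    peak_slope = 2.0 * math.pi * f_in * a_in
    return 1.0 / sample_interval(delta_thr, peak_slope)


# Example values: unit amplitude, 1 Hz input, threshold of 0.005.
f_saw = sawtooth_frequency_sine(a_in=1.0, f_in=1.0, delta_thr=0.005)
print(f_saw, math.sqrt(math.pi * 1.0 * 1.0 / 0.005))
```

Both printed values agree, which confirms that the closed form for a sinusoid follows from the interval expression evaluated at the maximum slope.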
The rate at which samples are generated is derived in [32] and is repeated here. The average
sampling rate in an interval $[t_A, t_B]$ is defined as the inverse of the average interval between samples
(eq. 3.3) in that interval. It is assumed that the slope of the signal does not change significantly
in the interval between consecutive samples so that the input can be expressed as the truncated Taylor
series in eq. 3.2. Eq. 3.3 can be rewritten as $\Delta t_i \sqrt{|\dot{x}(t_i)|} = \sqrt{2\delta}$. If in the indicated time interval M
samples are generated, the sum of ∆ti
















with the summation approximated as an integral, which is valid for short intervals between consec-




















As the input frequency is increased, the sample rate, which varies with the square root of the
frequency, increases more slowly than the rate of a CT quantizer given in eq. 2.7.
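The square-root dependence can be illustrated numerically. The following is a minimal sketch, not the
simulation used for the figures: it assumes that the criterion of eq. 3.1 is the integral of the absolute
ZOH error reaching $\delta$ (consistent with $\frac{1}{2}|\dot{x}(t_i)|\Delta t_i^2 = \delta$), it approximates continuous time with a
fine fixed step `dt`, and the function name `count_integral_samples` is hypothetical.

```python
import math

def count_integral_samples(f_in, delta, a_in=1.0, duration=1.0, dt=1e-5):
    """Count samples of an integral-criterion sampler driven by a sinusoid.

    Assumption: a new sample is taken when the integral of |x(t) - x(t_i)|
    since the last sample reaches delta (ZOH approximation between samples).
    """
    n_steps = int(round(duration / dt))
    held = 0.0   # x(t_0) = sin(0) = 0 is taken as the first sample
    acc = 0.0    # accumulated integral of the absolute error
    count = 1
    for k in range(1, n_steps + 1):
        x = a_in * math.sin(2.0 * math.pi * f_in * k * dt)
        acc += abs(x - held) * dt
        if acc >= delta:
            held, acc = x, 0.0   # resample and reset the integrator
            count += 1
    return count

slow = count_integral_samples(f_in=1.0, delta=1e-3)
fast = count_integral_samples(f_in=4.0, delta=1e-3)
# A fourfold increase in frequency should roughly double the sample count.
print(slow, fast, fast / slow)
```

Under these assumptions, the ratio of the two counts is expected to be close to 2 rather than 4, in line
with the square-root dependence described above.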
The sampling error shown in Fig. 3.4b is positive or negative depending on the polarity of the
input slope and, as a result, has a slowly-varying non-zero mean. Since this variable local mean
changes with the signal, it causes an increase in low-frequency distortion. The aim of the following
discussion is to approximate the distortion power at low-frequency harmonics and the total error
power in order to determine the spurious-free dynamic range and the total SNDR of this sampling
scheme, respectively. The error signal consists of sawtooth-like error segments, which can be
approximated as $e_i(t) = \dot{x}(t_i)(t - t_i)$ by expressing the input signal as a truncated Taylor series in







which is equal to δ∆ti since the integral of the error between samples is equal to the error threshold,






Since the local mean of the error is approximately periodic, it can be represented with a Fourier













Integration can be performed over only a quarter of an input cycle due to the approximate quarter-wave
symmetry of the error. If M samples are generated during the integration interval $T_{IN}/4$,
the coefficient can be determined from the contribution of each error segment. The segments can
be further approximated with their average values because the fast-switching sawtooth-like variation
about the average value causes high-frequency distortion and is not of interest. The average value

















The summation can be approximated with an integral because the integrand does not change significantly
in the time $\Delta t_i$. After substituting in the slope for the aforementioned sinusoidal signal,
$\dot{x}(t) = 2\pi f_{IN} A_{IN}\cos(2\pi f_{IN} t)$, and by performing a change of variable $\theta = 2\pi f_{IN} t$, the equation can










The expression in the brackets is a complete elliptic integral of the first kind which evaluates to
0.874 and −0.125 for n = 1 and n = 3, respectively. The magnitude of the Fourier component at







The third error harmonic, which typically falls in-band, is the dominant in-band harmonic (not
counting the error component at $f_{IN}$), and it limits the SFDR. The power of the fundamental component





if there is a sufficient number of samples to represent it. The SFDR for this case is given by the












The linear approximation in eq. 3.2 can also be used to estimate the mean squared error, EMS,δ,























































3(tB− tA)[x(tA)− x(tB)] (3.14)
where the last expression is valid if the input does not change directions in the interval tB− tA. For
signals in an interval of interest with multiple changes in direction, the interval can be partitioned
into several segments; EMS,δ is the sum of the mean squared error of each segment. The total mean





because a sinusoid goes through a transition of 2AIN in half an input cycle, 1/(2 fIN). The mean
squared error gives the total error power including the component at the fundamental frequency,
which is not negligible, as in the case of CT quantization without prediction. The total SNDR of
the sampled signal can be calculated from the total error power (eq. 3.15) with the error power at












where power of a full-scale sampled sinusoid is given by eq. 3.12. As would be expected, if the
threshold δ of the accumulated error is lowered, the error power decreases.
Figure 3.5: Simulated in-band and total SNDR and calculated total SNDR and in-band SFDR for
an integral criterion sampler with δ= 0.01 vs. frequency for no input quantization and an arbitrary
bandwidth of 1 Hz.
The total SNDR and in-band SNDR are shown in Fig. 3.5 and compared to the calculated
total SNDR and in-band SFDR from eq. 3.16 and eq. 3.13. The total SNDR is not constant with
frequency as in the case of a CT quantizer, but decreases with increasing frequency at a rate of -10
dB per decade, as shown in simulated and calculated results. Qualitatively, this can be explained
in the following way. If the input frequency is increased, the time over which the integrated error
reaches the threshold, which is constant, decreases; therefore, the peaks of the error increase. The
in-band error can be determined with the help of simulations. Since only a fraction of the error
power is in-band, the in-band SNDR is lower than the total SNDR, as is shown in Fig. 3.5. As the
input frequency is increased and fewer harmonics are in-band, the in-band SNDR approaches the
calculated SFDR, which is limited only by the third harmonic.
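The -10 dB per decade trend of the total SNDR can also be seen from a short calculation under the linear
approximation of eq. 3.2. The following is a sketch rather than a reproduction of the derivation leading
to eq. 3.14 and eq. 3.15; it assumes that each error segment is $e_i(t) = \dot{x}(t_i)(t - t_i)$ with
$\Delta t_i^2 = 2\delta/|\dot{x}(t_i)|$ and that the input is monotonic on $[t_A, t_B]$:

```latex
\int_{t_i}^{t_i+\Delta t_i} e_i^2(t)\,dt
  = \frac{\dot{x}^2(t_i)\,\Delta t_i^3}{3}
  = \frac{2\delta}{3}\,|\dot{x}(t_i)|\,\Delta t_i ,
\qquad
E_{MS,\delta} \approx \frac{1}{t_B - t_A}\cdot\frac{2\delta}{3}\int_{t_A}^{t_B} |\dot{x}(t)|\,dt
  = \frac{2\delta\,|x(t_B) - x(t_A)|}{3\,(t_B - t_A)}
```

For a sinusoid, which traverses $2A_{IN}$ in a time $1/(2f_{IN})$, this sketch gives
$E_{MS,\delta} \approx \frac{8}{3}\delta A_{IN} f_{IN}$. The error power then grows in proportion to $f_{IN}$ while the signal
power stays fixed, which is consistent with a loss of about 10 dB per decade of input frequency.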
In a real implementation, the samples cannot be known exactly but have to be quantized with
some finite resolution of $N_S$ bits, the effect of which is shown in Fig. 3.6. As the input frequency
is lowered, the in-band SNDR increases until the finite resolution begins to limit the performance
and the SNDR approaches $6N_S + 1.76$ dB.
Figure 3.6: In-band and total SNDR for an integral criterion sampler with $\delta = 10^{-5}$ vs. frequency
for no input quantization and finite quantization resolution ranging from 11 to 9 bits and an
arbitrary bandwidth of 1 Hz.
3.1.3 Sampling rate reduction techniques
CT sampling techniques are well suited for applications with sporadically-changing signals be-
cause they automatically adjust the sampling rate according to input activity. A large number of
samples, however, can still be generated during bursts or fast-changing large deviations in the in-
put. Several techniques have been proposed to reduce the effective sampling rate while maintaining
a desirable level of performance. Reduced sampling rates can lead not only to a reduction in the
power dissipation in the blocks following the ADC but can also relax hardware speed constraints.
In the case of CT quantization, with sampling based on the magnitude of the error criterion, [17]
and [12] have noted that taking fewer larger steps instead of many small steps to track large fast-
changing deviations of the input does not degrade the performance. This observation stems from
the fact that the narrow sawtooth-like error at fast portions of the input contributes power mainly
to the higher-frequency error spectrum, and can thus be increased without much effect on the
low-frequency spectrum [21, 33]. The variable-resolution quantization technique described in Ch. 2
takes advantage of these spectral properties by quantizing fast portions of the input, which contribute
to out-of-band error power, with less resolution without affecting the in-band spectrum,
which leads to a drastic reduction in the token rate. An alternative way to reduce the token rate by
using prediction is described in the next section.
Continuous-time quantization with prediction
In CT ADCs, the signal is resampled whenever the difference between the input and the approxi-
mation of it, y(t), surpasses a threshold ∆. The rate at which y(t) must be corrected and updated
can be reduced by using a better approximation of the input instead of ZOH reconstruction. [34]
proposes an asynchronous ADC with first-order linear prediction, realized using a clock with a
fine timestep. For the sake of an equivalent comparison to other techniques, a CT implementation
will be considered but it can be approximated with a finely quantized-time implementation. The
CT ADC with linear prediction, shown in Fig. 3.7, estimates the slope of x(t) at each sampling
instant $t_i$ and extrapolates the sampled output as $y(t) = \hat{x}(t_i) + \hat{\dot{x}}(t_i)(t - t_i)$, where $\hat{\dot{x}}(t_i)$ is the
estimated slope and $\hat{x}(t_i)$ is the sampled signal value at $t_i$. The input is assumed to be sampled with
high $N_S$-bit resolution using a quantization step $\Delta_S \ll \Delta$, such that $\hat{x}(t_i) \approx x(t_i)$. The slope can be
measured and its value quantized, or the slope can be approximated, for example, as $\hat{\dot{x}}(t_i) = n\Delta_S/\tau$
by counting the number n of the high-resolution quantization steps that the input traversed during
a controlled time interval $\tau$. y(t) can then be extrapolated by incrementing or decrementing $y_D(t)$
at the measured rate. The sampling criterion for each sample i + 1 at time t can then be described
with the following expression [34]:
$|x(t) - y(t)| = |x(t) - [\hat{x}(t_i) + \hat{\dot{x}}(t_i)(t - t_i)]| = \Delta$ (3.17)
Figure 3.7: (a) Block diagram of a CT quantizer with linear prediction, (b) example input and
sampled output, and (c) the error waveform for N = 5 and $N_S$ = 14, normalized to $\Delta$.
A second-order truncated Taylor series is used to approximate the input signal:
$\hat{x}(t) = x(t_i) + \dot{x}(t_i)(t - t_i) + \frac{1}{2}\ddot{x}(t_i)(t - t_i)^2$ (3.18)
It is assumed that the quantizer and slope estimator are of high resolution such that xˆ(ti) ≈ x(ti)





The sawtooth-like error segments in Fig. 3.7b are not composed of approximately straight lines
as in the case of a CT quantizer without prediction, but have quadratic curvature, as indicated by
eq. 3.19. Each error segment i can be approximated as $e_i(t) = c(t - t_i)^2$ for the duration of the
segment with $c = \pm\Delta/(\Delta t_i)^2$ such that $e_i(t_i) = 0$ and $e_i(t_i + \Delta t_i) = \pm\Delta$ to match the initial and final
values of the actual error segments. The error shown in Fig. 3.7b has portions which are greater
than zero and less than zero for portions of the input with positive and negative curvatures, respectively.
For a periodic input, this causes an increase in low-frequency distortion, as in the case of
hysteretic quantizers and the integral criterion ADC, because the error does not vary symmetrically
about zero but has a local mean value, defined for a portion of the input period, that varies with the
signal. The local mean value of the error for positive and for negative portions can be determined
from the average value of each error segment ei (of the appropriate polarity), which have a duration






















Since each segment of the same polarity has the same average value, independent of its duration,
the local mean value of the error is $\pm\Delta/3$, where the sign depends on the error polarity. The error
between the input and the sampled signal can then be represented as a fast-switching zero-mean
error and a zero-mean square wave of amplitude $\Delta/3$ that switches with the frequency of the
input signal. During changes in curvature of the input signal (e.g., zero crossings of a sinusoid),
the local mean has transitions of non-zero duration; however, it is approximated as a square wave
with ideally sharp transitions. The Fourier series coefficients of a square wave can once again be







where n is the index of the harmonic. This expression is valid for several low-frequency harmonics
of the input because the error power due to the fast-switching non-linear sawtooth-like error
segments is negligible. For higher-frequency harmonics, on the other hand, the power due to the
sawtooth-like error dominates. The dominant low-frequency harmonic that is not the fundamental





The mean squared error of the sampled signal can be determined as the sum of the mean
squared error of each segment of $e_{LP}(t)$, where each error segment i is approximated as
$e_i(t) = \frac{\Delta}{(\Delta t_i)^2}(t - t_i)^2$



































The mean squared error includes the power of the fundamental, which has to be subtracted in order to
determine the total SNDR. The amplitude of the fundamental can be determined by using eq. 3.23





The power of the fundamental component of the output signal, rather than of the error, is
approximately equal to the power of the input signal, assuming a sufficient number of samples is





and assuming $0.5a_1^2 \ll x^2$. The total SNDR of the quantizer with linear prediction for a full-scale





















where $N = \log_2(2A_{MAX}/\Delta)$. Since the third harmonic is the dominant low-frequency harmonic
and is typically in band, it limits the SFDR. By using eq. 3.24 and eq. 3.27 with $A_{IN} = A_{MAX}$, the






= 11+6N (dB) (3.29)
Instead of FOH, higher-order reconstruction techniques can be used in the receiver or post-processor
in order to reduce the low-frequency distortion at the cost of power and circuit complexity.
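The result of eq. 3.29 can be checked numerically from the square-wave model of the local mean. The
sketch below assumes the standard odd-harmonic Fourier coefficients of an ideal square wave of amplitude
a, namely 4a/(n*pi), with a = $\Delta/3$, and a full-scale sinusoid of amplitude $A_{MAX}$; the function name
`lp_sfdr_db` is hypothetical.

```python
import math

def lp_sfdr_db(n_bits, a_max=1.0):
    """SFDR estimate for a quantizer with linear prediction (square-wave model)."""
    delta = 2.0 * a_max / 2 ** n_bits          # from N = log2(2 * A_MAX / delta)
    square_amp = delta / 3.0                   # local mean of the error is +/- delta / 3
    a3 = 4.0 * square_amp / (3.0 * math.pi)    # assumed n = 3 square-wave coefficient
    signal_power = 0.5 * a_max ** 2
    third_harmonic_power = 0.5 * a3 ** 2
    return 10.0 * math.log10(signal_power / third_harmonic_power)

for n in (6, 8, 10):
    # Compare the model against the closed form 11 + 6N (dB) of eq. 3.29.
    print(n, round(lp_sfdr_db(n), 2), 11 + 6 * n)
```

Under these assumptions the two columns agree to within a fraction of a dB, the small difference coming
from rounding 6.02 dB per bit to 6 dB per bit.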
Fig. 3.8 shows the in-band and total SNDR of a quantizer with ideal linear prediction ($N_S \gg N$)
for several values of N using Matlab® simulation. As the input frequency is decreased and more
harmonics are pushed in-band, the in-band SNDR approaches the total SNDR, which matches the
calculated SNDR of eq. 3.28. A noticeable kink in each in-band SNDR waveform occurs at a
frequency that corresponds approximately to the inverse of the minimum duration between sam-
ples, which is given in eq. 3.31 in the following discussion. Since the majority of the quantization
error is contained up to this frequency, once this frequency falls in-band as fIN is decreased, the
in-band SNDR flattens out. On the other hand, as the input frequency is increased and only a few
harmonics are in-band, the in-band SNDR, now dominated by the third harmonic, approaches the
SFDR predicted by eq. 3.29.
Figure 3.8: In-band (solid) and total (dashed) SNDR for a quantizer with linear prediction with
N = 10, 8 and 6, $N_S \gg N$, using simulation results.





The minimum interval between samples occurs at the maximum value of $|\ddot{x}(t_i)|$ and corresponds to
the frequency $f_{SAW,LP,max}$, which is the dominant higher-frequency distortion spur and can be
outside the bandwidth. Eq. 3.30 can be used to determine this frequency for a sinusoidal input
with $|\ddot{x}(t)|_{MAX} = A_{IN}(2\pi f_{IN})^2$ as follows:





A second-order Taylor series expansion of the input can be used to estimate the rate of samples
in the time interval [tA, tB] by similar analysis as was used in the derivation of eq. 3.7. Eq. 3.30 can
be rewritten as $\Delta t_i\sqrt{|\ddot{x}(t_i)|} = \sqrt{2\Delta}$. If in the indicated time interval M samples are generated, the
sum of $\Delta t_i$















The approximation of the summation with the integral is valid for short intervals between consec-














For a sinusoidal input, the token rate can be determined, after some calculations, to be:





This token rate is a fraction of the token rate of a quantizer without linear prediction, R∆, derived








The following section provides numerical examples comparing the token rates of all schemes detailed
in this chapter.
Since consecutive samples can vary in amplitude by much more than ∆, ZOH reconstruction
cannot be used because it would cause large error peaks. In order to decode the ADC's output in
real-time applications, the ADC must transmit not only the estimate of x(t) but also an estimate
of the input slope at each sample, such that the reconstructed output can track y(t) exactly. The
significant reduction in the sampling rate of the linear-prediction ADC comes at the cost of requir-
ing a higher-resolution quantizer, a slope estimator, and a circuit to interpolate y(t) based on the
estimated slope. The decoder must also incorporate this slope-based interpolator. Higher-order
prediction and high-order reconstruction can be used at the cost of more circuit complexity in the
ADC and the reconstruction post-processor, respectively.
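The reduction in the number of samples can be illustrated with a small simulation of the criterion in
eq. 3.17. This is a sketch under several assumptions that are not part of the text: the slope estimate is
taken to be ideal (the exact derivative of the sinusoid), the amplitude is not quantized, continuous time
is approximated by a fine fixed step `dt`, and the helper name `count_samples` is hypothetical. With
`predict=False` the same loop reduces to a plain magnitude-of-error sampler with ZOH.

```python
import math

def count_samples(f_in, delta, predict, a_in=1.0, duration=1.0, dt=1e-5):
    """Count samples taken when |x(t) - y(t)| reaches delta for a sinusoidal input.

    predict=True : y(t) = x(t_i) + slope(t_i) * (t - t_i), ideal slope assumed.
    predict=False: y(t) = x(t_i), i.e. no prediction (ZOH).
    """
    w = 2.0 * math.pi * f_in
    t_i, x_i = 0.0, 0.0                      # first sample at t = 0, x = sin(0) = 0
    slope = a_in * w if predict else 0.0     # derivative of the sinusoid at t = 0
    count = 1
    for k in range(1, int(round(duration / dt)) + 1):
        t = k * dt
        x = a_in * math.sin(w * t)
        y = x_i + slope * (t - t_i)          # extrapolated output
        if abs(x - y) >= delta:
            t_i, x_i = t, x                  # resample value and slope
            slope = a_in * w * math.cos(w * t) if predict else 0.0
            count += 1
    return count

delta = 2.0 / 2 ** 8                          # 8-bit step for a +/-1 input range
zoh_count = count_samples(1.0, delta, predict=False)
lp_count = count_samples(1.0, delta, predict=True)
print(zoh_count, lp_count)
```

Under these idealized assumptions, the predictor needs far fewer samples per input cycle than the
sampler without prediction, at the cost of having to know and transmit the slope at each sample.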
3.2 Comparison of sampling schemes using ZOH and higher-order reconstruction
Continuous-time sampled signals contain a significant amount of information about the input in
the timing instants while using only a finite number of bits to represent amplitude. In the case of
CT quantization, the input signal is known exactly during sampling instants with no uncertainty
in amplitude and time domains. The reconstruction of CT quantized signals has been studied
in [35] and [27]; a non-real-time technique proposed in [27] recovers the input with over 16-bit
precision while using only 4-bit quantization by using optimally-weighted sinc() functions. Such a
computationally-intensive technique, however, is not appropriate for real-time, low-power applications;
ZOH is the appropriate form of reconstruction because it requires the least circuit complexity
and the least power. In systems where digital signal processing is performed before data transmission
to the next block, a ZOH representation of the signal is used in the processing. Interpolative
reconstruction, even in the simplest FOH form, requires complex circuitry that demands power. In
some cases, when the information about the signal is transmitted to a base station, such power may
be available. In the cases where post-processing of the data is done off-line, the intervals between
sampling instants must be quantized with a fine clock to enable storage. A very long binary word
length may be required to accurately represent the signal while allowing for long intervals between
samples, as in cases of low signal activity; such a high storage capacity may be prohibitive.
Figure 3.9: Effective sampling rate and in-band SNDR of an 8-bit quantizer, an integral of error
criterion ADC, and a maximum 8-bit variable-resolution quantizer with 3 resolution settings,
all using ZOH reconstruction. The results of an ADC with linear prediction are also
shown but using FOH reconstruction.
The cases of ZOH and first-order reconstruction are considered in the comparison of the different
asynchronous sampling schemes of Sec. 3.1 and Ch. 2: CT quantization (the magnitude of
error sampling criterion), CT variable-resolution quantization, CT quantization with linear prediction,
and the integral of error sampling criterion. The variations in the effective sampling rate and
in-band SNDR with input frequency are shown in Fig. 3.9. The thresholds of each sampler are chosen
such that the in-band SNDRs of each technique meet at a single point to allow a direct comparison
of the sampling rates. For the quantizer and variable-resolution ADC, minimum SNDRs are shown
in a small amplitude range near the amplitude corresponding to a full-scale input. As predicted by
eq. 2.7 and eq. 3.35, the token rate of a CT quantizer and that of a CT quantizer with linear prediction
both vary linearly with frequency. The ADC with linear prediction generates an order of
magnitude fewer samples than the 8-bit CT quantizer, while achieving an average SNDR close to
that of the CT quantizer. It is important to note that while the quantizer output is reconstructed
using ZOH, the linear-prediction ADC must use a more complicated FOH reconstruction scheme.
Thus, the improvement in the sampling rate comes at the expense of significantly increased circuit
complexity: a higher-resolution ADC (NS = 12 bits), a slope estimator, a predictor, and FOH re-
construction at the decoder/receiver. This technique can be effective for systems, such as sensors
and implantable biomedical devices, that dissipate the majority of the power in data transmission
to a receiver with processing capabilities [34].
A variable-resolution ADC detailed in chapter 2 offers an alternative approach to reducing the
number of samples generated, without a significant sacrifice in the performance. As the input
frequency is increased, the token rate of the variable-resolution ADC levels out, while that of
the quantizer continues to rise. For frequencies with several harmonics in-band, the difference in
the SNDRs is negligible. When only very few harmonics remain in band, very high SNDRs are
possible, as discussed in Sec. 1.3.1; the minimum SNDR of the VR quantizer saturates, without
dropping below the minimum SNDR of the fixed-resolution quantizer. The sampling rate savings
come at the cost of 2 extra bits to represent the step size and a slope threshold detector, which
add insignificantly to the hardware and power overhead. This approach is well suited for real-time
low-power systems and processors.
Figure 3.10: In-band SNDR of an 8-bit quantizer, an integral of error criterion ADC, a maximum
8-bit variable-resolution quantizer with 3 resolution settings, and a quantizer with linear
prediction. The outputs of all ADCs are reconstructed with FOH.
The sampling rate of the integral criterion ADC increases less
drastically with frequency than that of the quantizer, because it increases with the square root of
the frequency, as indicated in eq. 3.8. The in-band SNDR of the integral criterion ADC increases as
the input frequency is increased because fewer harmonics are in-band; however, this increase with
frequency is slight because the number of times the accumulated error has reached the threshold
is larger, which offsets the reduction in the number of in-band harmonics. This technique can be
useful for systems in which the signals of interest contain components at the lower frequencies of the
possible band most of the time, and it has found use in control systems. The integral criterion
ADC, however, requires a high-resolution quantizer (NS=12) as well as an integrator.
The in-band SNDRs of the four sampling schemes meet at an approximate point, as shown in
Fig. 3.9. The highest number of samples is generated by the integral criterion ADC, while the fewest
samples are produced by the quantizer with linear prediction.
The in-band SNDRs of the sampling schemes are depicted in Fig. 3.10, with FOH reconstruction
used for all four schemes. The SNDRs of the quantizer and variable-resolution ADC are practically
identical because for fast-changing input sections, where larger steps are used, the deviation
of the signal from its first-order approximation is negligible. Samples are essentially connected
with straight lines. For the case of the CT quantizers with fixed and variable resolution, the
high-frequency distortion attributed to the fast-switching sawtooth-like error is significantly reduced
compared to the case of ZOH (in Fig. 3.9b) because for fast portions of the input, linear reconstruction
is a very good approximation of the input. Due to the curvature of the signal, the signal
is above or below the reconstructed estimate for increasing and decreasing portions of the input, as
opposed to varying symmetrically about zero. As a result, the low-frequency distortion increases
and the SNDR, particularly at high frequencies when only a few low-frequency harmonics are in
band, decreases by up to a few dB. The SNDR of the integral criterion ADC in Fig. 3.10 is signifi-
cantly improved by using FOH reconstruction when compared to that shown in Fig. 3.9b because
low frequency distortion, caused by the error being not locally symmetric about zero in the case
of ZOH, is significantly reduced. With non-real-time high-order reconstruction techniques, very
high SNDRs are possible. Fixed and variable-resolution quantizers, in that case, offer higher signal
accuracy over the integral criterion and linear prediction ADCs because the amplitude (one of $2^N$
levels) and the level-crossing times are known exactly, whereas in the latter sampling schemes, the
sampling instants do not typically coincide with level-crossing times of the high-resolution NS-bit
quantizer.
A clock with a properly selected fine time step can be used to quantize the time intervals between
samples while introducing only an insignificant amount of aliased error into the baseband, as was
discussed in Sec. 1.4. This enables storage of asynchronously-sampled data and may ease the
implementation of certain system blocks, for example a linear predictor and higher-order recon-
struction. In systems where the power consumption of the required high-frequency clock is only
a fraction of the power dissipated in transmitting or processing the information, time quantization
is a viable alternative to fully CT operations. Time-quantized asynchronous ADCs can also be
followed by more complex post processing, including downsampling, to achieve high resolution
while using only a few bits for the A-to-D conversion [7]. In CT sampled signals, the majority
of the error power is contained in a frequency range up to the sawtooth frequency $f_{SAW}$, with the
peak error spur occurring at $f_{SAW}$. Since the inverse of $f_{SAW}$ corresponds to the minimum time
between consecutive samples, the time-quantization clock frequency must be higher than $f_{SAW}$. If




Digital Signal Processing of Wideband
Gigahertz Signals
The advantages of activity-dependent power dissipation, a robust response to changes in the input
signal, and an alias-free spectrum have made signal-driven sampling and processing a promising
alternative to purely analog and classical digital systems for applications such as sensors, control
and speech processing [15, 16, 29], which have thus far been confined to at most the megahertz
processing range. These benefits can also be extended to applications in the lower gigahertz range,
such as receivers. This work is concerned with systems with signal frequencies of up to a few
gigahertz. In multistandard gigahertz-range applications, it is sensible for the response of a signal
processor to accommodate changes in the input, such as input power level and bandwidth, as well
as changes in the environment, such as the communication-channel characteristics and blocker
signals’ frequency locations and power levels. The power dissipation of the system can likewise
be adapted according to demand. This is illustrated in Fig. 4.1. The amplitude resolution of the
processor can also be actively adapted to satisfy various performance requirements.
Figure 4.1: Illustration of the adaptability of a processor’s response in different signal and environment
scenarios in the context of a receiver: (a) strong signal, (b) signal and single strong
blocker, (c) signal corrupted by multiple undesirable components.
In the case that a strong desired signal is received, which is uncorrupted by undesirable components and is
not distorted by the channel, there is no need for filtering or channel equalization, as is shown in
Fig. 4.1a. The lowest-order processor, which merely passes the signal, can suffice in achieving the
desired performance, such as a minimum SNDR or a bit error rate (BER). In systems where the
metric of interest is being able to recognize whether a signal is present or not, such as in pulse radio
receivers, low-amplitude resolution of even a single bit can be enough to adequately represent a
strong received signal. In scenario (a), the processor’s order and resolution can be
actively minimized, which leads to a significant reduction in power dissipation. In the second
scenario (b), where a single strong blocker is present, a low-order processor configuration can be
used to attenuate the blocker by placing a frequency response notch in its vicinity. In the worst-case
scenario (c), a weak signal of interest is surrounded by strong undesirable components, which can
occur when system blocks that precede the processor have limited selectivity, particularly when
the signal of interest is not narrow-band and requires broadband front-end circuits. Blocker signals
that are close in frequency to the signal band of interest will then be insufficiently attenuated;
therefore, a high-order processor configuration will be necessary to recover the signal from among
the blockers.
Such wide programmability, which is needed to adapt to a variety of signal conditions, is not
feasible with classical analog solutions but can be achieved with digital signal processors. Classical
clocked DSPs, however, are difficult to implement at high clock speeds. Generating
multi-gigahertz-frequency clock signals and distributing them throughout the system with adequate delay
and signal integrity is a difficult and very power-hungry task, requiring tens of mW even at lower
gigahertz frequencies [36, 37]. Due to their synchronous operation, conventional digital systems
introduce substantial noise into the substrate and generate undesirable EMI emissions, which have to
meet strict regulations. These difficulties, along with a high power consumption, make conventional
DSPs poorly suited for gigahertz processing. Digital processing of GHz signals in continuous
time is an attractive alternative because it retains the benefit of wide programmability, but
eliminates the power-hungry clock from the system and automatically adapts the rate of operations
– and, consequently, power dissipation – to the activity of the input signal, as opposed to allowing
them to be set by the timing of a clock. Without a clock, EMI emissions and substrate noise are
reduced because digital operations occur asynchronously; therefore, the supply current peaks are
decreased and spread in time, leading to a “cleaner” supply voltage.
The context of the work described in this chapter is a low-resolution CT digital signal processor
for low-dynamic-range applications in the lower GHz frequency range (i.e., below 5 GHz).
4.1 Problems with current processing approaches and alternative solutions
A CT digital system is based on clockless level-crossing sampling. For gigahertz signals, however,
this can result in very short time intervals between consecutive quantization level crossings,
which must be accurately maintained in the transitions of the digital signal. For a full-scale 3-GHz
signal quantized with three-bit resolution, for example, this granular time, denoted by $T_{GRAN,LCS}$,
is 13 ps, which is less than the propagation delay of an inverter in modern CMOS technologies.
Extremely narrow pulses, with pulse width or intervals between pulses equal to this granular time,
would be needed to represent the input digitally, such as using binary or ∆-encoded signal representation
as shown in Fig. 4.2. Processing or even generating such digital signals is not feasible.
Figure 4.2: Level-crossing sampling of a 3-GHz signal, with binary and ∆-encoded signal representation.
The processing approach of CT DSPs in prior art [14, 15], therefore, cannot be extended to the
gigahertz signal range. A partial solution to the granularity problem is to use per-level thermometer
encoding to represent the signals, such as that found at the output of a clockless flash ADC. An
N-bit CT quantized signal is represented with $2^N$ and $2^N - 1$ per-level signals for mid-tread and
mid-rise quantizers, respectively. This is illustrated in Fig. 4.3. The input is assumed to be normalized
to an amplitude range of ±1, the quantization levels are separated by a quantization step
$\Delta = 2/2^N$, and the per-level signals can have values of ±1. A positive pulse of each per-level signal
$LEV_m$ indicates that the input signal x(t) is above the mth quantization level, $TH_m$, starting at the
time corresponding to the positive edge of the pulse until the time corresponding to the negative
edge of the pulse.
$LEV_m = \mathrm{sign}\{x - TH_m\}$ (4.1)
Figure 4.3: Per-level representation of a quantized signal.
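The per-level representation of eq. 4.1 can be sketched in a few lines. The example below is illustrative
only: it assumes a mid-rise quantizer with thresholds $TH_m = (m - 2^{N-1})\Delta$ for $m = 1, \dots, 2^N - 1$, and it
assumes that the quantized value is recovered by summing the per-level signals and scaling by $\Delta/2$;
neither the threshold placement nor the scale factor is taken from the text, and the helper names are
hypothetical.

```python
def sign(v):
    """Return 1, -1 or 0 for a positive, negative or zero-valued argument."""
    return (v > 0) - (v < 0)

def per_level_signals(x, n_bits):
    """Per-level (thermometer-like) signals LEV_m = sign(x - TH_m) for one input value."""
    delta = 2.0 / 2 ** n_bits
    # Assumed mid-rise thresholds: TH_m = (m - 2^(N-1)) * delta, m = 1 .. 2^N - 1
    thresholds = [(m - 2 ** (n_bits - 1)) * delta for m in range(1, 2 ** n_bits)]
    return [sign(x - th) for th in thresholds]

def reconstruct(levels, n_bits):
    """Rebuild the quantized value by adding the per-level signals (assumed scale delta/2)."""
    delta = 2.0 / 2 ** n_bits
    return 0.5 * delta * sum(levels)

levels = per_level_signals(0.3, n_bits=3)
print(levels, reconstruct(levels, n_bits=3))
```

For x = 0.3 and N = 3, five of the seven per-level signals are high, and under the assumed scaling the sum
maps to 0.375, the mid-rise output level between the thresholds at 0.25 and 0.5.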
The output of the sign() operator is equal to 1 for a positive argument, −1 for a negative argument
and 0 for a zero-valued argument. For a sinusoidal input, the per-level signals are periodic with the
frequency of the input. The quantized signal, xq(t), can be constructed merely by adding all the






Each per-level signal can be processed separately, and the filtered output can be constructed by
adding the signals at every level and filter tap. Consecutive level-crossing times are preserved in the
relative timing of the transitions of the parallel per-level signals, without having to represent them
with narrow pulses, as is the case of binary and ∆-encoded signal representation. The minimum
granular time that the CT DSP must tolerate, TGRAN,LEV , is then extended from the minimum time
between consecutive level crossings to the minimum time between crossings of the same level. In
theory, this time can become infinitesimally small when the peak of the signal barely surpasses a
quantization threshold, but in reality this time is limited by the comparator speed in the ADC. The
granularity time of a per-level-encoded signal will be defined for an input of the maximum input
frequency and an amplitude such that the peak of the signal surpasses the maximum threshold by
$\Delta/2$. The time, $T_{GRAN,LEV}$, corresponds to the time the peak of the input is above the outermost











Figure 4.4: Propagation of a 55 ps pulse through a chain of FO2 inverters. The output signals of
even-numbered inverters are shown under the corresponding number.
For three-bit resolution and a maximum input frequency of 3 GHz, the granularity time is extended
from 13 ps to 57 ps. A CT flash ADC can be designed to produce per-level output with minimum
pulse width of TGRAN,LEV . Propagating such narrow pulses through a digital processor, however,
is problematic because narrow pulses can be distorted and can even disappear as they propagate
through a chain of digital gates, as shown in Fig. 4.4 for a chain of inverters with a fanout-of-two
(FO2) load each. If a falling edge of the input arrives before the output of an inverter has fully
settled, the inverter’s propagation delay on the falling-edge input is less than the nominal delay
because the output voltage has to increase by a smaller amount, given that it starts from a slightly
higher potential than that of ground. Due to this memory effect, the pulse width of the output
signal is shortened. The output of the inverter that follows will be even more distorted because
the interval between the two input edges will be even shorter. As the pulse propagates through the
chain, it will begin to resemble a triangle rather than a rectangular pulse, and eventually, the pulse
width will be short enough and the input slew rate slow enough that the output of the following
inverter does not reach VDD/2, and fails to switch. The pulse then disappears in the chain of
inverters. Pulse-width distortion and pulse disappearance are made worse if the propagation
delays for rising- and falling-edge inputs differ, because pulses of one polarity are broadened
while pulses of the opposite polarity are narrowed and can disappear. Such a difference is
inevitable, since the drive strengths of the pull-up and pull-down networks cannot be expected
to match exactly.
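The granularity figures quoted earlier in this section (13 ps and 57 ps for three-bit resolution
at 3 GHz) can be approximately reproduced with a short calculation. The sketch below assumes a
full scale of ±1, an outermost threshold at 1 − ∆, and a full-scale amplitude for the
consecutive-level case; these choices and the function names are assumptions, not the thesis's
own derivation.

```python
import math

def t_consecutive(f_in, n_bits, amplitude=1.0):
    """Shortest time between crossings of adjacent levels (next to the zero crossing)."""
    delta = 2.0 / 2**n_bits
    return math.asin(delta / amplitude) / (2 * math.pi * f_in)

def t_gran_lev(f_in, n_bits):
    """Time the peak spends above the outermost threshold when it exceeds it by delta/2."""
    delta = 2.0 / 2**n_bits
    th_max = 1.0 - delta                  # assumed outermost threshold
    amplitude = th_max + delta / 2
    return math.acos(th_max / amplitude) / (math.pi * f_in)

f_max = 3e9
t_binary_ps = t_consecutive(f_max, 3) * 1e12   # roughly 13 ps
t_level_ps = t_gran_lev(f_max, 3) * 1e12       # roughly 57 ps
```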
Differential logic circuits ideally provide the same delay for either input polarity due to the
inherent circuit symmetry. However, device mismatch can upset the symmetry and can, thereby,
cause an input-polarity-dependent delay discrepancy, which leads to pulse width distortion and
possible pulse disappearance. Further, since the settling behavior of the outputs is similar to that
of an RC circuit, differential gates still suffer from the same memory effect that affects the single-
ended gates, where the slow RC-like settling of the output voltages can cause the propagation
delay to be shorter for input edges that are narrowly spaced, and longer for input edges that are
spaced wider. This becomes a problem for narrow pulses because the first pulse edge is
propagated with the nominal delay, whereas the second pulse edge can arrive before the output is
fully settled, which causes the propagation delay of the second edge to be shorter. As a result,
the output pulse is narrower. Due to propagation-delay discrepancies of differential and
single-ended gates, an input pulse will either broaden or become narrower as it propagates through
the chain, depending on the polarity of the delay discrepancy. Both pulse-width distortion effects
are detrimental because they distort the timing information of the quantized signal and reduce the
signal’s integrity. Increasing temperature, variation in the process, random local device mismatch,
and wire capacitances heighten the problem of pulse width distortion and disappearance. Thus,
it cannot be guaranteed that per-level pulses, even at lower quantization thresholds, will make it
through several taps of the CT digital processor without significant timing distortion and eventual
disappearance. An alternative processing approach is needed for reliable processing of gigahertz
digital signals.
4.2 Per-edge signal encoding for gigahertz signal processing
In order to avoid processing potentially narrow pulses of a per-level signal, alternating edges of the
signal can be processed separately. Each per-level signal LEVm can be represented as a rising-edge
signal Rm, which toggles on each rising edge of LEVm, and a falling-edge signal Fm, which toggles
on each falling edge of LEVm, as is illustrated in Fig. 4.5a. The level-crossing times are accurately
preserved in the transitions of the two per-edge signals, with each toggle of Rm and Fm indicating
that the input has crossed the mth threshold in an increasing direction and a decreasing direction,
respectively. This way, the pertinent timing information about the signal is retained in the relative
timing between the parallel per-edge signals instead of having to be represented with extremely
narrow pulses. A related technique is described in [38] for an asynchronous Σ∆-modulator-based
analog FIR filter. The per-edge signals can be generated using T-type flip-flops, which toggle on a
rising or falling edge of the clock. A more efficient implementation of the per-edge encoder, shown
in Fig. 4.5b, uses a single T-type flip-flop composed of two latches and a feedback inverter invF.
The second per-edge output is generated at the intermediate node between the two latches. The
three other inverters in the encoder buffer the output, provide a means of delay equalization, and
cause the per-edge outputs to have the same initial polarity. The latter is not strictly necessary.
For per-edge signals of the same initial polarity, the result of an XOR operation of the two
per-edge signals is the per-level signal. (For initial values of opposite polarity, the XNOR
operator recovers the per-level signal.) The per-level signal is reconstructed according to the
following expression:
$\widehat{LEV}_m = R_m \oplus F_m$ (4.4)
Figure 4.5: (a) Example signal of per-edge representation and (b) per-edge encoder.
The per-level input and the ideally reconstructed per-level signal are exactly equal. As a re-
sult, while the XOR is not a linear operator, the input-to-output response of the per-edge encoder
followed by per-edge decoder is linear, if ideal operators are used. However, since real implemen-
tations of logical operators can introduce non-idealities such as propagation-delay mismatch, the
encoding/decoding process will introduce some timing error to LEVm. This effect is analyzed in
Sec. 4.2.1.
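The ideal encode/decode round trip of eq. 4.4 can be sketched on a sampled 0/1 waveform. This is
an idealized illustration only: it assumes the per-level signal starts low and that both per-edge
signals start low, and the function names are hypothetical.

```python
import numpy as np

def per_edge_encode(lev):
    """Split a 0/1 per-level waveform into toggling per-edge signals R and F."""
    lev = np.asarray(lev, dtype=int)
    step = np.diff(lev, prepend=lev[0])       # +1 on rising edges, -1 on falling edges
    r = np.cumsum(step == 1) % 2              # toggles on every rising edge of lev
    f = np.cumsum(step == -1) % 2             # toggles on every falling edge of lev
    return r, f

def per_edge_decode(r, f):
    """Eq. 4.4: XOR of per-edge signals that share the same initial polarity."""
    return np.bitwise_xor(r, f)

lev = np.array([0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0])
r, f = per_edge_encode(lev)
lev_hat = per_edge_decode(r, f)               # equals lev for ideal operators
```

Each per-edge signal has fewer transitions than the per-level signal it encodes, which is the
reason its minimum pulse width is longer.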
A gigahertz-range CT DSP system using per-edge encoding is shown in Fig. 4.6. The system is
composed of a clockless flash ADC, which generates the thermometer-like per-level signals, which
are then converted to per-edge representation using per-edge encoders. The per-edge encoding
allows gigahertz digital signals to be processed separately using parallel per-edge processors for
each per-edge signal. The per-edge processors are FIR filters composed of asynchronous delay
blocks to realize the tap delays and tap coefficients.
Use of inverting delay blocks: The delay-block topology, shown at the bottom of Fig. 4.6, is
specifically chosen to be inverting so that the systematic propagation-delay mismatch created by
one delay block is corrected by the same mismatch in the following delay block. Consider delay
cells with a longer delay on the falling input edge. For a positive input pulse, the output of the
first delay cell is a lengthened inverted pulse because the second output edge is delayed. At the
output of the second delay cell, the first output edge is delayed while the second edge is on time;
therefore, the original pulse width is restored.
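The cancellation described above can be checked with a simple edge-time model. The sketch below
treats each delay cell as an ideal inverter with one fixed delay for rising input edges and
another for falling input edges; it ignores the memory effect discussed in Sec. 4.1, and the
numerical values are hypothetical.

```python
def inverting_delay_cell(edges, d_rise, d_fall):
    """Delay each input edge by a polarity-dependent amount and invert its direction.

    edges: list of (time, direction), direction +1 for rising, -1 for falling.
    """
    return [(t + (d_rise if direction > 0 else d_fall), -direction)
            for t, direction in edges]

def pulse_width(edges):
    """Width of a single pulse described by two edges."""
    return edges[1][0] - edges[0][0]

# Hypothetical numbers in ps: 200 ps positive pulse, falling input edges are 10 ps slower.
pulse_in = [(0.0, +1), (200.0, -1)]
after_one = inverting_delay_cell(pulse_in, d_rise=50.0, d_fall=60.0)
after_two = inverting_delay_cell(after_one, d_rise=50.0, d_fall=60.0)
widths = [pulse_width(p) for p in (pulse_in, after_one, after_two)]
```

After one cell the pulse is lengthened by the 10 ps mismatch; after the second cell the original
200 ps width is recovered.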
A CT DAC then performs signal reconstruction and converts the signal to the analog domain.
The processor and DAC implementations are detailed in Chapter 5. For a periodic input signal,
each quantization level is traversed in the same direction only once per input cycle. As a result,
the minimum pulse width is extended from a small fraction of the input period, in the case of a
per-level representation, to the period of a maximum-frequency input, TGRAN,EDGE = 1/ fmax, for
the case of per-edge representation.
Timing deterioration through propagation: As the number of asynchronous digital gates in
a processing chain is increased, the variation in the output pulse width increases; eventually, at
some point along the delay chain (e.g. after a few nanoseconds), the accumulated timing error will
exceed an acceptable limit and the distorted output signal will fail to meet the minimum SNDR
specification. This problem plagues CT and asynchronous processors, while classical DT DSPs
are immune to pulse-width distortion on account of the clock. For the CT digital processor, the
Figure 4.6: Block diagram of a gigahertz CT per-edge-encoded DSP system.
accumulated error places an upper bound on the delay chain length and, therefore, the maximum
filter order. For a maximum input frequency of 5 GHz, the minimum pulse width is 200 ps. If the
CT processor in this example is of low to moderate filter order, which corresponds to a tap-chain
delay of a few nanoseconds, this minimum pulse can be propagated through the tap delay chain
and retain its integrity.
4.2.1 Non-idealities of per-edge encoding: The generation of half-harmonics
Due to the non-idealities of the technology and the implementation, per-edge encoding of CT
quantized periodic signals introduces distortion components at the input’s half-frequency and its
harmonics.
Consider a sinusoidal input, which traverses a quantization level m in the same direction only
once an input cycle. The resulting per-edge signals each toggle once per input cycle, and thus have
the periodicity of half the input frequency, as shown in Fig. 4.7. Any mismatch in the propaga-
tion delays of processing blocks for rising and falling input edges results in an error being added
to the reconstructed signal at every other cycle; this causes half-harmonics. The sources of these
technology- and implementation-dependent delay mismatches along with design techniques to re-
duce and correct their effects are discussed in Ch. 5, which details the implementation of the CT
per-edge processor.
Consider a digital-processing block whose propagation delay for a rising-edge input is longer
than that for a falling-edge input by a time τ. The mismatch in delay translates to an error in the
output signal’s duty cycle because the positive pulses are shortened by τ while negative pulses are
lengthened by the same duration. For a sinusoidal input, the per-edge signals at every level have a
Figure 4.7: Example per-level and per-edge signals for a sinusoidal input.
Figure 4.8: Effect of delay mismatch on per-edge signals and an ideally-reconstructed per-level
signal for τ = 0.15TIN. The initial value of per-edge signals is of the same polarity for (a), and of
the opposite polarity for (b). The delayed input is shown to align the level-crossing times with the
transitions of the ideal signals (dashed).
50% duty cycle. The input-polarity-dependent propagation delay mismatch causes the duty cycle
to be reduced to (0.5− τ fIN)× 100%, where fIN is the input frequency, as illustrated in Fig. 4.8.
Assuming this mismatch is a systematic error that appears in all of the blocks at each level, and
neglecting the effects of random mismatch, the duty cycles of all per-edge signals will be distorted
by the same amount.
The effects of delay mismatch of a digital processing block on a single reconstructed per-level
signal $\widehat{LEV}_m$ are studied first. Two cases are considered in Fig. 4.8: (1) both the
rising-edge signal Rm and the falling-edge signal Fm start with the same polarity (Fig. 4.8a), and
(2) Rm and Fm are initially of opposite polarities (Fig. 4.8b). The input is assumed to start below
the threshold value corresponding to level m. The per-edge signals and the reconstructed per-level
signal, $\widehat{LEV}_m$, are shown at the output of a processing block (non-inverting in this
example) that has an arbitrary average delay TD and a mismatch of τ between the delays on rising-
and falling-edge inputs. The delay of the digital processing block on a rising-edge input is
TD − τ/2 and the delay on a falling-edge input is TD + τ/2, such that the average delay is TD. An
exaggerated case, in which the delay mismatch τ is 15% of the input period TIN, is shown. In order
to visually align the level-crossing times of the input signal with the edges of the ideal signals,
the delayed version of the input, x(t − TD), is shown rather than the actual input signal, x(t).
Consider the per-edge signals in the first case shown in Fig. 4.8a: the rising edges of the per-
edge signals propagate with a shorter delay, and therefore appear τ/2 before the ideally-processed
edges. The falling-edges, on the other hand, propagate with a longer delay and appear delayed by
τ/2 relative to the ideally-processed edges. The pulse-widths of the positive reconstructed per-level
pulses are accurately preserved. The pulse centers are misaligned from the ideal pulse centers. The
reconstructed pulses alternate between being ahead of or delayed from the ideal pulses by τ/2.
In the second case shown in Fig. 4.8b, the per-edge signals start with values of the opposite
polarity. The first edge of Rm is ahead of its ideal edge by τ/2 whereas the first edge of Fm lags
behind its ideal edge by τ/2. The opposite behavior is observed in the second transitions of the
per-edge signals. The first reconstructed per-level pulse is then broadened by τ and the second per-
level pulse is made more narrow by τ. The per-level pulse widths are distorted, alternating between
being widened and made more narrow on every other pulse, but the pulse centers are preserved.
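The two cases can be reproduced with an edge-time model of a single level. The sketch below
follows the delay convention of Fig. 4.8 (rising edges delayed by TD − τ/2, falling edges by
TD + τ/2) and assumes that Rm starts low; the crossing times and the function name are
hypothetical.

```python
def reconstructed_pulses(up, down, t_d, tau, same_polarity):
    """Edges of the reconstructed per-level pulses after a non-inverting block.

    up/down: upward and downward threshold-crossing times, one pair per input cycle.
    Rising edges are delayed by t_d - tau/2, falling edges by t_d + tau/2.
    """
    def delay(is_rising):
        return t_d - tau / 2 if is_rising else t_d + tau / 2

    pulses = []
    for k, (u, d) in enumerate(zip(up, down)):
        r_rising = (k % 2 == 0)                    # R starts low, so its first edge rises
        f_rising = r_rising if same_polarity else not r_rising
        pulses.append((u + delay(r_rising), d + delay(f_rising)))
    return pulses

t_in, tau, t_d = 1.0, 0.15, 0.4                    # tau = 0.15 T_IN, as in Fig. 4.8
up = [k * t_in + 0.1 for k in range(4)]            # hypothetical crossing times
down = [k * t_in + 0.4 for k in range(4)]
ideal = [(u + t_d, d + t_d) for u, d in zip(up, down)]

def errors(pulses):
    """Width and center errors of each pulse relative to the ideally delayed pulse."""
    width = [(e - s) - (ie - i0) for (s, e), (i0, ie) in zip(pulses, ideal)]
    center = [(s + e) / 2 - (i0 + ie) / 2 for (s, e), (i0, ie) in zip(pulses, ideal)]
    return width, center

width_err_a, center_err_a = errors(reconstructed_pulses(up, down, t_d, tau, True))
width_err_b, center_err_b = errors(reconstructed_pulses(up, down, t_d, tau, False))
```

With equal initial polarities the width error is zero and the center error alternates between
−τ/2 and +τ/2; with opposite polarities the center error is zero and the width error alternates
between +τ and −τ.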
Since the input-polarity-dependent delay mismatch is present at all levels, the distorted per-level
signals will cause distortion in the reconstructed signal. Fig. 4.9 illustrates the resulting
distortion of a sinusoidal input for the exaggerated case where τ = 0.1TIN (TIN is the period of
the input). For the case of the per-edge signals' initial values being equal, shown in Fig. 4.9a,
the distorted reconstructed input, x̂q, appears to be composed of two single-cycle segments (each of
duration TIN), which are unequally spaced relative to the ideally quantized signal and repeat every
other input cycle. The first single-cycle segment of x̂q is shifted ahead of the first ideal input
cycle of xq by τ, and the second segment of x̂q lags behind the second ideal segment of xq by τ. For
the other case, in which the per-edge signals are initially of opposite polarities, as shown in
Fig. 4.9b, the centers of the single-cycle segments of x̂q align perfectly with the centers of the
segments of xq, but the shapes of the single-cycle segments of x̂q are distorted. The first distorted
single-cycle segment of x̂q is narrower at its center, whereas the second segment is broader at its
center. The two segments repeat every other input cycle.
Figure 4.9: Reconstructed signal (a–b), quantization error (c–d), and the output spectra (e–f) of a
per-edge encoded quantized sinusoid with τ = 0.1TIN (distorted) and with τ = 0 (ideal) for initial
values of the per-edge signals (a, c, e) of the same polarity and (b, d, f) of opposite polarities.
The resulting quantization error, êq, for both cases is shown in Fig. 4.9c–d, and is compared to
the ideal case of τ = 0 (eq). For the ideal case, the quantization error, eq, consists of
fast-switching sawtooth-like segments that vary symmetrically about zero. For the distorted
outputs, however, the fast-switching sawtooth-like segments vary about a slowly varying non-zero
mean. This local mean is periodic with half the input frequency and, as a result, the error
contains power at ½ fIN and its harmonics, as is shown in Fig. 4.9e–f.
It should be noted that the illustration shows an exaggerated case of delay mismatch, so the
half-harmonics, which are only 15 dB below the main component at fIN, dominate. The CT
processor chip in this work (Ch. 5) is designed such that the half-harmonics are maintained below
the level of the harmonics caused by ideal quantization, shown in green in Fig. 4.9e–f.
The focus of this section is on the error components at the input’s half frequency and its
low-frequency harmonics. Since the power of these harmonics is essentially the same, regardless of
the initial polarity of the per-edge signals, the case of equal polarity is considered. Instead of
representing the reconstructed output as having cycle-long segments with an alternating delay ±τ,
it can be equivalently represented as one cycle coinciding exactly with the input and the second
cycle delayed by 2τ, as is shown in Fig. 4.10a. Simulations confirm that either representation yields
the same half-harmonic distortion. The latter representation will be considered for the remainder
of the discussion, but the results are valid for both representations.
The powers of the half-harmonic and other distortion components can be exactly determined by
expressing the reconstructed distorted output with a Fourier series with the lowest frequency com-
ponents at the input’s half-frequency. This involved calculation can be avoided by using simulation
tools; however, a simplified closed-form expression for the half-harmonic power is desirable. The
following analysis approximates the effects of delay mismatch, derives expressions for the half-
harmonic power, and validates the approximation by comparing it to simulation results.
The quantization error of the output, êq = x − x̂q, can be represented as the sum of the ideal
quantization error, eq = x − xq, which accounts for the fast-switching sawtooth-like segments and
the bell-shaped error, and the error due to the delay discrepancy, which accounts for the slowly
varying local mean. Since low-frequency distortion is of interest, the sawtooth-like error can be
removed from the reconstructed output by processing it through a low-pass filter with a cut-off
frequency just beyond the harmonic of interest. This is equivalent to smoothing out the steps in
the black waveform in Fig. 4.10a. What remains can be approximated by a distorted sinusoid xD
composed of a repeating two-cycle segment, as shown in Fig. 4.10a. One segment is a sinusoidal cycle
that aligns exactly with the input, xD,1(t) = x(t). The other segment is a sinusoidal cycle offset
from the input sinusoid by 2τ, and is of the form xD,2(t) = x(t − 2τ).
Figure 4.10: (a) Input sinusoid x, reconstructed output x̂q, and distorted-sine signal xD, (b)
quantization error êq, (c) low-pass filtered quantization error, êq,lowpass, and the error, eD, of
the distorted sine, xD.
The error, eD, between the input signal and this distorted sinusoid, xD, is a good approximation
of the low-pass filtered quantization error, as is illustrated in Fig. 4.10c. Consider an input
x(t) = AIN cos(2π fIN t). The error, eD, of the distorted sinusoid (xD) approximates the low-pass
filtered error, êq,lowpass, and can be expressed as a Fourier series, with the lowest frequency
component at half the input’s frequency:






















For a nonperiodic band-limited input signal (bandwidth of fMIN to fMAX), the Fourier series can
be replaced with a Fourier transform, and the half-harmonic distortion changes from a single tone








case of sinusoidal input is studied, but the results can be extended to more complicated inputs. The
Fourier component at the half-harmonic is given by the following expression, where the integral is
calculated in the interval corresponding to the second cycle, xD,2 = x(t−2τ), because the error in
































































For a delay discrepancy τ that is much smaller than the input period, small-angle approximations
of sin(ε) ≈ ε and cos(ε) ≈ 1 − ε²/2 (for better accuracy, the second terms of the Taylor series



















Figure 4.11: Half-harmonic distortion power versus percent delay discrepancy using simulation






τ/TIN, the amplitude of the half-harmonic can be approximated with just the sine term,
$b_{\frac{1}{2} f_{IN}}$. The half-harmonic distortion is independent of the quantizer resolution and is only
a function of the percent delay discrepancy between rising- and falling-edge input propagation
delays. Fig. 4.11 illustrates how the power of the half harmonic changes with the percent delay
discrepancy using simulation results and the results of using eq. 4.7. The calculated result matches
the simulated result, which confirms the validity of approximating the low-frequency error with a
distorted sinusoid error eD.
In addition to producing half harmonics, the input-polarity delay discrepancy causes a decrease
in power of the component at the input frequency. The amplitude error of the fundamental com-
ponent can be determined by calculating the Fourier components at the fundamental frequency,






















For the case of an ideal quantizer (τ= 0), the component at fIN is in phase with the input and has
an amplitude that is approximately equal to the amplitude of the input, AIN, if there are sufficiently
many quantization steps to represent the signal. Since xD = x− eD, the power of the reconstructed




















Again, the small-angle approximations are used, again including the second-order term of the










The half-harmonic distortion component is independent of the quantizer resolution and affected
only by the delay discrepancy. However, at the fundamental frequency, this delay discrepancy is
not the only source of amplitude error. For an ideal quantizer, the output power of the fundamen-
tal is not equal to the input power because the bell-shaped error segments contain power at the
fundamental, which accounts for the difference. For small delay discrepancies, b fIN and a fIN are
Figure 4.12: Output power at the input frequency versus percent delay discrepancy using simula-
tion (black, dashed), eq. 4.10 (blue), and eq. 4.11 (red).
negligible compared to the ideal quantization error at fIN. The variation in the output power at
the input frequency versus the percent delay discrepancy is shown in Fig. 4.12. Calculated results
are obtained by using eq. 4.10 and eq. 4.11 and compared with simulation results. For delay-
discrepancy values less than 10% of TIN, the calculated and simulated results match. As the ratio
τ/TIN is increased, the accuracy of the small-angle approximations begins to decline, and the
results of eq. 4.11 diverge from the simulated results. It is assumed that a CT processor is
designed to have a delay discrepancy such that the half-harmonic is at least 20 dB below the
fundamental component, so the approximated expression of the half-harmonic power can be used.









/x2D can then be expressed using eq. 4.7
Figure 4.13: Half-harmonic distortion ratio $HD_{1/2}$ versus percent delay discrepancy obtained
using simulation (dashed) and calculation (solid).
















The half-harmonic distortion ratio versus percent delay discrepancy is shown in Fig. 4.13,
using simulation and calculation results, which match. The distortion decreases by 20 dB if the
delay discrepancy is decreased by an order of magnitude. For a fixed delay discrepancy τ, higher
frequency inputs suffer worse half-harmonic distortion because the delay error makes up a larger
percentage of the input period. In CT digital systems, increasing the input frequency is equivalent
to increasing the resolution because signal information is retained in the timing signals. As the
input frequency is increased, higher timing accuracy is required, which is equivalent to lower
acceptable delay error.
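The 20 dB-per-decade trend can be checked numerically on the distorted-sinusoid model xD, in which
alternate input cycles are delayed by 2τ. The sketch below is an independent simulation of that
model only; it does not evaluate eq. 4.7 or the quantized per-edge system, so its absolute levels
are not expected to reproduce Fig. 4.13. The function name and sampling parameters are assumptions.

```python
import numpy as np

def half_harmonic_db(tau_over_t, cycles=64, points_per_cycle=256):
    """Half-harmonic amplitude relative to the fundamental, in dB, for x_D.

    Even-numbered cycles follow x(t); odd-numbered cycles follow x(t - 2*tau).
    Time is normalised so that T_IN = 1; `cycles` must be even.
    """
    n = cycles * points_per_cycle
    t = np.arange(n) / points_per_cycle
    odd_cycle = (np.floor(t).astype(int) % 2) == 1
    x_d = np.where(odd_cycle,
                   np.cos(2 * np.pi * (t - 2 * tau_over_t)),
                   np.cos(2 * np.pi * t))
    spectrum = np.abs(np.fft.rfft(x_d)) / n
    fundamental = spectrum[cycles]            # bin at f_IN
    half = spectrum[cycles // 2]              # bin at f_IN / 2
    return 20 * np.log10(half / fundamental)

hd_1pct = half_harmonic_db(0.01)              # tau = 1% of T_IN
hd_01pct = half_harmonic_db(0.001)            # tau = 0.1% of T_IN
slope_db_per_decade = hd_1pct - hd_01pct      # close to 20 dB for small tau
```

Because the record spans a whole number of two-cycle periods, the half-harmonic falls exactly on
an FFT bin and no window is needed.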
Half-harmonic distortion of an FIR filter
The half-harmonic distortion analyzed in the previous section was for the case of an ideally quan-
tized signal passed through a delay block with a propagation delay difference between rising- and
falling-edge input. Reconstruction is also assumed to be ideal. That scenario is equivalent to con-
sidering the reconstructed signal at a single processor tap of the filter in Fig. 4.6. The half-harmonic
behavior of a multi-tap FIR filter is studied in this section.
It is assumed that the tap delay blocks, shown at the bottom of Fig. 4.6, are inverting and that
all tap delay blocks have the same delay and the same delay discrepancy at every level and every
tap. The effects of delay variation between the tap delay blocks due to random local mismatch and
noise are studied in the two following sections and are neglected in the analysis of half-harmonics
because only global variations are considered here. It is assumed that the per-edge encoder does
not introduce any timing error and that the reconstructed signal at the 0th filter tap is then an ideally
quantized signal. The average delay of the tap delay block is set to equal the tap delay of the desired
frequency response, TD. The input-polarity-dependent delay discrepancy, τ, is caused by the tap
delay blocks, and is corrected at every other tap due to the inverting topology of the delay blocks.
At even-numbered taps, including tap 0, the reconstructed signal is an ideally quantized input,
xq,m(t) = xq(t − mTD) for m = 0, 2, 4, ... (4.14)
and at odd-numbered taps, the reconstructed signal is the delay-distorted signal,
xq,m(t) = x̂q(t − mTD) for m = 1, 3, 5, ... (4.15)
with half-harmonic distortion that has an amplitude given by eq. 4.7.
The filter’s frequency response, $H(jf) = c_0 + c_1 e^{-j2\pi f T_D} + \dots + c_k e^{-j2\pi f k T_D}$,
is normalized to have a passband gain of 0 dB, and a sinusoidal input is considered. A simple
two-tap filter is studied: a single delay cell and two coefficients of equal normalized magnitude,
|c0| = |c1| = 1/2. The response of a notch filter is considered, in which case the coefficients have
the same sign. The delay cell introduces a timing error to the signal at the 1st tap, while the
signal at the 0th tap is undistorted. The reconstructed output of the filter contains a component
at the input frequency, eq. 4.11, scaled by the frequency response,
y2fIN =





quantization-error harmonics that are not of interest, plus a half-harmonic component, the ampli-
tude of which is given by eq. 4.7 and is scaled by the first-tap coefficient value, since the distorted










Fig. 4.14 shows the powers of the fundamental and half-harmonic components of a sinusoidal
input processed through a two-tap notch filter with a tap delay TD = 133 ps, which corresponds to
a notch at 3.75 GHz, using simulated results and calculated results from eq. 4.9 and eq. 4.10. The
frequency response of the filter is essentially unaffected by delay discrepancy because the error at
the fundamental is negligible for a reasonable range of τ. The half-harmonics are not affected by
the notch frequency response.
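The ideal response of the two-tap notch filter can be evaluated directly from the coefficients and
the tap delay given above. This sketch covers only the ideal (τ = 0) frequency response and the
scaling of the half-harmonic by the first-tap coefficient; the function name is hypothetical.

```python
import cmath
import math

def two_tap_response(f, t_d, c0=0.5, c1=0.5):
    """H(jf) = c0 + c1 * exp(-j 2 pi f T_D) for a two-tap FIR filter."""
    return c0 + c1 * cmath.exp(-2j * math.pi * f * t_d)

t_d = 133e-12                                  # tap delay from the text
f_notch = 1.0 / (2.0 * t_d)                    # lowest frequency at which the two taps cancel
gain_dc = abs(two_tap_response(0.0, t_d))      # passband gain, normalised to 1 (0 dB)
gain_notch = abs(two_tap_response(f_notch, t_d))
# Only the first tap carries half-harmonics, so they are scaled by |c1| alone.
half_harmonic_scale_db = 20 * math.log10(0.5)
```

With TD = 133 ps the notch falls at about 3.76 GHz, consistent with the 3.75 GHz quoted in the
text, and the first-tap scaling of the half-harmonic is about −6 dB.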
Figure 4.14: Output power of a notch filter (a) at the fundamental frequency and (b) at the half-
harmonic component.
For the general case of an FIR filter response H(jf) of order K and coefficients c0 through
cK, the output power at the input frequency is given by eq. 4.16. Since the half-harmonics occur
only at odd taps due to the inverting tap delay block topology, they are processed with a different
filter response, $H_{1/2}(jf)$. The half-harmonic filter has coefficients d0 through d⌊K/2⌋
corresponding to the odd-tap coefficients of H(jf). The half-harmonic power at the output of the
FIR filter, using












The half-harmonics are, therefore, filtered by the half-harmonic filter $H_{1/2}(jf_{IN}/2)$.
Fig. 4.15 shows the output powers at the fundamental and half-harmonic frequencies of a sinusoidal
input processed through lowpass, bandpass and bandstop filters for a 10 ps delay discrepancy using
eq. 4.16
and eq. 4.18 and using simulation. The calculated results of the half-harmonic power match the
simulated results for the three filter configurations. The calculated output power of the fundamen-
tal component matches the simulated results in the passband. In the stop-bands, particularly where
there is significant signal attenuation, some divergence between the simulated and expected results
is observed because of the approximations in the calculated expression. The delay discrepancy
causes a small change in the phase and amplitude of the signal at the corresponding tap, which is
typically negligible. At notch frequencies of certain filter responses, however, the phase discrep-
ancy prevents the formation of the notch and instead causes the notch frequency to shift, as seen in
the lowpass filter response (red) in the figure. Filter configurations whose transfer-function zeros
are sensitive to coefficient mismatch are particularly sensitive to this phase discrepancy.
The phase error introduced by the input-polarity-dependent delay discrepancy causes the gen-
eration of half-harmonics and can cause shifts in the stop-band notch frequencies of sensitive filter
configurations. The delay discrepancy can limit the signal integrity and system performance and
must be maintained below the maximum allowable τ to guarantee a minimum SNDR and the desir-
able level of frequency attenuation, respectively. When designing for the worst-case performance,
which occurs when the half-harmonics are not attenuated relative to the fundamental component,
as in the case of a single-tap configuration, Fig. 4.13 can be used to determine the maximum
allowable delay discrepancy for the maximum input frequency. For example, for a target SFDR of 20
dB for a single-tap FIR configuration and a maximum input frequency of 4 GHz, the delay
discrepancy must be maintained under 10 ps. For frequency responses with more than one active tap,
however, this delay discrepancy constraint is relaxed because the half-harmonics are present
only at every other tap due to the inverting tap delay topology discussed in Sec. 4.2, while the
signal at
Figure 4.15: Output power of lowpass, bandpass and bandstop filters (a) at the fundamental fre-
quency and (b) at the half-harmonic frequency.
the fundamental frequency occurs at every tap. In addition, the half-harmonics can be attenuated
by the half-harmonic filter, $H_{1/2}(jf_{IN}/2)$, thus allowing for an even more relaxed delay
discrepancy.
For a gigahertz digital processor, the maximum speed and resolution are limited, to the first order,
by the speed of the ADC, but they are also constrained by the input-polarity-dependent delay error
of the ADC, digital processor and DAC.
4.2.2 Effects of delay mismatch on the distortion
Mismatch in the delays between taps of an FIR filter causes the frequency response to be distorted
relative to an ideal frequency response. In a per-edge-encoded processor, however, there is mis-
match in delay not only between taps but also between each per-edge signal at each tap. Since the
mismatch is a static effect, it causes harmonic distortion of the signal reconstructed at each tap.
Random tap delay mismatch of a per-edge processor causes not only the frequency response to be
distorted but, more importantly, an increase in harmonic distortion of the reconstructed signal. The
effects of delay mismatch of all tap delay blocks on the response of a CT digital processor and the
distortion of a reconstructed signal are discussed in this section with the help of Matlab® simu-
lations. This discussion excludes the effect of global delay mismatch for rising- and falling-edge
input signals, which was discussed in the previous section. The effect of jitter is studied separately
in the following section. Since lower-frequency error harmonics of a quantized signal are strongly
affected by small variations in the input amplitude, as was discussed in Sec. 1.3, a small amount
of noise is added to the input signal in the simulations in order to obtain an average harmonic power.
As explained, delay mismatch is a static effect, unlike delay jitter, thus it causes an increase in
distortion rather than in the noise floor. In a per-edge-encoded CT DSP, signal distortion has the
periodicity of half the input frequency because the delay variations for rising-edge and falling-edge
signal are, in the worst case, uncorrelated.
The effect of delay mismatch on a filter in several configurations is studied in Figs. 4.16–4.18
for standard deviations of the delay mismatch, σTD, of 5 ps, 10 ps, and an excessive value of
20 ps. The delay mismatch is assumed to have a zero-mean normal distribution. All powers are
shown relative to a maximum output power of 0 dB, which occurs in the passband of a filter. The
results shown are for an example three-bit system that has a goal SFDR of 20 dB. The powers at
the input, the half harmonic, second harmonic and third harmonic frequencies are shown for each
input frequency, averaged from 50 simulations. The output power of the fundamental component
for the ideal case of no delay mismatch is shown as a dashed curve (the ideal output power at fIN
is equivalent to the ideal frequency response of the filter).
The most basic filter configuration uses a single tap just to reconstruct the input signal without
doing any filtering. To illustrate the effects of mismatch, the signal is reconstructed not at the 0th
tap, but at the 1st tap in order to introduce the timing distortion of one delay tap. Simulated results
are shown in Fig. 4.16, which illustrates the worsening performance of the filter with increasing
frequency due to delay mismatch. The reason for this frequency-dependent performance decay is
that as the input frequency is increased, the static delay mismatch makes up a larger fraction of the
input period and, as a result, causes a stronger deterioration in the signal integrity. For σTD = 5 ps,
shown in (a), the fundamental frequency response in black shows a maximum attenuation of 0.7
dB at the maximum input frequency of 6 GHz. As the input frequency is increased, the half-harmonic
and second-order distortion components increase in power; however, the distortion power
is dominated by the third harmonic caused by three-bit quantization. The effects of delay mismatch
are therefore fairly negligible for such a small σTD and the SFDR is over 23 dB. If the mismatch
Figure 4.16: Power of the fundamental component and several distortion components at the output
of a three-bit, single-tap filter with the standard deviation of delay mismatch of (a) 5 ps (b) 10 ps,
and (c) 20 ps for a range of input frequencies.
is increased to 10 ps, as shown in (b), the attenuation of the fundamental frequency component
increases to 1.7 dB for frequencies up to 6 GHz and the second and half-harmonics begin to exceed
the power of the third harmonic at higher input frequencies. The minimum SFDR for this case,
which occurs at the maximum input frequency, is over 19 dB. For the extreme case of σTD = 20
ps, the power of the fundamental is seen to be significantly attenuated at higher frequencies by
as much as 4.5 dB. The distortion components are seen to have a notable increase in power with
increasing input frequency and begin to be dominated by power of the half-harmonic, which limits
the minimum SFDR to only 10 dB. From these results, it can be concluded that for a 20 dB SFDR
for frequencies up to 6 GHz, the standard deviation of the delay mismatch must be in the range of
5 ps to 10 ps.
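The frequency dependence noted above follows from the static mismatch occupying a growing share of
the input period. As a rough illustration only (it is not the Matlab simulation behind Fig. 4.16),
the following Python sketch expresses σTD as a fraction of the input period; the function name is a
hypothetical helper introduced here.

```python
def mismatch_fraction(sigma_td_s, f_in_hz):
    """Delay-mismatch standard deviation expressed as a fraction of the input period."""
    return sigma_td_s * f_in_hz

# Mismatch values from the study; the input frequencies are sample points up to 6 GHz.
for sigma_ps in (5, 10, 20):
    for f_ghz in (1, 3, 6):
        frac = mismatch_fraction(sigma_ps * 1e-12, f_ghz * 1e9)
        print(f"sigma_TD = {sigma_ps:2d} ps, f_IN = {f_ghz} GHz: "
              f"{100 * frac:4.1f} % of the input period")
```

At 6 GHz, a 20 ps mismatch corresponds to 12 % of the input period, whereas at 1 GHz it is only 2 %.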
The effect of delay mismatch on the response of a notch filter is shown in Fig. 4.17 for a
tap delay of 133 ps, which corresponds to a notch frequency at 3.75 GHz. When comparing the
frequency responses in (a) for σTD = 5 ps, (b) for σTD = 10 ps, and (c) for σTD = 20 ps, the delay
mismatch is seen to limit the maximum attenuation at the notch frequency to 30 dB, 24 dB and
19 dB, respectively. Similarly to the case of a single-tap filter configuration, the third harmonic
dominates the distortion in (a). The third harmonic of a 1.2 GHz input signal is significantly
Figure 4.17: Power of the fundamental component and several distortion components at the output
of a three-bit, two-tap notch filter with the standard deviation of delay mismatch of (a) 5 ps (b) 10
ps, and (c) 20 ps for a range of input frequencies.
attenuated because it coincides with the notch of the frequency response. This notch in the power
of the third harmonic can be observed in all three parts of the figure. For the maximum delay
mismatch of 20 ps, the half-harmonic once again dominates at higher frequencies.
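The mechanism by which a delay error limits the notch depth can be sketched with an idealized
two-tap filter. The Python sketch below assumes equal tap coefficients, a single deterministic
delay error and no quantization, so its numbers are not expected to reproduce Fig. 4.17, which
averages random mismatch in a three-bit per-edge system. The function name and the error values
are assumptions made for illustration.

```python
import cmath
import math

def two_tap_gain_db(f_hz, tap_delay_s, delay_error_s=0.0):
    """Gain of y(t) = x(t) + x(t - TD - error), in dB relative to the peak gain of 2."""
    h = 1 + cmath.exp(-2j * math.pi * f_hz * (tap_delay_s + delay_error_s))
    # Clamp to avoid log10(0) when the cancellation is numerically perfect.
    return 20 * math.log10(max(abs(h) / 2, 1e-12))

TD = 133e-12                 # tap delay used for the notch filter
f_notch = 1 / (2 * TD)       # ideal notch frequency of the two-tap filter
print(f"notch frequency: {f_notch / 1e9:.2f} GHz")
for err_ps in (0, 5, 10, 20):
    depth = two_tap_gain_db(f_notch, TD, err_ps * 1e-12)
    print(f"delay error {err_ps:2d} ps -> gain at notch {depth:7.1f} dB")
```

With no error the two taps cancel completely; any delay error leaves a residual at the notch
frequency that grows with the size of the error.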
Fig. 4.18 illustrates the response of the CT digital processor configured as a seven-tap bandpass
filter with a tap delay of 133 ps and a center frequency at 3.75 GHz. For the case of σTD = 5 ps in (a),
the frequency response is essentially undisturbed by the delay mismatch. The peak distortion spur
power is under -28 dB; therefore, the minimum SFDR for a single input tone in the passband is
over 27 dB, which is primarily limited by the third harmonic. The first peak of the third harmonic
for an input at 1.25 GHz occurs because the harmonic falls in the passband. The second peak in
the third harmonic for an input in the passband occurs because it falls in the repeated passband at
11.25 GHz. The power of the third harmonic mimics the frequency response of the filter, but with
the frequency increased by a factor of 3. The ripples in the curve corresponding to the power of the
second harmonic in (a–c) mimic the ripples in the frequency response. The first peak in the second
harmonic occurs because the harmonic falls in the passband. The second and third peaks, however,
correspond with the frequency response peaks in the stopband. These latter peaks are not 18 dB
lower compared to the first peak, as would be expected from the frequency response, because of
Figure 4.18: Power of the fundamental component and several distortion components at the output
of a three-bit, seven-tap bandpass filter with the standard deviation of delay mismatch of (a) 5 ps
(b) 10 ps, and (c) 20 ps for a range of input frequencies.
the increase in second-order distortion power with frequency. For the case of σTD = 10 ps in (b), a
maximum in-band attenuation of 1.5 dB and minimum SFDR of 24 dB are still achieved. It can
be concluded from this study that three-bit resolution and a standard deviation of delay mismatch in
the range of 5 ps to 10 ps are sufficient for an SFDR of 20 dB.
4.2.3 Effects of delay jitter on the noise performance of the system
For CT sampled signals, the signal information is contained in the timing of the digital outputs.
Delay jitter adds random error to the digital signal representation, which results in additive noise at
the reconstructed filter output. The delay jitter is assumed to have a zero-mean Gaussian distribu-
tion. The effect of delay jitter on the performance of a CT DSP is now discussed, with the effects
of random and global delay mismatch excluded from the study. Fig. 4.19 shows the signal power
at the input frequency and in-band error power at the output of a seven-tap bandpass filter for sev-
eral input frequencies for a noise bandwidth of 25 GHz and an output bandwidth of 6 GHz. For
frequencies below 2 GHz, the third harmonic falls in-band and dominates the in-band error power.
Even for a standard deviation of delay jitter of 8 ps, which is excessively high, the in-band noise
power is small compared to the power of in-band harmonics or in-band intermodulation products.
Figure 4.19: Signal power and in-band error power at the output of a three-bit, 7-tap bandpass
filter for a standard deviation of delay jitter of (a) 0.5 ps (b) 2 ps, and (c) 8 ps.
4.2.4 Summary
It can be concluded from the study of the effects of global input-polarity-dependent delay mis-
match, random delay mismatch and delay jitter that the performance of a GHz-range CT DSP is
limited by delay mismatch, global and random, rather than jitter for typical values of delay varia-
tion and jitter in submicron CMOS technologies. The sharpness of frequency response notches is
limited by the delay mismatch, which prevents perfect signal cancellation. For frequencies beyond
about 4 GHz, and for a global delay mismatch and a standard deviation of random delay mismatch of
over 10 ps, the attenuation of the fundamental component and the increase in the half-harmonic
distortion start to limit the SNDR and SFDR. The maximum effective resolution of a CT gigahertz-range
digital processor depends on the maximum input frequency, which is the limiting case because
the highest frequency is the most susceptible to half-harmonic distortion that is caused by
random and global delay variation. For example, for signals at frequencies up to 5 GHz, the
maximum effective resolution that can be achieved with current CMOS technologies is limited to
3-4 bits due to the input-polarity-dependent delay discrepancy and the standard deviation of delay
mismatch on the order of 10 ps.
Chapter 5
Gigahertz-range CT DSP implementation
The focus of this work is a gigahertz-range CT digital processor, the implementation of which is
detailed in this chapter. The per-edge-encoded digital signals are assumed to be supplied by a CT
ADC and the output of the filter is made testable through an output buffer. Sec. 5.6 describes an ex-
ample implementation of a gigahertz CT ADC and explains an example UWB receiver application
for which the CT DSP in this work was developed; two prototype chips were implemented.
A gigahertz CT per-edge-encoded digital processor is intended for low-dynamic-range appli-
cations in the lower gigahertz signal range. The design considerations described in this chapter
Figure 5.1: (a) Frequency response of a CT bandpass filter realized with the coefficients of a DT
highpass filter and (b) frequency response of the DT highpass filter.
are based on a realistic goal of a 20-dB SNDR, which can be achieved with three-bit resolution,
and is adequate for several low-dynamic-range applications such as pulse radio, spectrum sensing
and channel equalization. Band-limited gigahertz applications typically need a bandpass frequency
response for channel selectivity and attenuation of blockers. In some signal situations, however,
the frequency response can be adapted to reduce the attenuation in one stop-band, where there are
no strong blockers, in favor of a sharper transition or higher attenuation in another stop-band that
contains a dominant blocker. For a single narrowband blocker, a notch filter can be used to sig-
nificantly reduce the processor power while using a low-order processor configuration. Classical
digital filter design tools, such as the Filter Design Toolbox in Matlab®, can be used to design
the desired filter response. The frequency response of the CT processor is the same as that of a
corresponding DT processor, except that there is no aliasing, as was explained in Sec. 1.1. Since
CT digital systems do not suffer from aliasing, a CT bandpass filter can be realized using the
coefficients of a classical DT filter with a highpass response and with the tap delay set equal to
1/fS, where fS is the clock frequency of the DT filter. The replica passband in the DT case acts as
the upper passband in the CT case, as is shown in Fig. 5.1. The tap-delay tuning range of a CT
DSP, which must accommodate frequency responses for a range of center frequencies fC, therefore
corresponds approximately to half of the inverse of the center frequency range (TD ≈ 1/(2fC)). The CT
digital processor in this work is realized for bandpass applications, for which case bandpass and
other frequency responses are realized by tuning the tap delays close to the value of 1/(2fC).
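The relationship between DT highpass coefficients and a CT bandpass response can be checked
numerically. The following Python sketch evaluates the CT frequency response
H(f) = Σ c_k exp(-j2πf k TD) for a set of alternating-sign coefficients. The coefficient values
are hypothetical (they are not the coefficients used in this work) and the center frequency of
3.75 GHz is chosen only as an example.

```python
import cmath
import math

def ct_fir_gain(coeffs, tap_delay_s, f_hz):
    """|H(f)| of a CT FIR filter, H(f) = sum_k c_k * exp(-j*2*pi*f*k*TD)."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f_hz * k * tap_delay_s)
                   for k, c in enumerate(coeffs)))

# Hypothetical seven-tap coefficients with a DT highpass response (alternating signs).
COEFFS = [1, -2, 3, -4, 3, -2, 1]
F_C = 3.75e9            # example center frequency
TD = 1 / (2 * F_C)      # tap delay of about 133 ps

for f in (0.0, F_C, 2 * F_C, 3 * F_C):
    print(f"{f / 1e9:5.2f} GHz: |H| = {ct_fir_gain(COEFFS, TD, f):.3f}")
```

The gain peaks at fC and again at 3fC, which is the repeated passband that would be folded onto
the first one in a clocked system, while the response at DC and at 2fC is zero for these
coefficients.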
CT gigahertz processor requirements
The design specifications of a three-bit gigahertz processor, which is designed for but is not limited
to the UWB application, are as follows:
1. Programmability:
• One- to seven-tap FIR filter.
• 2-GHz to 4.5-GHz tunable center frequency range.
• 20-dB stopband or notch frequency attenuation.
• Three-bit coefficients.
2. Performance:
• 20-dB in-band SNDR.
• Robust response to undesirable components at frequencies up to 5.4 GHz.
• In-band intermodulation products of out-of-band components that are 20 dB below the
level of a full-scale in-band input.
The processor must also be disabled, enabled and reconfigured within a few nanoseconds to allow
for a robust response time. The power dissipation should adapt to the input activity level and
scale with the filter’s complexity. The processor must be capable of driving an anticipated load of
approximately 150 fF to 300 fF, which includes the input capacitance of the block that follows the
processor in a system implementation, the input capacitance of an output buffer for testing the filter,
wire capacitance, and the capacitance of the processor itself. As a more practical consideration,
the tap delays must be calibrated because of their asynchronous implementation, which does not
use a clock. A desired tap delay and frequency response must be achieved, after calibration, in all
process corners and within a wide temperature range.
The power dissipation of a classical discrete-time digital processor in the gigahertz range is typically
on the level of tens to hundreds of mW, which is prohibitive for some applications that rely on
battery power or have limited outlets for heat dissipation. CT digital signal processing offers the
advantage of activity-dependent power dissipation, which is attractive to high-frequency applica-
tions with sparse, burst-like signals, such as those found in pulse radio. While it is not possible to
avoid some static power dissipation in a CT DSP system, which is needed to bias the comparators
in the ADC, for example, and in bias generating circuits, static power dissipation in the processor
is avoided by using a single-ended architecture.
Timing granularity limitations
The per-edge digital encoding of gigahertz quantized signals, introduced in Sec. 4.2, allows nar-
rowly spaced tokens, which indicate quantization level crossings, to be distributed among several
per-edge processors instead of forcing a single processor to handle the high token rate. Hardware
speed constraints are significantly eased by this parallelization because the minimum time between
successive tokens of a per-edge signal is extended to the period of the fastest input, for example 166
ps for fMAX = 6 GHz. When the parallel-processed signals have to be recombined to construct the
filtered input, the timing constraints of successive level crossings are reintroduced. The minimum
timing problem is, in fact, made more difficult because the arrival of tokens at a tap, which
can occur at any time, can coincide with the arrival of tokens from a different tap of the same per-
edge signal. To accommodate such narrowly spaced incoming information, a digital adder would
have to function with a processing time of only a few picoseconds, which is not practical. Since
addition in the digital domain for this high token rate is prohibitive, the parallel per-edge signals
and also the delayed signals of each per-edge signal must be added in the analog domain. The gi-
gahertz CT processor takes advantage of the benefits of both domains, distributing the processing
efficiently between them. Digital representation is used in delaying the signals, which allows for a
Figure 5.2: Block diagram of a per-edge-encoded CT filter.
tunable broadband delay that is constant for all frequencies of interest, whereas the analog domain
is used in reconstruction and addition due to its (in principle) infinite timing resolution.
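The timing figures in this discussion can be estimated with a short calculation. The Python
sketch below compares the closest spacing of level crossings for a full-scale sinusoid in a
serially encoded three-bit mid-rise system with the token spacing of a single per-edge signal.
The threshold placement at multiples of one quarter of the amplitude is an assumption about the
ideal quantizer; the function name is hypothetical.

```python
import math

def crossing_times(f_in_hz, n_bits=3):
    """Times at which the rising slope of a full-scale sine crosses each mid-rise threshold."""
    n_half = 2 ** (n_bits - 1)                  # 4 for three bits
    levels = range(-(n_half - 1), n_half)       # thresholds at m / n_half of the amplitude
    return sorted(math.asin(m / n_half) / (2 * math.pi * f_in_hz) for m in levels)

F_MAX = 6e9
t = crossing_times(F_MAX)
serial_min = min(b - a for a, b in zip(t, t[1:]))   # closest consecutive level crossings
per_edge_min = 1 / F_MAX                            # spacing of tokens of one per-edge signal

print(f"serial encoding:   {serial_min * 1e12:6.1f} ps between tokens")
print(f"per-edge encoding: {per_edge_min * 1e12:6.1f} ps between tokens")
```

Under these assumptions the serial token spacing near the zero crossing is below 7 ps, which is
the few-picosecond regime referred to above, while each per-edge signal sees a full input period
between tokens.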
Signal reconstruction and addition
The arrival of a per-edge token at level m at tap k of the per-edge-encoded filter shown in Fig. 5.2
indicates that a reconstructed signal at the kth tap should be changed by an LSB, weighted by the kth
coefficient. The sign of this change depends on whether Rm,k or Fm,k has toggled. Since the addition
of the parallel paths must be realized in the analog domain, the tap coefficients can be realized as
digitally triggered analog coefficients. The reconstructed analog output of the CT digital processor
must be a voltage signal rather than a current signal to make it compatible with an anticipated
capacitive input of a block that can follow the CT DSP in a complete system. An analog adder
that generates the filtered analog output must have a bandwidth that is sufficiently higher than the
frequency band of interest so as not to distort the frequency response of the filter. For this reason,
analog adders based on operational amplifiers are not considered; such implementations demand
a large static power dissipation for bandwidths in the gigahertz range, which is prohibitive and
Figure 5.3: (a) Current-resistor-based (not used) and (b) charge-based analog adders.
nullifies the advantage of signal-activity-dependent power dissipation of a CT digital processor.
The sum output voltage of a digitally controlled analog adder can alternatively be realized as a
controlled current flowing through a load resistor or a controlled charge accumulated on a load
capacitor. The former scheme is first considered; a possible realization of a current-based filter
coefficient and adder at each level m and each tap k of the filter in Fig. 5.2 is illustrated in Fig. 5.3a.
A current source, whose current size is set by the coefficient weight ck, is enabled whenever a
reconstructed per-level signal LEVm,k has a logic-level high value. It is assumed that the per-edge
signals initially have the same polarity if the input is initially below the corresponding quantization
level and are of opposite polarities if the input is above the level. The per-level signal can
then be reconstructed with an XOR operation on the two per-edge signals, LEVm,k = Rm,k ⊕ Fm,k.
The XOR operation, however, requires the per-level signal to be reconstructed in the digital
domain, which reintroduces the timing issues of the per-level-encoded operations, described in
Sec. 4.1. Pulse disappearance is not a problem at this late stage in the processing chain, since the
reconstructed per-level signal propagates only until the current source switch. However, the timing
distortion in the output pulse of a digital gate for an input pulse of short duration is still a problem.
For a relatively long duration between consecutive toggles of the per-edge signals Rm,k and Fm,k,
which occurs at quantization levels close to mid-range and for lower in-band frequencies, the XOR
gate retains the accuracy of this duration in the output pulse width. As the input pulse duration is
decreased, the output pulse width becomes distorted, which causes an undesirable increase in the
distortion of the reconstructed signals. This distortion must be avoided.
The XOR-based adder is sensitive to falsely triggered outputs of the per-edge encoder, which
can be realized as two T-type flip-flops that toggle on opposite edges of the ADC’s per-level sig-
nal, one flip-flop for each per-edge signal. When an input per-level pulse is not long enough to
accommodate the set-up and hold times of the latches in the per-edge encoder, one of the per-edge
signals can false-toggle or fail to toggle, and the two per-edge signals will go out of phase with one
another. This would invert the reconstructed per-level signal, creating a constant 2-LSB error. This
error will remain until another false-triggering of either of the per-edge signals at that level occurs.
To prevent this catastrophic scenario, the per-edge encoder should be realized with interconnected
latches, which synchronize the per-edge signal polarities, as opposed to being generated by two
separate flip-flops.
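The failure mode described above can be reproduced with a small behavioral model. The Python
sketch below encodes a per-level signal into two toggling per-edge signals, reconstructs it with
an XOR, and then repeats the experiment with one falling-edge toggle deliberately missed. It is a
sample-based abstraction with hypothetical names and a made-up input pattern, not a model of the
actual encoder circuit.

```python
def per_edge_encode(level, missed_fall=None):
    """Toggle R on each rising edge and F on each falling edge of a per-level signal.

    missed_fall is a hypothetical sample index at which the falling-edge toggle
    fails, modelling a pulse too short for the latches of the encoder.
    """
    r = f = 0                      # same initial polarity: input starts below the level
    prev = level[0]
    r_out, f_out = [r], [f]
    for i in range(1, len(level)):
        cur = level[i]
        if cur and not prev:
            r ^= 1                 # rising edge toggles R
        elif prev and not cur and i != missed_fall:
            f ^= 1                 # falling edge toggles F unless the toggle is missed
        prev = cur
        r_out.append(r)
        f_out.append(f)
    return r_out, f_out

def xor_reconstruct(r_sig, f_sig):
    """Per-level signal recovered as R XOR F."""
    return [a ^ b for a, b in zip(r_sig, f_sig)]

LEVEL = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]   # made-up per-level pattern
good = xor_reconstruct(*per_edge_encode(LEVEL))
bad = xor_reconstruct(*per_edge_encode(LEVEL, missed_fall=3))
print("input        :", LEVEL)
print("reconstructed:", good)
print("missed toggle:", bad)
```

After the missed toggle, the reconstructed signal remains inverted for every subsequent sample,
which is the persistent error that interconnected latches are meant to prevent.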
The bandwidth of an analog adder must be high enough to prevent significant high-frequency
roll-off in the frequency range of interest. For an anticipated load capacitance of 200 fF and no
more than about 1 dB of in-band signal attenuation, the load resistance cannot exceed 100 Ω. To
accommodate an output swing range that is centered about the mid-rail voltage, one terminal of
the output resistance must be maintained at the mid-rail voltage, which requires a bias current.
One way to generate this offset voltage efficiently, without providing a DC current, is to store the
VDD/2 voltage on a capacitor, which is then periodically refreshed. To avoid a dead-zone during
a refresh time, two alternating capacitors can be used. The capacitor voltage will not remain con-
stant because the capacitor will retain some of the charge supplied by the current sources. To limit
the resulting drift in the capacitor voltage, a large capacitance in the range of several picofarads is
needed, which requires a large area that can be greater than the area of the entire processor. The
bandwidth of the current-resistor-based adder is sensitive to the capacitance at the output node,
which is imprecisely estimated with a circuit extraction tool due to the complexity of the filter
structure. Additional unaccounted-for capacitance is detrimental to this implementation, which,
when combined with the possible static power dissipation of the resistor-based implementation,
and the distortion of the reconstructed per-level signals, make the current-resistor-based imple-
mentation a poor candidate for the gigahertz processor’s analog adder.
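The load resistance limit quoted above can be checked with a first-order estimate. The Python
sketch below treats the adder output as a single-pole RC low-pass, which is a simplifying
assumption, and evaluates its attenuation at a few in-band frequencies; the frequency points are
chosen here for illustration.

```python
import math

def rc_attenuation_db(f_hz, r_ohm, c_farad):
    """Attenuation of a single-pole RC low-pass at frequency f, in dB."""
    f_3db = 1 / (2 * math.pi * r_ohm * c_farad)
    return 10 * math.log10(1 + (f_hz / f_3db) ** 2)

R_LOAD = 100.0       # ohms, the stated upper limit
C_LOAD = 200e-15     # farads, the anticipated load

print(f"-3 dB bandwidth: {1 / (2 * math.pi * R_LOAD * C_LOAD) / 1e9:.2f} GHz")
for f_ghz in (2.0, 3.0, 4.0, 4.5):
    att = rc_attenuation_db(f_ghz * 1e9, R_LOAD, C_LOAD)
    print(f"{f_ghz:.1f} GHz: {att:.2f} dB")
```

Any additional parasitic capacitance lowers the pole frequency and increases the attenuation at
the upper end of the band, which is the sensitivity discussed above.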
The chosen architecture for the digitally controlled analog adder is based on the accumula-
tion of a controlled amount of charge onto the load capacitor, as shown in Fig. 5.3b. Consider
the signals at the mth level and the kth tap. Each edge of the per-edge signals triggers the charge
pump to add or remove a controlled amount of charge to or from the load capacitance, for Rm,k
and Fm,k, respectively. The charge difference causes the output voltage to step up or down by a
coefficient-weighted LSB. The amount of generated charge is set according to the size of the kth
filter coefficient. The per-level signal is effectively reconstructed without using a digital gate to
combine the per-edge signals. Since the per-edge signals remain in parallel throughout the digital
processing and are only combined once they are in the analog domain, the timing limitations and
pulse distortion of the XOR-based reconstruction are avoided. The dynamic power dissipation of
the digital circuits that regulate the charge pump timing is typically more than the power dissi-
pation that would be needed to control the current sources in the current-resistor implementation.
The power dissipation of the charge-based adder varies with input frequency, unlike the case of the
current-resistor adder, which dissipates the same average power regardless of in-band frequency.
For cases of burst-like inputs, the charge-based adder is more efficient because its power decreases
with lower input activity. For a constantly present signal with a frequency corresponding to the
average of the frequency range, the power dissipation of both implementations can be compara-
ble. An increase in the load capacitance from the predicted value, as is bound to happen due to
parasitic wire capacitance, is not problematic for the charge-based adder because it results merely
in a decrease in the output voltage swing at all frequencies, without attenuating higher in-band
frequencies or distorting the frequency response of the filter. The charge-based implementation is,
therefore, preferred due to the bandwidth insensitivity to the load capacitance, due to the avoid-
ance of timing limitations and timing distortion in per-edge signal recombination, and due to the
signal-dependent power dissipation.
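The insensitivity of the charge-based adder to the load capacitance can be illustrated with an
idealized model in which every token moves a coefficient-weighted packet of charge onto the load
instantaneously. In the Python sketch below, the token list, the charge per LSB and the
capacitance values are all hypothetical; only the 0.6 V starting point follows from the mid-rail
value of a 1.2-V supply.

```python
def charge_adder_output(events, q_lsb, c_load, v_start=0.6):
    """Output voltage after each token of an idealized charge-based adder.

    events: list of (time_s, edge, coeff), with edge = +1 for an R token and
    -1 for an F token; each token moves coeff * q_lsb of charge onto c_load.
    """
    v = v_start
    trace = []
    for t, edge, coeff in sorted(events):
        v += edge * coeff * q_lsb / c_load   # voltage step of one weighted LSB
        trace.append((t, v))
    return trace

# Hypothetical tokens: (arrival time, R/F polarity, tap coefficient).
EVENTS = [(0e-12, +1, 3), (40e-12, +1, 3), (133e-12, +1, -2), (210e-12, -1, 3)]
nominal = charge_adder_output(EVENTS, q_lsb=2e-15, c_load=200e-15)
loaded = charge_adder_output(EVENTS, q_lsb=2e-15, c_load=300e-15)

for (t, v1), (_, v2) in zip(nominal, loaded):
    print(f"t = {t * 1e12:5.0f} ps: {v1:.3f} V (200 fF), {v2:.3f} V (300 fF)")
```

Increasing the capacitance scales every step by the same factor and leaves the step times
unchanged, so in this idealized model the swing shrinks but the shape of the frequency response
does not change.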
5.1 Gigahertz CT digital FIR filter architecture
The design considerations for a gigahertz CT DSP implementation are described in this section.
A generalized discussion of the design considerations is accompanied by a detailed explanation of
the CT DSP realization, which has been implemented in two versions; the minor changes between
the two implementations will be detailed. The discussions and results are primarily based
on the processor implementation in the first chip. The system was realized in ST’s 65 nm standard
CMOS process with a 1.2-V supply voltage. Low-threshold devices were used for maximum
speed.
A block diagram of a CT digital processor for gigahertz-range frequencies based on per-edge
encoding is shown in Fig. 5.4. A three-bit CT ADC with a mid-rise transfer characteristic generates
seven per-level signals, each of which is encoded for per-edge representation. The mid-rise char-
Figure 5.4: Block diagram of a three-bit gigahertz-range per-edge encoded digital processor, com-
posed of a joint FIR filter-DAC block.
acteristic is chosen for improved sensitivity. Instead of an ideal transfer characteristic, shown in
Fig. 5.5a, the processor realizes a shifted transfer characteristic shown in Fig. 5.5b. The ∆/2 upward
shift is equivalent to a DC offset of the reconstructed output and does not cause distortion or error
in the frequency range of interest. The DC offset can theoretically be eliminated by precharging
the load capacitor to cancel that offset, but this is unnecessary.
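That the shifted characteristic differs from the ideal one only by a constant can be verified
directly. The Python sketch below implements an unbounded mid-rise quantizer and its ∆/2-shifted
version; clipping at the ends of the three-bit range is deliberately omitted, and the step size
and test inputs are arbitrary values chosen for illustration.

```python
import math

def mid_rise(x, delta):
    """Ideal mid-rise quantizer: output levels sit at odd multiples of delta/2."""
    return delta * (math.floor(x / delta) + 0.5)

def shifted_mid_rise(x, delta):
    """Same characteristic shifted upward by delta/2."""
    return delta * (math.floor(x / delta) + 1)

DELTA = 0.25   # arbitrary step size
for x in (-0.9, -0.3, -0.1, 0.1, 0.3, 0.9):
    ideal = mid_rise(x, DELTA)
    shifted = shifted_mid_rise(x, DELTA)
    print(f"x = {x:5.2f}: ideal {ideal:6.3f}, shifted {shifted:6.3f}, "
          f"difference {shifted - ideal:.3f}")
```

The difference is ∆/2 for every input, so the shift appears only as a DC term in the
reconstructed output.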
The per-edge signals, Rm and Fm, at each level are processed by identical per-edge processors,
which are shown in Fig. 5.4. The per-edge FIR filters are composed of CT delay cells as tap delays
and bi-directional charge pumps as tap coefficients. The asynchronous digital delay blocks have
a digitally controlled tunable delay range from 110 ps to over 250 ps. The tap coefficients have
a tuning range corresponding to a three-bit range. A load capacitor, CADDER, accumulates the
charge generated at each level and each tap and serves as the analog adder. The load capacitor
is variable with a two-bit range to allow for additional gain control besides the filter coefficients.
Figure 5.5: (a) Ideal mid-rise quantizer transfer characteristic and (b) a mid-rise characteristic
with an LSB/2 DC offset.
Any mismatch between positive and negative charges of the charge pumps gets accumulated on
the load capacitor. In order to prevent the capacitor voltage from drifting in the direction of one of
the supplies, a DC control block maintains the average value about which the output swings at the
mid-rail value and corrects any offset.
Example signals at the mth level and the kth filter tap are illustrated in Fig. 5.6. The per-
edge signals of the previous tap output are delayed asynchronously by parallel delay cells. Both
the rising and falling transitions of the per-edge signals are delayed by the tap delay, TD. For a
positive filter coefficient, on each edge of the rising-edge signal Rm,k, the bi-directional charge
pump produces a positive current pulse of short duration, which accumulates a controlled amount
of charge, corresponding to the coefficient size, onto the load capacitor. This causes the output
voltage to step up. On each edge of the falling-edge signal Fm,k, the bi-directional charge pump
produces a negative current pulse of the same duration, which removes the accumulated charge.
The output voltage steps back down. The per-level output is thereby reconstructed; however, instead
of ideal step-like transitions, the output voltage has a finite rise time that is equal to the duration
Figure 5.6: Example signals of a CT DSP in Fig. 5.4 at the mth level and the kth tap.
of the current pulses. The sign of the coefficient is set by controlling whether Rm,k or Fm,k triggers
the positive or negative charge pump.
The filter complexity can be changed from a single-tap to a seven-tap configuration. For lower-order
filter configurations, the higher taps can be turned off by disabling inputs to the delay cells
in order to save power, in addition to zeroing the coefficient values. The entire filter can be
turned off merely by disabling the delay cells; there is no need to disable the biases of the delay
cells and tap coefficients. When the filter is disabled, the DC control block is turned off and the
output voltage is precharged to VDD/2. The filter can be enabled and be ready to accept inputs
in under 100 ps, the time it takes to propagate the enable signal and turn on the delay cells. The
reconfiguration time of the filter depends on the time needed for the bias voltages to settle and is
in the range of several nanoseconds. The fast response and wake-up times make the CT processor
well suited to activity-sensitive applications.
The total power dissipation of the processor is described by










where PSTATIC is the total static power dissipation, EDcell is the energy dissipation of the delay cell
per sample, EQpump is the energy dissipation of the bi-directional charge pump per sample, K is
the filter order, AIN is the input amplitude and AMAX is the maximum input amplitude. The second
term in the expression accounts for the dynamic power consumption of the processor, which is
caused by the energy dissipated in the delay cells and charge pumps whenever these blocks are
triggered by an incoming token. The number of the filter blocks that are activated depends on the
filter order K and the number of quantization levels that the input crosses. The dynamic power
dissipation increases with input amplitude because more quantization thresholds are traversed and
with frequency because they are crossed more often. The power dissipation of the CT processor
adapts to the activity level of the input. When there are no inputs to process, the power dissipation
automatically reduces to the minimum power dissipation, which is attributed to the fairly negligible
leakage power of the transistors and the static power consumption of the DC bias block. The
static power dissipation of the processor is a small fraction of the peak power of the processor.
Bias-generating circuits of the prototype chip implementation consume a notable amount of power
because they are designed for ease of control and testability rather than power efficiency; in a
product implementation, bias circuits would be designed to consume a small fraction of the current
power. The bias-generating circuits are therefore considered to be part of the testing structure.
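The activity dependence described above can be explored with a rough two-term model: a static
term plus a dynamic term proportional to the token rate. The Python sketch below is an assumed
form built only from the quantities defined in this section; in particular, the scaling of two
tokens per input period per crossed level, each handled by K delay cells and K charge pumps, is
an assumption made here, and the energy and static power values are hypothetical placeholders
rather than measured results.

```python
def levels_crossed(a_in, a_max, n_bits=3):
    """Number of mid-rise thresholds (at m * a_max / 2**(n_bits - 1)) crossed by a sine of amplitude a_in."""
    n_half = 2 ** (n_bits - 1)
    return sum(1 for m in range(-(n_half - 1), n_half)
               if abs(m) * a_max / n_half < a_in)

def total_power(f_in, a_in, a_max, k_taps, p_static, e_dcell, e_qpump):
    """Assumed model: P = P_static + (tokens per second) * K * (E_Dcell + E_Qpump)."""
    tokens_per_second = 2 * f_in * levels_crossed(a_in, a_max)   # one R and one F token per level per period
    return p_static + tokens_per_second * k_taps * (e_dcell + e_qpump)

# Hypothetical values: 0.1 mW static power, 30 fJ per delay, 20 fJ per charge-pump event.
PARAMS = dict(k_taps=7, p_static=1e-4, e_dcell=30e-15, e_qpump=20e-15)
for a_in in (0.0, 0.3, 1.0):
    p = total_power(3e9, a_in, 1.0, **PARAMS)
    print(f"A_IN / A_MAX = {a_in:.1f}: {p * 1e3:.2f} mW")
```

In this model the power falls to the static term when the input crosses no thresholds, and the
dynamic term grows with amplitude, input frequency and the number of active taps.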
The design of the gigahertz CT DSP blocks and filter-tuning schemes are detailed in the fol-
lowing sections.
5.2 Continuous-time delay-cell design
In serially encoded CT systems, the tap delays have to consist of several granular delay cells
because the minimum time between consecutive samples is typically less than the desired tap delay.
In the gigahertz-range CT processor detailed in this work, the complexity of the tap delay block is
reduced by the per-edge signal parallelization, which extends the minimum time between tokens of
each per-edge signal to the period of the maximum input frequency, 1/fmax. For bandpass gigahertz
applications, the tap delay, TD, is typically set close to 1/(2fc), where fc is the center frequency of the
signal band, as was explained in the introduction of this chapter. Since fmax < 2fc, the time between
consecutive tokens is longer than the desired tap delay and, as a result, the tap delay block can be
realized as a single element of delay TD, in contrast, for example, to the case in [15], where 400
delay elements were used per tap delay block.
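The condition for a single-element tap delay can be checked numerically. The Python sketch
below evaluates TD = 1/(2fc) at the ends of the 2-GHz to 4.5-GHz center frequency range and
tests whether the per-edge token spacing 1/fmax exceeds the tap delay. The function names and
the specific frequency pairs are assumptions chosen for illustration.

```python
def tap_delay(f_c_hz):
    """Tap delay for a bandpass response centered at f_c: TD = 1 / (2 * f_c)."""
    return 1 / (2 * f_c_hz)

def single_element_ok(f_c_hz, f_max_hz):
    """True when the per-edge token spacing 1/f_max exceeds the tap delay (f_max < 2 * f_c)."""
    return 1 / f_max_hz > tap_delay(f_c_hz)

for f_c in (2e9, 3.75e9, 4.5e9):
    print(f"f_c = {f_c / 1e9:.2f} GHz -> TD = {tap_delay(f_c) * 1e12:.1f} ps")

print("f_c = 3.75 GHz, f_max = 6 GHz:", single_element_ok(3.75e9, 6e9))
print("f_c = 2 GHz,    f_max = 6 GHz:", single_element_ok(2e9, 6e9))
```

The second case violates fmax < 2fc, so a token could arrive at a delay element before the
previous one has left it.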
The design specifications for the clockless delay element, as dictated by the system and perfor-
mance requirements, are as follows:
1. The tap delay must have a finely controlled but wide tuning range of 110 ps to 250 ps to ac-
commodate a maximum center frequency span of 2 GHz up to 4.5 GHz, with fine frequency
resolution.
2. The energy dissipation per conversion must be minimized to limit the filter’s power dissipa-
tion.
3. Rising and falling input edges must be processed with equal delays.
4. The delay element must fully reset and be ready for the next delay operation within a short
time interval after the completion of the previous delay operation.
The last specification is necessary to prevent signal distortion, which can occur when a delay ele-
ment, not fully recovered from the previous operation, distorts the timing information of the signal.
The minimum interval between consecutive delay operations, which is equal to the minimum input
period, must be sufficiently longer than the tap delay to ensure that the internal node voltages of
the delay block settle to their final values.
Consider the case when a new token arrives shortly after the previous token has been processed,
such that the internal node voltages are not fully settled. The new token will be processed with a
shorter delay because the internal nodes that have not settled will undergo smaller voltage excur-
sions. The delayed digital signal will then have a duty cycle error; in the case of a periodic input,
this error will result in an increase in harmonic distortion. Since the per-edge encoded signals
vary with half the frequency of the input, this delay distortion will introduce half-harmonics to the
spectrum of the processed signal, as was discussed in Sec. 4.2.1. Such distortion is of particular
concern in the presence of undesirable signals that are above the frequency band of interest and
have not been sufficiently attenuated by stages prior to the processor. For example, ultra-wideband
(UWB) systems suffer from blockers in the neighboring WLAN band. If the tap delay TD plus
the reset time TRST is longer than the period of a blocker signal, the distorted blocker signal can
introduce half-harmonics into the UWB channel bandwidth. It is assumed that the tap delay of
the processor is chosen close to 1/(2 fC), where fC is the center frequency of the frequency band of
interest. The minimum input period, therefore, must be no shorter than TRST + 1/(2 fC). The maximum
allowed reset time is thus the minimum input period less 1/(2 fC).
For a UWB application with a lowest center frequency of 3.5 GHz and little attenuation of the
WLAN blockers at 5.2 GHz, the delay elements are allowed a maximum reset time of 50 ps.
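The 50 ps figure can be checked with a few lines of Python. This is a minimal sketch that assumes the shortest input period is set by the 5.2 GHz blocker and that the tap delay equals 1/(2 fC) exactly; the function name is hypothetical.

```python
def max_reset_time_ps(f_center_ghz: float, f_blocker_ghz: float) -> float:
    """Reset-time budget in ps: blocker period minus the tap delay 1/(2*f_c)."""
    blocker_period_ps = 1e3 / f_blocker_ghz      # shortest input period (assumed)
    tap_delay_ps = 1e3 / (2.0 * f_center_ghz)    # tap delay for the lowest f_c
    return blocker_period_ps - tap_delay_ps


if __name__ == "__main__":
    print(f"reset budget: {max_reset_time_ps(3.5, 5.2):.1f} ps")
```

With the values quoted above, the budget comes out slightly below 50 ps, which agrees with the stated maximum reset time.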
5.2.1 Delay-cell topologies
Delay cells are basic building blocks that have been used in ring oscillators, VCOs, DLLs, time-to-
digital converters, fully digital ADCs and other applications, and are possible candidates for use in
a gigahertz CT processor [39–43]. Analog-based delays, such as those used in channel equalizers,
are excluded from consideration because the goal here is to take advantage of the noise immunity
and wideband programmability of digital domain processing, which are nullified by returning to
the analog domain. Only clockless digital delay implementations are considered.
Digital delay block implementations can be categorized as being differential or single-ended.
The former implementation makes use of current-mode logic (CML) circuits, shown in Fig. 5.7a,
which offer a major advantage of an equal delay on either input edge polarity because the same
mechanism is used for both edge polarities, unlike in the case of a single-ended implementation.
Delay discrepancies due to random local device mismatch are neglected in the comparison of
delay cells since all implementations suffer from this nonideality. A CML implementation allows
for an easy control of the delay via the bias voltage VN of the “tail-current” device M1. A major
disadvantage of a differential delay implementation is the static bias current that each cell draws,
which negates the advantage of input-activity-dependent power dissipation and would cause the
power dissipation of a gigahertz processor to be prohibitively high. Differential implementations
of the delay cell, as well as of all processor blocks, are eliminated from further consideration in
this thesis.
Figure 5.7: Possible digital delay-cell schemes based on (a) CML gate, (b) inverter with variable
capacitive load, (c) current-starved inverter, (d) delay cell with fighting transistor.
The most basic and commonly used single-ended implementation of an asynchronous delay
cell is an inverter. Since an inverter’s delay is typically only 15 to 30 ps in submicron technolo-
gies without an additional capacitive load, several such inverters can be cascaded to achieve the
desired delay, with coarse delay tuning realized by adding or bypassing a number of inverters in
the series chain. The delay of an inverter is based on the charging of an effective load capaci-
tor C, which includes the gate capacitance of the next stage, drain capacitance of the inverter’s
transistors, parasitic capacitances, as well as a possible additional capacitor, through a transistor
with finite drive strength. The inverter’s propagation delay can be increased by increasing the load
capacitance or by reducing the inverter’s drive strength. A variable load capacitance offers a fine
delay-tuning range, and can be realized as an array of binary-weighted capacitors and switches or
a voltage-controlled capacitor, such as a varactor, as is illustrated in Fig. 5.7b. The energy dissipa-
tion of a digital gate for two conversions, corresponding to a rising-edge and a falling-edge input,
is equal to $C V_{DD}^2$, plus additional energy dissipation attributed to crowbar currents. Increasing the
load capacitance to realize a longer delay results in an increase in energy dissipation and makes
this implementation energy inefficient. As an alternative means of fine delay control, the drive
strength of the inverter can be reduced by current starvation, as is shown in Fig. 5.7c. The delay
can be controlled by varying the bias voltages VN and VP of the current-starved devices, without
increasing the load capacitance and, thus, without having to increase the energy consumption.
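The trade-off between the two tuning methods can be illustrated with a first-order Python sketch. It assumes the delay is the time needed to move the load by VDD/2 at a constant drive current, and it ignores crowbar currents; the supply voltage, capacitance and current values are illustrative assumptions rather than design values.

```python
VDD = 1.0  # supply voltage in volts (assumed value)


def delay_s(c_load: float, i_drive: float) -> float:
    """First-order delay: time to move the load by VDD/2 at a constant current."""
    return c_load * (VDD / 2.0) / i_drive


def energy_per_edge_pair_j(c_load: float) -> float:
    """Energy for one rising plus one falling transition, crowbar currents ignored."""
    return c_load * VDD ** 2


base = (5e-15, 100e-6)      # 5 fF, 100 uA (illustrative)
more_cap = (10e-15, 100e-6)  # delay doubled by doubling the load capacitance
starved = (5e-15, 50e-6)     # delay doubled by halving the drive current

if __name__ == "__main__":
    for name, (c, i) in (("baseline", base), ("2x C", more_cap), ("I / 2", starved)):
        print(f"{name:8s} delay = {delay_s(c, i) * 1e12:5.1f} ps, "
              f"energy = {energy_per_edge_pair_j(c) * 1e15:4.1f} fJ")
```

In this model both options double the delay, but only the larger capacitor doubles the switching energy, which is the reason current starvation is preferred for fine delay control.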
Both variable-delay single-ended inverter implementations suffer from poor signal integrity at
the inverter’s output. The delay mechanism of an inverter can be modeled as a capacitor charged
through a nonlinear resistance. The output voltage, then, follows an RC-like settling trajectory,
which has a faster rate of change during the beginning portion, when the cell starts actively
delaying the input, a slower slew rate near the mid-range voltages, and is then followed by slow
settling behavior. These delay-cell implementations fail to meet the fourth design specification,
listed in the beginning of this section, due to the slow settling behavior. They also cause large
crowbar currents in the circuit that follows them due to the slower voltage slope near the mid-
range value. These crowbar currents can offset the energy savings of an efficient delay-cell design.
An alternative way to weaken an inverter’s drive strength, based on feedback, is proposed
in [44] and is similar to a Schmitt trigger [45]. In the circuit shown in Fig. 5.7d, the output of
a fighting inverter, which is controlled by the output of the delay cell, is connected to the output
of a primary inverter. To explain the cell’s operation, it is assumed that the cell has completely
finished processing an input and its node voltages are fully settled, with the input voltage and output
voltages at ground potential. The output voltage of the primary inverter and the fighting inverter is
equal to VDD. When the input is raised to VDD, the primary inverter’s nMOS transistor is enabled
and it begins to discharge the effective load capacitance. The fighting inverter’s pMOS transistor,
however, remains enabled because the delay cell’s output holds its value. The fighting inverter,
then, steers some of the charging current of the primary inverter from the effective load capacitor.
When the output voltage reaches the midrail voltage, the delay operation is complete and the
fighting inverter switches. The delay can be varied granularly by changing the size of the fighting
inverter digitally through switches. The primary inverter’s output trajectory can be partitioned into
three consecutive parts:
1. A high-slew-rate segment due to the nMOS transistor of the primary inverter being fully
on and in saturation, thus sinking a large current, while the pMOS transistor of the second
inverter is fully on but in triode, since the output voltage is around VDD, thus supplying little
current.
2. A low-slew-rate segment when the VDS of the primary inverter’s nMOS transistor decreases
as the VSD of the second inverter’s pMOS transistor increases; the current of the second
inverter cancels the majority of the primary inverter’s current, thus leaving less charging current
for the effective load capacitor and causing an increase in the propagation delay as compared
to the case with no fighting inverter.
3. A final high-slew-rate segment when the output has already toggled and the second inverter
aids rather than fights the primary inverter during the final settling portion.
While this delay cell offers an improvement in the final settling behavior of the delaying inverter
over the cells in Fig. 5.7b and c, its rate of change at the beginning of the delay cycle is undesirably
high, making this implementation less delay efficient. This delay cell is, likewise, very energy
inefficient. In contrast to a current-starved inverter, all of whose current is used to charge the
capacitor, in this feedback-based delay cell, most of the current of the primary and secondary
inverters is wasted throughout the delay operation due to the fighting of the two inverters and only
the small difference between the currents is used for the capacitor. The wasteful dissipation of this
cell makes it a poor candidate for gigahertz-range processor implementation.
To summarize, the aforementioned delay cells are eliminated from consideration for the fol-
lowing reasons:
1. Differential implementations: undesirable static power consumption
2. Inverter with variable load capacitance: poor signal integrity, crowbar currents, energy in-
efficient
3. Current-starved inverter: poor signal integrity, crowbar currents
4. Fighting-inverter cell: high energy dissipation.
The analysis of the benefits and shortcomings of these common delay-cell implementations leads
to the conclusion that a widely tunable energy-efficient single-ended delay cell should be based
on current starvation, but requires a boost in the charging current after the completion of the delay
operation. Such a delay cell was proposed in [46] and makes use of thyristor-like positive feedback.
An alternative implementation of the cell is presented in [47], and was used in a CT DSP for
voiceband applications [15]. A simplified implementation of a thyristor-like delay cell is illustrated
in Fig. 5.8. The idea is to slowly charge a load capacitor with a control current, I, which is enabled
on a rising-edge input, until the capacitor voltage reaches a threshold voltage, VTH. Upon reaching
the threshold, a positive feedback circuit is enabled, which causes the capacitor to be charged much
faster. Once the delay operation is complete, the circuit is reset on the opposite edge of the input by
discharging the capacitor and turning off the current sources. This delay circuit produces a tunable
delay for positive-edged inputs only and relies on the negative edges for a reset phase.
Figure 5.8: Thyristor-like-based digital delay cell for a rising edge input (reset on the falling edge,
reset circuit not shown) and associated signals.
To make the delay cell either-edge sensitive, [46] uses two complementary thyristor-like cir-
cuits in parallel, one for each input edge polarity. When one load capacitor is slowly charged on
one input edge, the other load capacitor is reset, and vice versa. An alternative solution for de-
laying both input edges is provided in [47], which uses self-timed handshaking circuits to control
the single-capacitor circuit described in Fig. 5.8. Input edges of either polarity are encoded as
short-duration pulses of the same polarity, which are then used to control the operation of the cell.
These pulse and self-resetting digital circuits enable the charging of the capacitor, reset the cell by
discharging the capacitor and then reset themselves. During the reset phase, the cell cannot receive
any new tokens. This cell is well suited for applications with desired delays of a few nanoseconds
to several hundred microseconds because the duration of the reset phase and the pulse widths of the
self-resetting signals can be in the range of a couple of nanoseconds. For applications in the 3-GHz
range, however, the circuit reset time is allowed a maximum duration equivalent to about three to
four times the fanout-of-one (FO1) delay of an inverter. Fully discharging the load capacitor in
this allotted reset time or even generating the self-timed control signals is infeasible. A CT delay
cell for a gigahertz processor, therefore, must operate on either edge of the input without requiring
a reset phase.
From an energy-efficiency perspective, while the thyristor-like delay cell is efficient compared
to the aforementioned basic delay blocks, dissipating 50% of the energy per delay operation [47],
these delay cells are still wasteful because the capacitor charge is dumped after the delay oper-
ation is complete; therefore, an energy of $\frac{1}{2} C V_{DD}^2$, which was stored on the capacitor, is wasted.
An alternative delay cell, detailed in the next sections, has been designed for a gigahertz-range
processor, which takes advantage of the benefits of the thyristor-like delay cell and satisfies the
aforementioned design specifications.
5.2.2 Energy-efficient CT digital delay cell
The key to realizing an energy-efficient delay cell, based on the charging of a capacitance, is
to reduce the size of the charging current instead of increasing the load capacitance or cascading
several short-delay cells; this is a benefit of a current-starved inverter-based delay cell. Wasted en-
ergy due to crowbar currents can be prevented by using positive feedback to significantly boost the
rate of change of the capacitor’s voltage when it reaches values near the mid-rail, a benefit offered
by the thyristor-like-based delay cells. The delay cell designed for a gigahertz-range processor,
described in this section, offers further energy savings by reusing, on the next input edge, the
capacitor charge accumulated during the previous one, instead of throwing away the charge
at the completion of each delay operation. Such charge recycling would make it possible for a
delay element to consume half of the energy per delay operation as compared to that of [46, 47],
and would lead to even higher energy savings when compared to the cells described in the pre-
vious section. This is accomplished in the proposed delay-cell architecture, which is shown in
Fig. 5.9 [33].
Figure 5.9: The schematic diagram of the energy-efficient delay cell with positive feedback.
The device sizes of the transistors that comprise the cell are listed in Table 5.1; all devices are
of the minimum length (65 nm). The delay cell is based on a current-starved inverter with positive
feedback. Devices M3 and M4 act essentially as switches that enable and source-degenerate either
the nMOS, M1, or pMOS, M2, current-starved transistors, which are responsible for the slow
discharging and charging of the load capacitor. Thyristor-like positive feedback device pairs are
formed by transistors M5 and M8 for rising-edge inputs and M6 and M7 for falling-edge inputs.
Device   Width (µm)   Device   Width (µm)
M1       0.48         M2       0.96
M3       0.24         M4       0.48
M5       0.16         M6       0.32
M7       0.12         M8       0.24
M9       0.18         M10      0.4
M11      0.36         M12      0.8
M13      0.36         M14      0.8
M15      0.16
Table 5.1: Delay-cell transistor widths. All transistors have the minimum length of 65 nm.
Transistors M8 and M7, which form a feedback inverter, serve as sense devices because when they
begin to turn on, positive feedback is triggered. In order to achieve delays in the range of 100 ps to
200 ps, and to dissipate a near-minimum energy per conversion, no explicit external capacitance is
used at the output of the current-starved inverter. An effective load capacitance is a combination of
all the capacitances, intrinsic and parasitic, at the node corresponding to Vc:
$$C \approx (C_{gd,1}+C_{db,1})+(C_{gd,2}+C_{db,2})+(C_{gd,5}+C_{db,5})+(C_{gd,6}+C_{db,6})+(C_{gs,7}+C_{gd,7})+(C_{gs,8}+C_{gd,8})+C_{WIRE} \qquad (5.3)$$
where CWIRE is the parasitic wire capacitance. If the target delay is above a few hundred picosec-
onds, a small additional capacitor can be added.
The operation of the cell is illustrated in Fig. 5.10. It is assumed that the input is initially held
at ground potential, the output is at the VDD potential and that all the internal nodes of the cell are
fully settled before a new token arrives. At the start of a delay operation, a rising-edge input en-
ables the pull-down network composed of M1 and M3. The feedback device M5 remains disabled
because the output of the feedback inverter holds a logic-level low value. When device M3 turns
Figure 5.10: Example signals of the delay cell.
on, it begins to slowly discharge the effective load capacitance C, which has been precharged to
VDD potential. As the capacitor voltage approaches the switching voltage of the feedback inverter,
VTH ≈ VDD/2, the inverter begins to switch, and triggers the positive feedback action. The feedback
device M5 starts to turn on and helps discharge C even faster. Due to the positive feedback mech-
anism, the capacitor is quickly discharged the rest of the way to ground potential and all internal
nodes settle to their final values; the cell is ready for the next delay operation of the opposite po-
larity, without requiring any reset phase. On a falling-edge input, the same action happens with the
opposite polarity.
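The two phases of the delay operation can be illustrated with a behavioral Python sketch of the rising-edge case. It is a strong idealization: the starved current is taken as constant, and the feedback current is assumed to switch on abruptly once the capacitor voltage reaches VTH, whereas in the actual cell the feedback device turns on gradually. All component values and names are illustrative assumptions.

```python
def discharge_trajectory(c=5e-15, i_starved=20e-6, i_boost=200e-6,
                         vdd=1.0, vth=0.5, dt=1e-13):
    """Return (time to reach vth, time to reach ground) in seconds.

    Forward-Euler integration of C * dV/dt = -I, with an extra feedback
    current added once the capacitor voltage has dropped to vth.
    """
    t, v, t_th = 0.0, vdd, None
    while v > 0.0:
        current = i_starved              # slow, current-starved discharge
        if v <= vth:                     # feedback device (M5 in the text) assists
            if t_th is None:
                t_th = t
            current += i_boost
        v -= current * dt / c
        t += dt
    return t_th, t


if __name__ == "__main__":
    t_th, t_end = discharge_trajectory()
    print(f"threshold reached after {t_th * 1e12:.1f} ps, "
          f"settled {(t_end - t_th) * 1e12:.1f} ps later")
```

With these assumed values, the slow phase sets almost all of the delay, while the boosted phase completes the remaining half of the voltage swing in a small fraction of that time.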
After the completion of the delay operation, the delay cell requires approximately 30 ps in
the positive-feedback-enabled phase for the capacitor to be fully charged and the nodes to settle.
This settling time is somewhat dependent on the overall delay of the cell. If this delay cell is used
for delays above 300 ps, the feedback devices M5/M6 will be providing most of the discharg-
ing/charging current during the settling phase because M1/M2 will be biased with a much smaller
current to prolong the delay.
When the desired delay is well above a couple hundred picoseconds, as may be needed for a
different application, the feedback inverter will dissipate a nonnegligible crowbar current when the
capacitor voltage varies slowly near the inverter’s switching voltage. Thus, device M8 should be
disabled when M7 is sensing and vice versa. The completion of the delay will then correspond
to the capacitor voltage reaching the threshold voltages of the sensing devices rather than the
switching voltage of the feedback inverter. To prevent wasteful crowbar currents, M7 should be
disabled through a series nMOS switch on a rising-edge input and M8 should be disabled through
a series pMOS switch on a falling-edge input. These switches should be driven by the inverted
input signal. No additional inverter is needed because an intermediate output of the previous delay
cell, the input node of the last inverter M13-M14, can be used to provide the inverted signal. For
the intended delay range of the implemented gigahertz processor, there is not enough time for the
feedback inverter to develop significant crowbar currents; the scheme in Fig. 5.9 is used because
incorporating the feedback switches is unnecessary, and would increase the energy consumption
and lengthen the feedback start-up time.
The controlled current source is formed by a current-controlled device M1 (M2) and a switch
M3 (M4), which enables the starved device at its source terminal. This realization is advantageous
over the alternative orientation where the positions of the two devices M1 (M2) and M3 (M4) are
switched. When the switch device M3 (M4) is connected between the source terminal of M1 (M2)
and ground potential (VDD), M1 (M2) serves as a buffer and keeps the transitions of the input signal
from feeding through to the critical node of the effective load capacitor. The alternative realization
suffers from such feedthrough. In the chosen realization shown in Fig. 5.9, M3 is in the triode
region of operation when it is enabled and has a small VDS,3 which does not vary significantly




and degenerates the source of the current-starved device. The source degeneration is beneficial
for the well-known reasons of linearizing the current source (to extend the linearity of the current
tuning range), making it less sensitive to VT variation, and boosting the output resistance of the
controlled current source to be:
$$R_o \approx r_{o1}\,(1 + g_{m1} R_S) \qquad (5.5)$$
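As a numerical illustration of eq. 5.5, the Python sketch below evaluates the boosted output resistance for one set of small-signal values. The values of ro1, gm1 and RS are hypothetical and chosen only to show the size of the effect; they are not taken from the design.

```python
def degenerated_output_resistance(r_o1: float, g_m1: float, r_s: float) -> float:
    """Eq. 5.5: R_o ~ r_o1 * (1 + g_m1 * R_S)."""
    return r_o1 * (1.0 + g_m1 * r_s)


if __name__ == "__main__":
    r_o1, g_m1, r_s = 40e3, 250e-6, 2e3   # ohms, siemens, ohms (assumed values)
    r_o = degenerated_output_resistance(r_o1, g_m1, r_s)
    print(f"R_o = {r_o / 1e3:.0f} kOhm, boost factor = {r_o / r_o1:.2f}")
```

For these assumed values, a source resistance of 2 kΩ raises the output resistance by a factor of 1.5.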
The output of the current-starved inverter is buffered in order to decouple its effective capac-
itive load, which controls the delay of the cell, from the overall loading of the delay cell. The
two-inverter buffer, composed of devices M9-M14, also sharpens the output voltage transitions
and is designed to be capable of driving the following delay cell, the tap coefficient charge pump,
and the parasitic wire capacitance. The buffer is placed at the output of the current-starved inverter
instead of at the output of the feedback inverter because the extra capacitive loading it adds to
the effective capacitance at node C is tolerable. The extra capacitance at the feedback node cor-
responding to Vf would slow down the feedback action, causing not only the cell delay, but also
the time required for the internal nodes to fully settle, to be prolonged. The resulting increase in
the minimum delay can cause a failure to meet the required delay specification for the slow-slow
process corner and for higher operating temperatures. The increase in the settling time reduces the
maximum input frequency that can be tolerated, and makes the system sensitive to in-band half-
harmonic distortion of out-of-band blockers. To combat the slowing down of the feedback action,
the feedback inverter has to increase in size, which causes an increase in the energy dissipation.
The additional capacitance still causes the delay and the settling time to increase, requiring the
size of the feedback devices M5 and M6, as well as the switches M3 and M4 to be increased. The
designed cell is therefore optimized for power and delay by putting the buffering inverters at the
output of the current-starved inverter.
A disabling structure must be realized within the delay cell in order to be able to turn off higher
taps for lower-order filter configurations. The alternative is leaving the delay cells on while tuning
unused coefficients to zero, which is wasteful of power and is, therefore, avoided. A disabling
structure adds undesirable capacitance overhead and reduces the drive strength of the stage at
which it is incorporated. As a result, it should not be placed in the main delay circuit. It is
possible to realize the disabling structure in the main delay circuit by turning off the bias voltages
of the current-starved devices, M1 and M2, and presetting the capacitor voltage to logic-level high
or low at the cost of some additional capacitance. If the bias voltages are supplied through
complementary switches, as is necessary for the disabling structure, when devices M1 and M2 are
enabled during normal delay-cell operation, a large amount of gate charge must be supplied on
turn-on, which causes the bias voltages to fluctuate. To correct this, the bypass capacitors at the
bias nodes within each delay cell would have to be significantly increased; longer wire lengths
would be needed for interconnections in the filter due to the increased grid size. More importantly,
the filter would require a long turn-on time due to the exponential settling of the delay-cell biases.
For these reasons, no disabling structure is incorporated within the primary delay-cell mechanism.
The most natural candidate for the disabling structure is in the delay-cell buffer. It is critical for
the second buffering inverter to have a high drive strength in order to drive the large capacitive load
of the delay cell with sharp output transitions. The disabling switches are, thus, realized in the first
buffering inverter as transistors M11 and M12, while the transistor M15 is used to set the disabled
delay cell’s output to a logic-level high. A reset-to-low delay cell is realized by replacing M15
with a pMOS device connected to VDD instead of ground and turned on by EN. The reset-to-low
and reset-to-high delay cells alternate at every other tap so that when the filter is enabled, there are
no voltage collisions and therefore no transient settling behavior. The CT processor is then ready
to process the input after about 100 ns, the time it takes the enable signal to propagate and enable
the buffering inverter, which makes the CT system robust.
Since the delay mechanisms for rising- and falling-edge inputs are different, where one is
based on an nMOS pull-down network while the other one is based on a pMOS pull-up network,
the propagation delays, which are designed to be equal, are inevitably slightly different. Care is
taken to ensure the delays of all processor blocks are the same for either input polarity within a
small error, as will be described later in the section. The systematic delay difference τ for rising-
and falling-edge inputs is present in all delay cells at all levels. If the delay cell is realized as non-
inverting, this delay error will accumulate through the delay chain. At the output of k delay stages,
the output signal will be distorted because positive pulses will be shortened by kτ and negative
pulses will be lengthened by kτ. At the output of each tap, then, the half-harmonic distortion of
the signal will get progressively worse. If the delay cell is made inverting, the accumulation of the
timing error is prevented and the delay error introduced at one tap will be corrected at the next tap,
as discussed in Sec. 4.2. The output of the delay cell is therefore inverted.
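The accumulation argument can be illustrated with a small Python sketch. It tracks the two edges of a positive input pulse through k identical cells, assuming every cell has the same rising- and falling-edge delays and that random mismatch is absent; the function and parameter names are illustrative.

```python
def pulse_width_error(k: int, d_rise: float, d_fall: float, inverting: bool) -> float:
    """Change in the spacing of the two edges of an input high pulse after k cells.

    d_rise and d_fall are the delays a cell applies to rising and falling
    edges at its own input.
    """
    lead, trail = 0.0, 0.0                      # accumulated delay of each edge
    lead_rising, trail_rising = True, False     # edge polarities at the chain input
    for _ in range(k):
        lead += d_rise if lead_rising else d_fall
        trail += d_rise if trail_rising else d_fall
        if inverting:                           # polarity flips at every inverting cell
            lead_rising, trail_rising = not lead_rising, not trail_rising
    return trail - lead


if __name__ == "__main__":
    for inverting in (False, True):
        err = pulse_width_error(8, d_rise=150.0, d_fall=147.0, inverting=inverting)
        print(f"inverting={inverting}: pulse-width error after 8 cells = {err} ps")
```

With a systematic difference of 3 ps per cell, the non-inverting chain shortens the pulse by 3 ps at every stage, while in the inverting chain the error alternates between one cell's difference and zero.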
Delay of the CT delay cell
The delay of the cell is the sum of the time it takes the capacitor voltage VC to reach the
switching threshold of the feedback inverter, TTH, the propagation delay of the feedback inverter,
Tf, and the propagation delay of the output buffer, Tbuf. The majority of the delay is made up
by the slow charging of capacitor C. If the current through the enabled current-starved devices is
approximated with its average value I (averaged during the charging phase until VC = VTH), the
capacitor voltage varies linearly with time, with a rate of change of S = I/C. The time it takes the
capacitor voltage to reach the threshold, for the case of a rising-edge input, is then:
$$T_{TH} = \frac{V_{DD} - V_{TH}}{S} = \frac{C\,(V_{DD} - V_{TH})}{I} \qquad (5.6)$$
Figure 5.11: (a) Half-circuit of the current-starved inverter and the bias-generating circuit and (b)
a model of the delay mechanism.
The charging current, in reality, is not constant throughout the charging phase because the
output resistance, given in eq. 5.5, is finite. Instead of varying with a constant rate of change, the
capacitor voltage trajectory until positive feedback is triggered is closer to an RC-settling trajectory.
The model of the delay mechanism is shown in Fig. 5.11 for the nMOS transistor half-circuit of
the current-starved inverter; the source resistance (due to M3) and the output resistance Ro are
approximated as constant during the first phase of the delay operation until positive feedback is
triggered. The resistance Ro is given in eq. 5.5 and the current I = IN − VN/Ro, where IN is the reference
current of the bias circuit and VN is the bias voltage. The current I is defined such that for VC = VN,
the current through the capacitor is equal to IN. The capacitor voltage trajectory until positive
feedback is triggered is given by the following expression for the case of a rising-edge input:
$$V_C(t) = V_{DD}\, e^{-t/R_o C} - I R_o \left(1 - e^{-t/R_o C}\right) \qquad (5.7)$$
The time it takes to reach the threshold is then:
$$T_{TH} = R_o C \ln\left\{\frac{V_{DD} + I R_o}{V_{TH} + I R_o}\right\} \qquad (5.8)$$
Fig. 5.12 illustrates how the trajectory of VC(t) changes with the bias current. For higher bias
current values, corresponding to shorter delays, the rate of change appears almost constant and
eq. 5.6 can be used to approximate the delay. For longer delays, the capacitor voltage trajectory
notably differs from a straight line and follows an RC-like settling behavior; eq. 5.8 should be used
in this case to determine the threshold crossing time.
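The two estimates of the threshold-crossing time can be compared numerically. The Python sketch below evaluates eq. 5.8 directly and uses C(VDD − VTH)/I as the linear-ramp estimate, which is the form assumed here for eq. 5.6. The average current in the linear estimate is approximated by the capacitor current midway between VDD and VTH, which is a further assumption, and all component values are illustrative.

```python
import math


def t_th_linear(c, i_avg, vdd, vth):
    """Linear-ramp estimate (the form assumed here for eq. 5.6)."""
    return c * (vdd - vth) / i_avg


def t_th_rc(c, i, r_o, vdd, vth):
    """Threshold-crossing time from the RC trajectory, eq. 5.8."""
    return r_o * c * math.log((vdd + i * r_o) / (vth + i * r_o))


def midpoint_current(i, r_o, vdd, vth):
    """Assumed average current: I + V_C / R_o evaluated midway between VDD and VTH."""
    return i + 0.5 * (vdd + vth) / r_o


def relative_gap(i, c=5e-15, r_o=50e3, vdd=1.0, vth=0.5):
    """Relative difference between the linear and the RC estimates."""
    rc = t_th_rc(c, i, r_o, vdd, vth)
    lin = t_th_linear(c, midpoint_current(i, r_o, vdd, vth), vdd, vth)
    return abs(rc - lin) / rc


if __name__ == "__main__":
    for i in (40e-6, 2e-6):   # a large and a small control current (assumed values)
        print(f"I = {i * 1e6:4.0f} uA: T_TH(RC) = {t_th_rc(5e-15, i, 50e3, 1.0, 0.5) * 1e12:6.1f} ps, "
              f"gap to linear estimate = {relative_gap(i) * 100:.2f} %")
```

For the assumed values, the two estimates agree closely at the larger current and drift apart as the current is reduced, which matches the qualitative behavior described for Fig. 5.12.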
Delay tuning range
The delay range is made tunable via the currents of the current-starved devices M1 and M2. A
digitally-controlled bias generating circuit for the two devices, shown in Fig. 5.13, offers a six-bit
tuning range based on binary-weighted cascoded current sources and replica current sources of the
delay cell. The cascoded current source was chosen to improve the linearity of the current tuning
range and is biased to require little headroom in order to extend the linear range. Since the delay
is calibrated using a tuning scheme, a linear tuning range is not necessary but is preferred. The
Figure 5.12: Transient signals of the output voltage of the current-starved inverter with positive
feedback for a range of bias current values, IN .
cascode bias generation for both the pMOS and nMOS circuits is based on a single off-chip current
(not shown) which is nominally set to 100 µA and offers an extra handle for delay control. The
bias-generating circuit for the delay cell is made for easy control, testability and debugging, rather
than power efficiency; thus a more efficient bias generating scheme, for example using a single
low-power DAC, should be used when implemented in a system that will go to production.
Figure 5.13: Six-bit bias-generating circuit for nMOS and pMOS control-current sources.
The cell’s delay range is shown in Fig. 5.14 versus the value of the six-bit control word for
several process corners and temperature ranges. The minimum delay can be reduced slightly to
compensate for the speed reduction in the slow-slow corner by increasing the off-chip reference
current that is used to generate the cascode bias voltages in Fig. 5.13. Using replica circuits to
generate biases for the delay-cell current sources, M1–M4, makes the delay element less sensitive
to process and temperature variations, when compared to “RC”- and inverter-based delay cells
because global variations are partially tracked by the replica bias circuits. The effects of process
and temperature variation on the propagation delay of the feedback inverter and buffer, however,
are not tracked, which accounts for some of the discrepancy in the tuning range.
Since the delay is inversely proportional to the digitally controlled bias current, IP (IN for the
case of a rising-edge input), the delay increases significantly as the bias current approaches its
minimum value. In the intended delay range of 110 ps to 250 ps, which corresponds to higher
current values and a control word range of 20–60 (for a typical-typical process), a well-controlled
fine tuning range is achieved.
Figure 5.14: Delay range versus control word value for several process corners and temperatures.
Since the delay on a rising- and falling-edge input is caused by two separate mechanisms, a
discrepancy in the delays is inevitable. Without calibration, the percent error between the delay for
a rising- and falling-edge input for a typical-typical process, shown in Fig. 5.15, is below 4% in
the delay range of interest. The discrepancy between the delays can be reduced with calibration.
Figure 5.15: Percent error in delays for rising- and falling-edge inputs without calibration.
Energy per delay operation
Since the delay of the cell is varied by changing the rate at which capacitor charge is supplied
via the controlled current, without having to change the capacitance value, the energy consumption
per delay operation is fairly independent of the delay, as is shown in Fig. 5.16. Some variation in
the energy consumption occurs because the capacitance of the current-starved devices, M1 and
M2, depends on the bias conditions of these devices.
Figure 5.16: Energy consumption per delay operation versus delay.
Additionally, as the delay is increased and the slopes of the input and output node voltages of the
feedback inverter are reduced, the energy
dissipation of the feedback inverter and the first buffer increases because small crowbar currents
start to develop. The minimum energy per delay operation occurs in the range of 120 ps to 140 ps,
which is optimized for the UWB application.
Delay variation due to device mismatch
While the effects of global variations, such as process and temperature, can be partially tuned out
with calibration, the effects of random device mismatch between individual delay cells remain.
Delay variation, including the rising- and falling-edge delay discrepancy, depends primarily on
the matching of the controlled current sources, M1 and M2, as well as the matching between
the drive strengths of devices M7 and M8, which comprise the feedback inverter. The effect of
random mismatch of the controlled-current sources can be modeled as percent variation εi = ΔIN(P)/IN(P)
in the charging current. The effect of the device mismatch in the feedback inverter can be modeled
as percent variation in the switching threshold, VTH, of the inverter, εv = ΔVTH/VTH. To simplify the
analysis, the charging current can be approximated with its average value I and the threshold
crossing time can be calculated with eq. 5.6. The fractional variations in the threshold-crossing








assuming $\varepsilon_i \ll 1$. The percent error in the charging current can be expressed as the percent error in
the threshold voltage $\varepsilon_{v,1(2)} = \Delta V_{T,1(2)} / V_{T,1(2)}$ of M1 or M2 by linearizing about the nominal bias point: for
Figure 5.17: The average delay and the 1-σ boundaries of delay variation due to random local
mismatch for (a) rising-edge and (b) falling-edge inputs.
a nominal current value IN(P), the variation in the current ∆IN(P) due to a variation in the threshold







where RS, given in eq. 5.4, reduces the sensitivity of IN(P) to VT,1(2) variation. The fractional










It can be noted that VT variation of M1 and M2 causes a larger delay variation when the controlled
current sources are in weaker inversion because the gm/I ratios increase. Fig. 5.17 shows the
simulated average delays for rising- and falling-edge inputs and the 1-σ delay variation boundaries.
As the delay is decreased via the control word, the percent variation in delay decreases because
larger charging currents, corresponding to stronger levels of inversion, are used.
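The first-order sensitivities discussed above can be illustrated with a short sketch. It assumes the simplified model in which the threshold-crossing time is T = C·VTH/I (the form quoted for eq. 5.6), and it assumes the standard small-signal result for a source-degenerated current source, ∆I/I ≈ −(gm/I)·∆VT/(1 + gm·RS), which may differ in detail from the expression derived in this section. All function names and numerical values are illustrative, apart from C = 4.5 fF, which is the effective capacitance estimated in Sec. 5.2.3.

```python
def threshold_crossing_time(c, v_th, i):
    """Average threshold-crossing time T = C * V_TH / I (simplified model)."""
    return c * v_th / i


def fractional_delay_error(eps_i, eps_v):
    """Exact fractional change in T when I -> I(1+eps_i) and V_TH -> V_TH(1+eps_v)."""
    return (1.0 + eps_v) / (1.0 + eps_i) - 1.0


def fractional_delay_error_linear(eps_i, eps_v):
    """First-order approximation of the same quantity, valid for |eps_i| << 1."""
    return eps_v - eps_i


def current_error_from_vt(gm_over_i, delta_vt, gm_rs):
    """Assumed linearization: dI/I = -(gm/I) * dVT / (1 + gm*RS).

    A larger gm/I (weaker inversion) increases the sensitivity;
    source degeneration (gm*RS > 0) reduces it.
    """
    return -gm_over_i * delta_vt / (1.0 + gm_rs)


if __name__ == "__main__":
    C = 4.5e-15   # effective load capacitance, F (value quoted in Sec. 5.2.3)
    V_TH = 0.5    # switching threshold, V (assumed)
    I = 20e-6     # average charging current, A (assumed)
    print(f"nominal T = {threshold_crossing_time(C, V_TH, I) * 1e12:.1f} ps")

    # Same 5 mV threshold mismatch, two hypothetical inversion levels.
    for gm_over_i in (8.0, 20.0):
        eps_i = current_error_from_vt(gm_over_i, 5e-3, gm_rs=1.0)
        print(f"gm/I = {gm_over_i:4.1f} 1/V -> eps_i = {eps_i:+.3f}, "
              f"delay error = {fractional_delay_error(eps_i, 0.0):+.3f}")
```

With these assumed numbers, the same threshold mismatch produces a larger delay error at the higher gm/I value, which is the trend described above.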
5.2.3 Timing jitter of a delay cell
In analyzing the timing uncertainty in the delay of a CT delay cell, it is assumed that a delay event
is not in progress when a new data token arrives and that there is enough buffering time between
tokens such that the internal node voltages have settled. This ensures that each delay event is
independent of the delay history and the delay cell is memoryless. This study of CT delay jitter is
similar to the analysis of timing jitter in ring oscillators [48–51] and inverters [52, 53].
The sources of noise can be categorized as thermal noise due to internal devices, noise on the
bias voltages of the discharging transistors M1 and M2 in Fig. 5.9, and noise on the supplies. The
thermal noise of the devices internal to the CT delay cells, which is the dominant noise source, is
studied in this section. Noise on the bias voltages is a non-dominant effect because large bypass
capacitors are used.
The delay cell is composed of a current-starved inverter with positive feedback followed by an
output buffer composed of two inverters. The propagation delay of the first buffering inverter is
assumed to be approximately equal to that of the feedback inverter. The propagation delay of the
delay cell, $T_D$, can be partitioned into three parts:
1. $T_{TH}$ is the time it takes the current-starved inverter’s output voltage VC to reach the threshold
of the feedback inverter, $V_{TH} \approx V_{DD}/2$.
2. $T_f$ is the propagation delay of the feedback inverter.
3. $T_{buf}$ is the propagation delay of the last buffering output inverter.
The delay $T_{TH}$ is made tunable via the currents of the current-starved transistors. The propagation
delays of the feedback and output inverters are somewhat dependent on the slew rate of the
current-starved inverter’s output voltage VC; however, for the intended delay range, $T_f$ and $T_{buf}$ can
be assumed to be approximately constant. The delay jitter of the feedback inverter and the output
inverter is likewise approximately constant. Accurately determining the delay variation of an
inverter composed of short-channel transistors and a non-constant input slew rate is a non-trivial
problem requiring the analysis of the jitter in each region of operation with input slope dependence;
this can be estimated by extending the result of [54] on the transient response of an inverter and
combining it with the analysis of [49] and with circuit simulation. In this work, the delay and jitter
of the feedback inverter and output inverter are obtained from simulation results of the delay cell
using post-layout extracted models. The goal of the following analysis is to determine the delay
jitter of the current-starved inverter.
For long delays where $T_{TH} \gg T_f + T_{buf}$, the jitter of the feedback and output inverters is a
small fraction of the total delay jitter because the feedback and output inverters, which experience
sharper output transitions, are sensitive to noise only during their transition intervals. The current-
starved inverter with a slowly varying output VC, in contrast, is susceptible to noise throughout the
charging cycle of the delay event, until positive feedback is triggered, and is, thus, responsible for
the majority of the delay jitter. For cases where $T_{TH}$ is comparable to $T_f + T_{buf}$, the jitter of the
current-starved inverter is comparable to that of the feedback and output inverters.
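The partition of the delay and the relative weight of each jitter contribution can be sketched numerically. The sketch below assumes that the three jitter contributions are statistically independent, so that their variances add; this independence is an assumption made here for illustration and is not stated in the text. The function names and the example numbers are hypothetical.

```python
import math


def total_delay(t_th, t_f, t_buf):
    """Propagation delay of the cell as the sum of its three parts."""
    return t_th + t_f + t_buf


def total_jitter(sigma_th, sigma_f, sigma_buf):
    """Root-sum-square jitter, assuming independent contributions."""
    return math.sqrt(sigma_th ** 2 + sigma_f ** 2 + sigma_buf ** 2)


def current_starved_share(sigma_th, sigma_f, sigma_buf):
    """Fraction of the total jitter variance due to the current-starved inverter."""
    return sigma_th ** 2 / (sigma_th ** 2 + sigma_f ** 2 + sigma_buf ** 2)


if __name__ == "__main__":
    # Hypothetical values in picoseconds: a long delay and a short delay.
    for t_th, s_th in ((250.0, 1.2), (40.0, 0.2)):
        s_f, s_buf = 0.15, 0.15
        print(f"T_D = {total_delay(t_th, 20.0, 20.0):.0f} ps, "
              f"sigma_TD = {total_jitter(s_th, s_f, s_buf):.2f} ps, "
              f"current-starved share = {current_starved_share(s_th, s_f, s_buf):.0%}")
```

Under these assumed numbers, the current-starved inverter accounts for nearly all of the jitter variance at the long delay and only part of it at the short delay.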
The delay jitter of the current-starved inverter is due to the error in the time that it takes VC
to reach the threshold of the feedback inverter, $V_{TH} \approx V_{DD}/2$. While the threshold voltages at
which the two feedback devices (M5 and M6) begin to turn on typically differ from VDD/2, they
are sufficiently “on” to trigger positive feedback when Vf = VDD/2. The first part of the delay
operation, corresponding to the slow charging of the capacitor voltage VC, is studied using simple
but accurate models to approximate the standard deviation of the delay jitter, which is used to gain
insight into important design characteristics.
Figure 5.18: Dominant delay jitter mechanism composed of a noise current source and a dis-
charged capacitor.
The current-starved nMOS and pMOS transistors, which are enabled on opposite edges of the
input signal and can be mismatched in current and noise level, have the same effect on the delay
jitter. The current-starved delay mechanism for either input polarity can be modeled as a noisy
current source charging a fully discharged capacitor C until a threshold voltage $V_{TH}$ is reached, as
illustrated in Fig. 5.18. The effective load capacitance C comprises the drain capacitances of
the current-starved inverter, the gate capacitances of the feedback inverter and the first buffering
inverter, and also parasitic wire capacitances. The current I is assumed to be corrupted by additive
white Gaussian noise, in(t), with a power spectral density of N0/2. Since integration is a linear
operation, integrated zero-mean Gaussian noise also has a zero-mean Gaussian distribution. The
capacitor voltage can then be represented as a sum of a linearly time-varying component $(I/C)\,t$ and
the normally distributed component $v_n(t)$. The average time it takes the capacitor voltage to reach
the threshold is given by eq. 5.6 as $\bar{T}_{TH} = C V_{TH} / I$.
It is interesting to note that an integrated noisy current is equivalent to Brownian motion with a
drift [55, 56], where the Brownian motion is an integral of a normally distributed random variable
(e.g. in(t)) and the drift component accounts for the linearly time-varying mean of the signal (e.g.
the integration of a constant current I). The capacitor voltage threshold-crossing time is a well-known
problem in the study of stochastic processes, arising, for example, in modeling fluctuations in the
stock market. This stopping time has a known distribution:





for a standard random process with a standard deviation of $\sqrt{t}$ [56]. Finding the variance of $T_{TH}$




2φTT H (t)dt is difficult to determine. Instead of directly calculating the delay variance,
an alternative approach is taken in the following section.
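For reference, the first-passage time of a standard Brownian motion with drift has a well-known closed-form density, the inverse Gaussian distribution. The block below states it in assumed notation: $a$ is the threshold level and $\mu$ is the drift, both normalized so that the noise component has a standard deviation of $\sqrt{t}$. These symbols are illustrative and may not match the notation of [56] or of the expression given above.

```latex
% Assumed notation: X(t) = \mu t + W(t), where W(t) is a standard Wiener process
% (standard deviation \sqrt{t}) and T is the first time X(t) reaches the level a > 0.
\varphi_{T}(t) = \frac{a}{\sqrt{2\pi t^{3}}}
  \exp\!\left(-\frac{(a-\mu t)^{2}}{2t}\right), \qquad t > 0,\ \mu > 0
% The mean of this density is E\{T\} = a/\mu, the level divided by the drift.
```

In this normalization the mean $a/\mu$ plays the role of the average threshold-crossing time of the capacitor voltage.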
Delay variance approximation using the variance decomposition formula
A different approach to determining the delay variance is taken by considering the error in the
capacitor voltage at the average stopping time, $\bar{T}_{TH}$. For every possible motion path, the value of
the voltage at the average stopping time, $V_C(\bar{T}_{TH})$, differs from the threshold value by an error,
$V_{ERR}$, which is a zero-mean Gaussian random variable with a variance $\sigma^2_{V_{ERR}}$. The variance of

























































According to the variance decomposition formula, also known as the law of total variance [57],
the variance of a random variable, Y , can be determined by conditioning with respect to another
variable X according to
$\mathrm{Var}\{Y\} = \mathrm{Var}\{E\{Y \mid X\}\} + E\{\mathrm{Var}\{Y \mid X\}\}$. (5.13)
This formula can be used to determine the standard deviation of the delay jitter by conditioning
with respect to the normalized capacitor voltage error $V_{ERR}$. The variance of the delay, $T_{TH}$, is
equivalent to the variance in the timing error, $T_{ERR} = T_{TH} - \bar{T}_{TH}$, between the actual and the










A reasonable value of the delay jitter standard deviation, $\sigma_{T_{TH}} / \bar{T}_{TH} < 1\%$, is assumed. The variance
of the delay jitter is then approximately equal to the first term in the equation, and the second term is negligible.
Figure 5.19: Projected paths to the threshold voltage from several values of VERR; standard devia-
tion in path duration is negligible compared to the path duration.
Qualitatively, this can be justified with the following explanation, as is illustrated in Fig. 5.19.
For each value of VERR, the capacitor voltage with a slope $S = I/C$ will be expected to reach the
threshold $V_{TH}$ in the time $E\{T_{ERR}(V_{ERR})\} = V_{ERR}/S$, which can have a positive or a negative
value, depending on the sign of the voltage error. This is equivalent to projecting the VC motion
path along a line with a slope $S = I/C$ for each VERR until the threshold is reached. It is assumed




$\ll 1$ since $\sigma_{T_{TH}} / \bar{T}_{TH} \ll 1$. The variance in the total delay time can be approximated
as the mean of the squared expected passage times, scaled by the probability of each VERR, which
is equivalent to the first term in eq. 5.14. The second term in the equation, then, accounts for the
deviation in each passage time TERR(VERR), which is negligible.
The standard deviation of the modeled delay cell can then be determined by rewriting the first
















The delay jitter increases with the square of the overall delay, which is in agreement with the results
of [50, 51] for open-loop ring oscillators.
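The first-term approximation can be checked numerically with a small Monte Carlo sketch of the idealized model of Fig. 5.18. The sketch works in normalized units and assumes that the capacitor voltage is V(t) = S·t + σ·W(t), where W(t) is a standard Wiener process; for the circuit this corresponds to S = I/C and, for white current noise with a power spectral density of N0/2, to σ² = N0/(2C²). The function names, step size, number of paths and parameter values are illustrative choices, not taken from this work.

```python
import math
import random
import statistics


def simulate_crossing_times(drift, sigma, threshold, dt, n_paths, seed=0):
    """First-passage times of V(t) = drift*t + sigma*W(t) through `threshold`.

    W(t) is approximated with independent Gaussian increments over steps of dt.
    """
    rng = random.Random(seed)
    step_std = sigma * math.sqrt(dt)
    times = []
    for _ in range(n_paths):
        v, t = 0.0, 0.0
        while v < threshold:
            v += drift * dt + rng.gauss(0.0, step_std)
            t += dt
        times.append(t)
    return times


def first_term_jitter(drift, sigma, threshold):
    """sigma_T ~ sigma_VERR / S, with sigma_VERR evaluated at the mean crossing time."""
    t_mean = threshold / drift
    sigma_verr = sigma * math.sqrt(t_mean)  # std of the integrated noise at t_mean
    return sigma_verr / drift


if __name__ == "__main__":
    # Normalized units: threshold = 1, slope = 1, so the mean crossing time is 1.
    times = simulate_crossing_times(1.0, 0.005, 1.0, dt=1e-3, n_paths=400, seed=1)
    print(f"simulated mean   = {statistics.mean(times):.4f}")
    print(f"simulated std    = {statistics.stdev(times):.5f}")
    print(f"first-term sigma = {first_term_jitter(1.0, 0.005, 1.0):.5f}")
```

The time step limits the resolution of the simulated crossing times, so the comparison is only meaningful when dt is well below the expected jitter.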
Finite output impedance model for the delay mechanism
A more accurate model of the CT delay cell’s mechanism represents the charging current source
with a finite impedance Ro, as shown in Fig. 5.20. Instead of a linearly increasing motion path, the
capacitor voltage follows an RC-like settling trajectory until positive feedback is triggered, as is
seen in the delay-cell waveform in the figure. The expected RC-like settling trajectory, neglecting
the current noise, is:
$V_C(t) = (V_{DD} + I R_o)\left(1 - e^{-t/(R_o C)}\right)$ (5.16)
and the expected stopping time is given in eq. 5.17.







Figure 5.20: Delay mechanism model with a finite output impedance and example signals.
The delay variation can again be approximated using the first term of the variance decompo-
sition formula, eq. 5.15, using the momentary slope of the output voltage at the stopping time,





The variance of the capacitor voltage at the expected stopping time can be derived following a
procedure similar to that used in the derivation of eq. 5.12, but instead using the impulse response











































As the output impedance approaches infinity, this result approaches the expression in eq. 5.12.
The delay jitter can then be determined as the standard deviation of VERR scaled by the momentary
slope $S(\bar{T}_{TH})$. The delay jitter increases as the slope is reduced. If a threshold is reached during
the early portion of the trajectory, where the slope is approximately linear, the delay variation is
comparable to that of Brownian motion. On the other hand, if the threshold is reached during
the slowly varying, nearly settled portion of the exponential trajectory, a large delay variation is
observed because it takes a longer time to correct a small error on the capacitor voltage due to
the low value of the slope. It follows that for a low-jitter design, the motion path must have a
high slope near the threshold, which can be accomplished by keeping the threshold well below the
trajectory’s asymptotic voltage.
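The dependence of the jitter on the slope at the threshold can be explored with a short sketch of the finite-output-impedance model. It inverts the trajectory of eq. 5.16 to obtain the noise-free stopping time, takes the slope at that time, and assumes that the voltage variance is that of white current noise (power spectral density N0/2) filtered by the parallel combination of Ro and C, namely N0·Ro/(4C)·(1 − e^(−2t/(Ro·C))). This variance expression is an assumption made here; it reduces to N0·t/(2C²) as Ro approaches infinity. All names and values are illustrative, apart from C = 4.5 fF from Sec. 5.2.3.

```python
import math


def stopping_time(i, r_o, c, v_dd, v_th):
    """Noise-free time at which V_C(t) = (V_DD + I*R_o)(1 - exp(-t/(R_o*C))) reaches V_TH."""
    v_inf = v_dd + i * r_o  # asymptotic voltage of the trajectory
    if v_th >= v_inf:
        raise ValueError("the threshold is never reached")
    return -r_o * c * math.log1p(-v_th / v_inf)


def slope_at(t, i, r_o, c, v_dd):
    """Momentary slope dV_C/dt of the noise-free trajectory at time t."""
    v_inf = v_dd + i * r_o
    return v_inf / (r_o * c) * math.exp(-t / (r_o * c))


def sigma_v(t, n0, r_o, c):
    """Assumed std of V_C at time t for white current noise filtered by R_o || C."""
    var = n0 * r_o / (4.0 * c) * -math.expm1(-2.0 * t / (r_o * c))
    return math.sqrt(var)


def jitter(i, r_o, c, v_dd, v_th, n0):
    """First-term estimate: std of the voltage error divided by the slope."""
    t = stopping_time(i, r_o, c, v_dd, v_th)
    return sigma_v(t, n0, r_o, c) / slope_at(t, i, r_o, c, v_dd)


if __name__ == "__main__":
    I, C, V_DD, N0 = 20e-6, 4.5e-15, 1.0, 3e-24  # I, V_DD and N0 are assumed values
    for r_o, v_th in ((1e9, 0.5), (1e4, 0.5), (1e4, 1.1)):
        t = stopping_time(I, r_o, C, V_DD, v_th)
        s = jitter(I, r_o, C, V_DD, v_th, N0)
        print(f"Ro = {r_o:.0e} ohm, VTH = {v_th} V: T = {t * 1e12:6.1f} ps, "
              f"sigma_T/T = {100 * s / t:.2f}%")
```

With these assumed values, moving the threshold close to the asymptotic voltage (the last case) increases the relative jitter, in line with the discussion above.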
Consider using a smaller output resistance in the delay mechanism while keeping the rest of
the parameters fixed, with the threshold set well below the trajectory’s voltage limit. Since Ro has
been decreased, the slope of the output voltage at the beginning of the charging cycle increases,
which causes the threshold to be reached faster, assuming the stopping time is less than the RoC
time constant. To correct this delay reduction, either C must be increased or I must be decreased.
The former correction comes at the cost of an increase in area and, more importantly, an increase in
energy dissipation, which is proportional to the capacitor value. The benefit is a modest reduction
in the delay jitter due to noise filtering. Alternatively, the current can be decreased; however, this
can likewise come at the cost of area and power if longer devices must be used in order to avoid
biasing devices in weak inversion, which leads to a higher sensitivity to noise on the bias voltages.
While the current-starved inverter’s noise power will decrease with the current, other noise sources
will not scale.
Delay-cell jitter calculation based on simplifying models
Figure 5.21: (a) Schematic diagram and (b) small-signal noise model for the delay cell’s half
circuit for a rising-edge input.
The current-starved nMOS transistor circuit in Fig. 5.21a is considered, corresponding to a
rising-edge input delay operation. The following analysis is also valid for the delay mechanism on
a falling-edge input. It is assumed that during the charging of the capacitor until positive feedback
is triggered (VC =VDD/2), the current-starved device remains in saturation. This assumption holds
in the range of bias conditions for the intended delay range. For short delays beyond the intended
range, the current-starved device may be in triode for the latter portion of the delay. A technique
in [49] can be used to study the operation in the two operating regions separately. The current-
starved pull-down network can be represented with a small-signal model (Fig. 5.21b) of a common-
source amplifier with source degeneration, RS, the expression for which is given in eq. 5.4. The
source resistance boosts the effective output impedance of the current source, Ro, which is given
in eq. 5.5. The source resistance also makes the current source less sensitive to noise on the
bias voltage. Transistors M1 and M3 contribute to the noise current $i_n(t)$ (Fig. 5.21); their drain
noise currents are considered, with power spectral densities $S_{I,d,1} = 2kT\gamma g_{m1}$ and $S_{I,d,3} = 2kT\gamma g_{m3}$,
respectively, where the noise factor γ is typically 1 to 1.5 for submicron CMOS technologies. After
basic analysis of the circuit in Fig. 5.21b, the noise current density contributions of M1 and M3 to








, respectively. The total
























For large values of $T_{TH}$, the overall delay-cell jitter is dominated by this delay variation, particularly
as the delay is increased and the slope of VC at the threshold crossing decreases. For short
delays of under 100 ps, however, $T_{TH}$ is comparable to the delay through the feedback and output
inverters, and the variation of $T_f$ and $T_{buf}$ due to noise is nonnegligible.
Simulation results of the delay-cell jitter
The propagation delay of the current-starved inverter, $T_{TH}$, and the standard deviation of its
delay jitter, $\sigma_{T_{TH}}$, for several values of the bias current are shown in Fig. 5.22 using
post-layout-extracted circuit simulation results (based on 100 samples each) and the results calculated using
the expression derived in the previous section (eq. 5.21). The finite rise time of the input signal
that drives the delay cell causes an additional delay of approximately 10 ps. The calculated
delay shown in Fig. 5.22 includes this additional delay: $\hat{T}_{TH} = T_{TH} + 10$ ps. The calculations
were carried out using the small-signal parameters obtained through simulation, and the estimated
effective capacitance C = 4.5 fF. The noise parameter γ is estimated as 1. Some of the discrepancy
between the approximated and simulated delay values is due to the error in the estimation of γ,
the errors in the approximation of the effective load capacitance and nonzero capacitance on the
source node corresponding to VS, which slows the turn-on time of the current source, as well as
other model limitations.
The calculated and simulated values of the delay and the delay jitter of the current-starved
inverter match within 5% and 11%, respectively. Fig. 5.22 also shows the total delay and jitter of
the delay cell using circuit simulation. For long overall delay, which corresponds to low values of
the bias current, the delay-cell jitter is dominated by the jitter of the current-starved inverter. As
the bias current is increased, $\sigma_{T_{TH}}$ decreases, and $\sigma_{T_D}$ becomes nonnegligible.
The simulated jitter of the full delay cell is illustrated in Fig. 5.23 for a range of delay values.
The standard deviation of the delay jitter is within 0.5% of the overall delay in the delay range of
80 ps to 325 ps.
Figure 5.22: A comparison of calculated and simulated values of the delay and the delay jitter
standard deviation for several bias-current values.
Figure 5.23: Simulated delay jitter vs. cell delay.
5.3 Charge-pump design
A bi-directional charge pump at each level and tap of the filter in Fig. 5.4 performs the functions
of a filter coefficient, decoder of per-edge representation, DAC and analog adder. Fig. 5.24 shows
the block diagram of the charge pump, composed of a multiplexer, pulse generators and a bi-
directional tunable current source. A positive charge pump is composed of a pulse generator with
an inverted output and a positive current source while a negative charge pump consists of the
other pulse generator and the current sink. Inverter inv2 is used to invert the pulse that enables the
current source. In order to correct for the extra propagation delay this inverter adds in the top signal
path, the delay of the bottom path is extended by adding inverter inv1 before the pulse generator.
Depending on the polarity of the coefficient, indicated by the signal SIGNk, the multiplexer routes
Rm,k to control the positive charge pump and Fm,k to control the negative charge pump, or vice
versa. The bi-directional current source provides the charge and the pulse generator controls the
duration over which the charge is supplied.
Example signals illustrate the operations of the charge pump and are shown in Fig. 5.24 for
a positive filter coefficient. Each edge of Rm,k triggers the pulse generator to produce a pulse
of a fixed duration, TPUL, which enables the positive current source. An output current pulse of
amplitude Ik and duration TPUL is generated, which causes a charge of Qk = IkTPUL to
accumulate on the total load capacitor, CL. The output current pulses need not have sharp edges
because the shape of the pulses is not important; rather, the amount of charge accumulated over the
pulse-width window is what counts. The total load capacitance, CL, includes the adder capacitor,
the input capacitance of the stage that follows the processor in a system, the total capacitance of the
charge pumps at every level and tap, and also wire capacitances. The change in the capacitor charge causes
Figure 5.24: Block diagram of a bidirectional charge pump of the filter in Fig. 5.4.
the capacitor voltage, VOUT,k, to be changed by $V_k = I_k T_{PUL} / C_L$. VOUT,k is the contribution of the mth
level and kth tap to the total output voltage of the filter in Fig. 5.4. The falling-edge signal, Fm,k,
triggers the pulse generator that turns on the negative current source, which generates a negative
current pulse, an inverted version of the positive pulse. The negative charge pump removes the
accumulated charge Qk and causes an output voltage change of −Vk. VOUT,k is then a reconstructed
per-level signal with finite rise and fall time, whose amplitude, Vk, is scaled by the coefficient.
The analog voltage value, Vk, depends on the amplitude of the current pulse and its duration. It
is the current size that is made variable to set the coefficient, while the pulse width is maintained
fixed. It can be noted in the example signals that the output voltage changes not in ideal sharp
steps, but instead has a finite rise time, which is equal to the duration TPUL over which the current
is integrated onto the load capacitor. The finite rise-time of the reconstructed signals produces a
smoothing effect and causes high-frequency quantization distortion to be attenuated. If long pulse
widths were used, which exceeded the interval between consecutive tokens of the per-edge signals,
a negative current pulse generated by the second token would partly coincide with the positive current
pulse due to the first token, which would prevent the reconstructed per-level output, VOUT,k,
from reaching an amplitude of Vk. Instead of being composed of pulses with finite transition times,
the reconstructed per-level output would contain triangular segments of diminished amplitudes,
which would cause more output signal attenuation with increasing input frequency. If the coeffi-
cient size of the filter in Fig. 5.4 were varied by changing the duration of the current pulse instead
of by varying the current pulse amplitude, higher coefficient values, which must accommodate a
full coefficient tuning range, would cause more in-band attenuation. The weight of the coefficients
will therefore be frequency dependent, which is an undesirable effect.
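The charge-pump reconstruction described above can be sketched with an idealized model in which each token delivers a rectangular current pulse of amplitude Ik and duration TPUL onto CL, so that the output moves linearly by Vk = Ik·TPUL/CL over the pulse. The function names and the values of Ik and CL are assumptions chosen for illustration; leakage, finite output impedance and current-pulse shape are ignored.

```python
def ramp(t, t0, t_pul):
    """Fraction of one charge packet delivered by time t for a pulse starting at t0."""
    return min(max((t - t0) / t_pul, 0.0), 1.0)


def v_out(t, rise_tokens, fall_tokens, i_k, t_pul, c_l):
    """Reconstructed per-level output voltage at time t.

    Each rising-edge token adds the charge Q_k = I_k * T_PUL;
    each falling-edge token removes the same charge.
    """
    v_k = i_k * t_pul / c_l
    up = sum(ramp(t, t0, t_pul) for t0 in rise_tokens)
    down = sum(ramp(t, t0, t_pul) for t0 in fall_tokens)
    return v_k * (up - down)


def peak(rise_tokens, fall_tokens, i_k, t_pul, c_l, t_end, n=2001):
    """Maximum of v_out sampled on n evenly spaced points in [0, t_end]."""
    return max(
        v_out(k * t_end / (n - 1), rise_tokens, fall_tokens, i_k, t_pul, c_l)
        for k in range(n)
    )


if __name__ == "__main__":
    I_K, T_PUL, C_L = 100e-6, 80e-12, 100e-15  # I_K and C_L are assumed values
    print(f"V_k = {I_K * T_PUL / C_L * 1e3:.0f} mV")
    # Tokens spaced further apart than T_PUL: the full amplitude is reached.
    print(f"peak, 200 ps spacing: {peak([0.0], [200e-12], I_K, T_PUL, C_L, 400e-12) * 1e3:.0f} mV")
    # Tokens spaced closer than T_PUL: the pulses overlap and the amplitude is reduced.
    print(f"peak,  40 ps spacing: {peak([0.0], [40e-12], I_K, T_PUL, C_L, 400e-12) * 1e3:.0f} mV")
```

In this model the peak amplitude for a token spacing d shorter than TPUL is Vk·d/TPUL, which is the diminished triangular segment described in the text.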
Fig. 5.25 shows the output power of the fundamental component of a 3-bit ideally-quantized
signal reconstructed at a single tap of the filter in Fig. 5.4 using the charge-pump as a DAC for
several fixed values of the current-pulse duration TPUL. Ideally, the output power should be flat with
frequency; however, longer pulse widths cause a faster high-frequency roll-off. For a maximum
attenuation of 3 dB at 4.5 GHz, for example, the pulse width cannot exceed 100 ps. The pulse
width must also be less than the minimum spacing between per-edge tokens, which is equal to the
period of the maximum input frequency, to ensure that the request for a new pulse is not received
while the previous pulse is still being generated. This token spacing limit (200 ps for fMAX = 4.5
GHz) is less strict than the limit imposed by the frequency roll-off consideration.
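The roll-off constraint can be estimated with a simple model. The sketch below assumes an ideal rectangular current pulse, whose integration over TPUL acts as a moving-average filter with magnitude response |sin(πf·TPUL)/(πf·TPUL)|; the actual response in Fig. 5.25 comes from simulation and may differ. The function names are illustrative.

```python
import math


def pulse_rolloff_db(f, t_pul):
    """Attenuation in dB at frequency f for an ideal rectangular pulse of width t_pul."""
    x = math.pi * f * t_pul
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))


def max_pulse_width(f, limit_db, iters=60):
    """Largest pulse width, below the first null, whose roll-off at f stays above limit_db."""
    lo, hi = 0.0, 0.999 / f  # the response is monotonic on this interval
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pulse_rolloff_db(f, mid) >= limit_db:
            lo = mid  # still within the allowed attenuation
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    f_max = 4.5e9
    for t_pul in (40e-12, 80e-12, 100e-12):
        print(f"T_PUL = {t_pul * 1e12:5.1f} ps: {pulse_rolloff_db(f_max, t_pul):6.2f} dB at 4.5 GHz")
    print(f"max T_PUL for -3 dB: {max_pulse_width(f_max, -3.0) * 1e12:.1f} ps")
```

Under this idealization, a 100-ps pulse gives roughly 3 dB of attenuation at 4.5 GHz, which is consistent with the limit quoted above.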
The shortest pulse-width duration is limited by the narrowest pulse the pulse generator can
reliably produce. A full-rail pulse must be generated despite finite slew rates, and temperature and
process variations. For 20-ps output rise and fall times, which are typical for digital gates with
non-negligible wire capacitance implemented in submicron CMOS technologies, the minimum
nominal pulse width is approximately 40 ps. It is desirable to maximize the pulse width to ensure
a reliable pulse and also to increase the maximum output voltage, while maintaining the in-band
Figure 5.25: Output power of the fundamental component of a 3-bit quantized signal reconstructed
at a single filter tap by using the charge pump as a DAC for several values of the current pulse
duration.
roll-off within acceptable bounds.
It is important for the overall propagation delay from the input of the filter to the output of the
charge pump, shown in Fig. 5.4, to be well matched for rising- and falling-edge inputs in order to
prevent signal distortion, which was discussed in Sec. 4.2.1. Since signal information is encoded
in the transitions of the per-edge signals, filter blocks must perform the same operations for inputs
of either polarity, which can cause a difference in the propagation delays because an input of one
polarity might have to go through an extra inversion stage. Efforts were made to equalize the
propagation delays despite temperature and process variations. It is important to note that the effects
of global temperature and process variations on the pulse width, latency and charge-pump
currents are not detrimental to the response of the filter because these changes occur at every tap
and at every level. Longer pulse-generator pulses and higher charge pump currents at all taps
merely cause an increase in the filter gain at all frequencies without causing distortion.
Figure 5.26: Schematic diagram of the XOR-based pulse generator.
5.3.1 Pulse generator design
The pulse generator in Fig. 5.24 is an either-edge-sensitive self-timed one-shot circuit, which can
be realized as an XOR operation of a per-edge signal Rm,k or Fm,k and its delayed version (the
per-edge signals are provided by the MUX). A delay-equalized XOR-based pulse generator was
realized in the first chip; an alternative pulse generator was developed for the second chip. Both
implementations are detailed in the following sections. The pulse generators were designed for
a pulse width in the range of 40 ps to 100 ps, which is appropriate for signals under 4.5 GHz.
The pulse generators consume the largest fraction of the total dynamic power dissipation of the
gigahertz processor because their outputs undergo two full-rail transitions for each input token;
therefore, it is important to minimize their power consumption.
Transistor   Width (µm)   Transistor   Width (µm)
WI,N         0.36         WI,P         0.48
WA,N         0.12         WA,P         0.72
WB,N         0.36         WB,P         0.12
WX,N         0.36         WX,P         0.48
Table 5.2: Transistor sizes of the XOR-based pulse generator. All transistors are of the minimum
length of 65 nm.
XOR-based pulse generator
The schematic of the XOR-based pulse generator is shown in Fig. 5.26. Inverters 1–3 are respon-
sible for the delay of the delayed path and, therefore, control the pulse width of the output signal.
Example signals are shown in Fig. 5.27. It is again assumed that the block has finished processing
the previous input and all internal node voltages have settled before a new token arrives. On a
rising-edge input, the rising edge of signal A causes the XOR gate to produce a logic-level low sig-
nal, which starts the output pulse. The falling edge of signal B, which is a delayed inverted version
of A, disables the pull-down network of the XOR and enables the pull-up network. This causes
the XOR output to rise to logic-level high, thereby completing the output pulse. For a falling-edge
input, the rising edge of $\bar{A}$ triggers the beginning of the pulse and the falling edge of $\bar{B}$, which is
the delayed inverted version of $\bar{A}$, leads to the completion of the pulse.
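The either-edge behavior can be illustrated with an idealized logic-level sketch: a sampled signal is XORed with a delayed copy of itself, which produces one pulse per input transition with a width equal to the delay. This is only a behavioral model with hypothetical function names; it produces active-high pulses and ignores the complementary signals, the active-low output and the finite transition times of the circuit in Fig. 5.26.

```python
def xor_pulses(samples, delay_steps):
    """XOR a sampled logic signal with a copy of itself delayed by delay_steps samples."""
    out = []
    for n, s in enumerate(samples):
        # Before the delayed copy is available, assume it holds the initial level.
        delayed = samples[n - delay_steps] if n >= delay_steps else samples[0]
        out.append(s ^ delayed)
    return out


def pulse_widths(bits):
    """Length, in samples, of each run of consecutive 1s."""
    widths, run = [], 0
    for b in bits:
        if b:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


if __name__ == "__main__":
    # One rising edge at sample 10 and one falling edge at sample 30.
    signal = [0] * 10 + [1] * 20 + [0] * 20
    pulses = xor_pulses(signal, delay_steps=4)
    print(pulse_widths(pulses))  # one pulse per edge, each 4 samples wide
```

Both input polarities give the same pulse width in this ideal model; the delay equalization discussed in this section addresses the cases in which the real circuit departs from that.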
Transistor sizes of the pulse-generator implementation are listed in Table 5.2; inverters inv1,
inv2 and inv4 are minimum size inverters and inv3 is three times their size. The drive strength sym-
metry of the XOR gate is skewed; the nMOS transistors, which are responsible for the beginning
edge of the pulse, have boosted drive strengths in order to increase the sharpness of the first pulse
transition. If this beginning-edge pulse transition is slow, the delay of the delayed path may
not be long enough to allow the output pulse to reach a full-rail amplitude, and the integrity of the
Figure 5.27: Example signals of the XOR-based pulse generator.
pulse will be compromised. In order to ensure that a reliable pulse is generated despite unexpected
additional parasitic capacitance, temperature variation, and device mismatch, it is therefore better
to sharpen the beginning-edge pulse transition and allow the finishing-edge transition to become
less sharp. This is accomplished by increasing the sizes of the nMOS transistors and reducing the
sizes of the pMOS transistors in the XOR gate. The output of the XOR gate is a critical node;
unaccounted-for capacitances at this node will reduce the output-pulse sharpness. Inverter inv4
adds minimal input capacitance and is used to buffer the critical node.
The pMOS and nMOS transistors of inverters invI, invA and invB are designed to have unequal
drive strength in order to equalize the delay and output pulse width for rising- and falling-edge
inputs. If equal drive strengths were used instead, the delay between a falling-edge input and the start
of the output pulse, τF, would be longer than the delay for a rising-edge input, τR, because of
the propagation delay of inverter invA. The difference in delays of the rising edges of $A$ and $\bar{A}$
relative to the input results in a polarity-dependent delay discrepancy of one inverter’s propagation
delay, around 15 ps, which is undesirable as it will cause an increase in half-harmonic distortion.
The drive strength symmetry of invI and invA is therefore skewed in order to speed up the propagation
of the falling edge of IN and slow down the propagation of the rising edge of IN. The drive
strength symmetry of invB is likewise skewed, but is opposite to the skew of invA, to ensure that
the delay between the rising edge of $A$ and the falling edge of $B$ is equal to the delay between the
rising edge of $\bar{A}$ and the falling edge of $\bar{B}$. This delay equality ensures that the output pulse width
is equal for both input-edge polarities. While the skewed delays of invA and invB cannot match
perfectly, matching within a few picoseconds can be achieved because both gates drive a similar
load and have inputs designed to have similar slew rates.
Process variation reduces the quality of delay match between invA and invB. However, the
overall matching between the primary and delayed paths is less sensitive to process variation, as
will be qualitatively explained. Simulation results of the effects of process variation on pulse-width
matching are presented later in this section for justification. It is assumed that for a typical-typical
process, the delays are perfectly matched. Consider the case of a slow process corner for pMOS
transistors and a typical process corner for nMOS devices. Since the inverters in the main and
delayed path have similar nominal propagation delays, they will experience a similar increase in
the rising-input delay by ε, to first order. The delay from a rising edge of A to the falling edge of B
will be extended by ε due to the longer rising-input delay of inv2. Relative to the falling edge of A,
the delay of the rising edge of A will be extended by ε due to a longer rising-input delay of invA and
the delay of falling edge of B will be extended by 2ε due to the longer rising-input delays of inv1
and inv3. As a result, the relevant delay from A to B will be extended by 2ε−ε= ε, which matches
the extension in the relevant delay from A to B. This causes the same increase in the output pulse
width for both input-edge polarities. This process-variation-insensitive delay equalization is made
possible by the inverters in the delayed path. The propagation delay from the input to the rising
edges of $A$ and $\bar{A}$, however, is not as well equalized with process variation because the increases in
the propagation delays of invI and invA are not as similar as in the case of the other inverters.
The delaying inverters, inv1 and inv2, were chosen to be of minimum size in order to reduce the
energy consumption of this power-hungry block. The increase in their propagation delays due to
their weak drive strengths is a desired effect. For a more efficient implementation, the delayed path
can be realized with a single inverter instead of three inverters; however, this results in insufficient
delay and a poor-integrity output pulse. Prolonging the delay by current-starving the delaying
inverter is a bad solution because signals $B$ and $\bar{B}$ will have low slew rates, which will not only
cause crowbar currents to flow in the XOR gate, but will also cause slow transitions in the output
pulse. If, instead, one delaying inverter is removed, signal B will have to complete the pulse started
by signal A and signal B will have to complete the pulse started by signal A. This will result in
either an input-polarity dependent delay discrepancy or a pulse-width discrepancy. The delays will
also not track well with process variation. To maintain the same propagation delay, inverters invI
and invA will have to remain skewed. The delay skew caused by inverter invI will cause the falling
edge of B, which turns off the pulse started by A, to arrive too soon, which will cause a shorter
pulse width. If the skew of invI is reversed to extend the delay of the falling edge of B, this will
cause a discrepancy between τR and τF .
The average pulse width of the signal generated by the pulse generator, T = 0.5(TR+TF), the
average propagation delay of the pulse generator, τ = 0.5(τR+τF), and also the polarity-dependent
discrepancies in the pulse delay, |τF − τR|, and in the pulse width, |TF −TR|, are listed in Table 5.3
for several process corners and operating temperatures. While both the delays and the pulse widths
change with process and temperature variations, the delay and pulse-width discrepancies are main-
tained within a tolerable range of several picoseconds. In the slow-slow process corner and at high
operating temperatures the pulse-generator pulse width is beyond the intended pulse-width range
of 40 ps to 100 ps, which leads to too much attenuation at higher frequencies. For the typical-
typical corner, the standard deviations of the propagation-delay and pulse width variation due to
jitter are 1.3% and 2% of their nominal values, respectively. Delay and pulse width variations due
to random local device mismatch have standard deviations of 2.9% and 5.8%, respectively. The
systematic delay and pulse width discrepancies are within the range of error due to random local
mismatch. The pulse generator dissipates 20.3 fJ of energy to generate a pulse on each input edge.
Process   Temperature   Pulse delay                     Pulse width
corner    (°C)          Mean τ (ps)   |τF − τR| (ps)    Mean T (ps)   |TF − TR| (ps)
TT        27            66.5          7.0               78.5          2.4
FF        0             47.5          5.1               57.1          2
SS        120           97.8          11.2              114.5         3
SF        27            65.5          10.2              76.8          3.5
FS        27            68.3          3.5               79.2          2.6
Table 5.3: Pulse-generator propagation delay, output pulse width, and input-polarity-dependent
discrepancy in the delay and in the pulse width for several process corners and temperatures.
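The averaged quantities and polarity-dependent discrepancies reported in Table 5.3 follow directly from the rising- and falling-edge values. The short sketch below illustrates the definitions T = 0.5(TR+TF) and |TF − TR| (and likewise for τ). The per-polarity delays used in the example are hypothetical: the table reports only the mean and the difference, so the split between rising and falling edges is an assumption made for illustration.

```python
import math


def mean_and_discrepancy(rising, falling):
    """Return (0.5*(rising + falling), |falling - rising|) for a delay or width pair."""
    return 0.5 * (rising + falling), abs(falling - rising)


# Hypothetical per-polarity delays (ps), chosen only so that they reproduce
# the mean and discrepancy listed for the TT corner in Table 5.3.
tau_r, tau_f = 63.0, 70.0
tau_mean, tau_diff = mean_and_discrepancy(tau_r, tau_f)
print(f"mean delay = {tau_mean} ps, discrepancy = {tau_diff} ps")
```

The same helper applies unchanged to the pulse widths TR and TF.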
Variable-delay pulse generator
It is desirable to be able to adjust the pulse width of the pulse-generator output to correct for the
case of a slow process or high temperatures. The pulse width control also offers an additional
means of varying the global gain of the filter and attenuating higher frequencies in the case that
signals of interest are in the lower in-band frequency range. Altering the XOR-based pulse genera-
tor to have a variable delay is not an effective solution. Incorporating a variable-delay mechanism
into one of the delaying inverters inv1 or inv2 in Fig. 5.26 causes the overall delay to be too long,
making the pulse generator fail to meet the maximum-pulse-width requirement. Adding extra
delay to the primary path, signals A and Ā, is energy inefficient and causes a discrepancy in the pulse
delay or pulse width. If one of the inverters is removed to decrease the delay, the problems of
delay or pulse-width discrepancy, as described in the previous section, are worsened. Removing
two inverters from the delay path causes the problems of low slew rates. Therefore, an alternative
implementation of the pulse generator, realized in the second chip, was developed and is detailed
in this section.
An either-edge-sensitive pulse generator can be realized with a single-edge-sensitive one-shot
circuit by either (1) using control circuits to provide the one-shot with the correct triggering edge
Figure 5.28: Schematic diagrams of a NOR-based and a NAND-based one-shot circuit and example
signals.
for either-input-edge polarity or (2) using two one-shots of opposite edge sensitivities in parallel
and combining their outputs. Both solutions are detailed in this section. A one-shot with a fixed
delay is considered first; the variable-delay structure is then later incorporated into the design. Two
common implementations of a one-shot circuit are based on a NOR gate and a NAND gate, shown
in Fig. 5.28; a positive pulse is generated by the NOR gate on each falling edge of the input A, and
a negative pulse is generated by the NAND gate on each rising edge of the input A. The delayed
path, which controls the pulse width, is realized with an odd number of inverters. A minimum
of three inverters is typically used because the delay of one inverter is not sufficient to produce
a reliable pulse. Current-starving the single inverter to increase its delay will cause a decrease
in the slew rate of its output signal and can cause undesirable crowbar currents to flow in the
NOR or NAND gate. The NAND one-shot implementation is preferred because the two-transistor
stack, which is responsible for the critical beginning edge of the output pulse, is composed of nMOS
transistors, whereas the NOR gate has a pMOS-transistor stack. The nMOS-transistor stack has
higher drive strength for the same drain capacitance when compared to a pMOS-transistor stack,
which improves the output pulse integrity or leads to lower energy dissipation. Both one-shots are
reset on the insensitive input edge.
Figure 5.29: Schematic diagram of the proposed one-shot circuit and example signals.
Transistor    M1     M2     M3     M4     M5     M6     M7     M8     M9
Width (µm)    0.5    0.12   0.3    0.12   0.3    0.15   0.2    0.3    0.12
Table 5.4: Transistor sizes of the proposed one-shot circuit. All transistors are of the minimum
length of 65 nm.
A more efficient implementation of a single-edge-sensitive one-shot circuit is proposed here,
the schematic diagram of which is shown in Fig. 5.29. On a falling input edge, the one-shot
generates a negative pulse. The operation of the cell is illustrated in the figure. It is assumed that
all internal node voltages have settled after the previous input token. The input signal is initially
held at logic-level high and node voltages X and Y are at ground and VDD potential, respectively.
A reset device, M3, is disabled. The input switch formed by M1 and M2 is enabled, allowing the
output to track the input. As the input signal toggles low, the output signal follows it, thus initiating
the output pulse. The delaying inverter formed by transistors M4–M7 begins to switch slowly. The
inverter formed by M8–M9 contributes to the delay but has a sharp falling transition because it is
responsible for the critical first reset phase. As its output Y toggles, it disconnects the output from
the input, and turns on the resetting device, M3, which forces the output to VDD potential, thus
completing the output pulse. A second non-critical reset phase occurs when the input toggles back
to VDD potential, signals X and Y switch to their initial values, the reset device is disabled, and the
output is once again connected to the input through the switches. Since the input has now switched
to VDD, the output remains reset at VDD potential.
The output pulse width is controlled by the signal SLOW, which controls the drive strength
of the first delaying inverter, M4–M7, and thus the delayed-path latency, which sets the pulse
width. The pulse width has two possible settings. However, if M7 is biased with a variable gate
voltage instead of merely being enabled and disabled, the pulse width can be finely tuned. Example
device sizes of the one-shot circuit, implemented in the second chip, are listed in Table 5.4, with
all devices of the minimum length. It is critical to keep the capacitance at the input and output
nodes small to ensure sharp output transitions. The circuit that immediately precedes the one-shot
should have a larger nMOS transistor and smaller pMOS transistor in order to favor the critical
negative transition. Transistor M1 is responsible for the first edge of the output pulse; therefore,
it is sized to have a large drive strength. Transistor M2 is used to keep the output voltage from
drooping from VDD potential after M3 is disabled; therefore, it is of the minimum size to minimize
its capacitance. M3 is sized to achieve a relatively sharp rising output transition while contributing
minimal capacitance to the output node. It is also important for the reset device to be quickly
enabled. For this reason, the drive strength of M8 is boosted and the capacitance at the node
of signal Y is minimized. The rising edge of Y is less critical; therefore, M9 is chosen to be
of the minimum width. It is not necessary to keep the capacitance at the node corresponding to
signal X at a minimum because it contributes beneficially to the delay of the first delaying inverter.
Additional capacitance, however, is not added unnecessarily as it will lead to an increase in energy
consumption. A longer delay is achieved more efficiently by current starving the pull-up network
of the first delaying inverter, the slowly-varying output of which is buffered by the second delaying
inverter. To achieve a longer pulse width, M7 is disabled by setting SLOW to VDD potential. The
falling edge of X is not critical; therefore, M4 is of the minimum size. The one-shot circuit can be
realized with a rising-edge sensitivity by interchanging pMOS and nMOS devices. However, the
presented realization is preferred because the critical transistors that need a high drive strength
are primarily nMOS, which can be smaller than the critical pMOS transistors in the alternative
implementation.
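The behavior of the one-shot described above can be summarized with a purely behavioral sketch: the output idles high, each falling input edge starts a negative pulse, and the delayed path ends the pulse after a latency selected by SLOW. This is not a circuit-level model. The function name is hypothetical, the default widths borrow the 66 ps and 85 ps nominal values reported for the second-chip pulse generator and serve only as placeholders, and the input edges are assumed to be spaced far enough apart for the cell to reset.

```python
def one_shot_output(falling_edges_ps, slow, narrow_ps=66.0, wide_ps=85.0):
    """Return output transitions [(time_ps, level)] of a falling-edge one-shot.

    slow=True models SLOW at VDD (M7 disabled, longer delayed-path latency);
    slow=False models SLOW at ground. Widths are illustrative placeholders.
    """
    width = wide_ps if slow else narrow_ps
    transitions = []
    for t in sorted(falling_edges_ps):
        transitions.append((t, 0))          # output follows the falling input edge
        transitions.append((t + width, 1))  # reset device returns the output to VDD
    return transitions


print(one_shot_output([0.0, 500.0], slow=False))
print(one_shot_output([0.0, 500.0], slow=True))
```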
The delayed path of the one-shot is generated with a current-starved inverter and does not use
positive feedback, as in the case of the thyristor-like-based delay cell. The positive feedback, which
is used in the delay cell to ensure fast settling, is unnecessary because the one-shot has sufficient
time to settle and is reset on the rising edge of the input. Incorporating the unnecessary feedback
structure would come at the cost of an increase in energy dissipation, which must be avoided.
Either-edge-sensitive pulse generator (1)
The single-edge-sensitive one-shot circuit can be used to realize an either-edge-sensitive pulse
generator as shown in Fig. 5.30. A self-timed multiplexer (mux) provides a falling edge (signal
VOS) to the falling-edge-sensitive one-shot of Fig. 5.29 on every input edge of either polarity. It
also provides a self-timed rising edge on each input edge in order to reset the one-shot circuit. The
operation of the pulse generator is illustrated in Fig. 5.30 with the help of example signals. It is
assumed that all internal node voltages have settled and the input is initially at ground potential.
The mux switches, which are controlled by signals SW and SWb, are enabled in a way that connects
the one-shot input to the inverted input signal via the top mux path. The input of the one-shot,
therefore, starts with a logic-level high value. On a rising edge of the input signal IN, the one-shot
input, VOS, follows the inverted input and triggers the one-shot. A reset delay block composed of
inv5, inv6 and inv7 is used to provide control signals which toggle the mux switches after the one-
shot pulse is complete in order to connect the one-shot input to the opposite mux path through inv4.
This causes VOS to toggle to logic-level high in order to reset the one-shot circuit, after which the
pulse generator is ready for the next token. The one-shot input now follows the noninverted input
signal. When the following falling input edge arrives, the one-shot is yet again triggered. After
the output pulse is complete, the reset block generates the reset signals, which toggle the mux
switches and reset the one-shot. The pulse generator actually generates two pulses on each input
Figure 5.30: Schematic diagram of a pulse generator based on a single-edge-sensitive one-shot
and a redirecting mux.
transition, one at the output and one at the input of the one-shot circuit. One would be tempted to
eliminate the one-shot circuit altogether and merely use VOS as the output. However, the widths of
the VOS pulses are not guaranteed to be equal, and the pulses can have different rise times because they
are generated by two different signals corresponding to the top and bottom mux paths. The one-
shot circuit guarantees the same pulse duration on every token. To ensure the same delay between
edges of IN and the output pulse, the top and bottom mux paths have to be delay equalized by
skewing inverters inv1, inv2 and inv3 appropriately.
The one-shot reset phase must occur in the time between consecutive input toggles. It is impor-
tant to ensure that the delay of the reset block is longer than the output pulse width, the propagation
delay of inv4 and the propagation delay of the top or bottom mux paths. Otherwise, a reset signal,
which is the rising edge of VOS, will arrive before the one-shot is finished generating the pulse
and the pulse will be corrupted. The delay of the reset block also cannot be longer than the time
between consecutive tokens because then new pulse requests will arrive before the pulse generator
is reset, which can prevent an output pulse from being generated. As the input frequency is in-
creased, it becomes more difficult to guarantee that the one-shot circuit is properly reset, and the
pulse generator can fail. This is made especially difficult by the wire capacitance at the output of
the mux switches, which causes a decrease in the signal slew rates.
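The two timing constraints on the reset block can be expressed as a window of allowed delays. The sketch below is one reading of the text, not the author's design procedure: it assumes, conservatively, that the lower bound is the sum of the output pulse width, the inv4 delay and the mux-path delay, and that the upper bound is the minimum spacing between consecutive tokens. The function name and all numeric values are hypothetical.

```python
def reset_delay_window_ps(pulse_width_ps, inv4_delay_ps, mux_path_delay_ps,
                          min_token_spacing_ps):
    """Return (lower, upper) bounds on the reset-block delay, or None if none exists.

    Assumption: the lower bound is taken as the sum of the three delays,
    which is a conservative interpretation of the constraint.
    """
    lower = pulse_width_ps + inv4_delay_ps + mux_path_delay_ps
    upper = min_token_spacing_ps
    return (lower, upper) if lower < upper else None


# Hypothetical delays: a 66 ps pulse, 10 ps for inv4 and 15 ps for the mux path.
print(reset_delay_window_ps(66, 10, 15, min_token_spacing_ps=200))
print(reset_delay_window_ps(66, 10, 15, min_token_spacing_ps=90))
```

When the token spacing shrinks below the lower bound, the window closes, which is the failure mode described for high input frequencies.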
For signals with input frequencies beyond about 5 GHz, there is not enough time to guarantee
that the one-shot circuits are properly reset, particularly as the temperature is increased or if the
process is slow, and as a result the pulse generators will fail. Consider a system with a frequency
band of interest below this limiting frequency but with blocker signals, for example in the 5.2-GHz
and 5.4-GHz frequency ranges, that are insufficiently attenuated by the circuits which precede the
filter. While the outer-most levels of a CT flash ADC in the system shown in Fig. 5.4 will fail to
trigger for such high blocker frequencies, mid-level comparators can still generate tokens. These
tokens, which cannot be adequately processed by the filter, will be distorted and will corrupt the
output signal. For this reason, this pulse generator architecture was not used in the chip implemen-
tation.
For gigahertz systems that are not susceptible to high-frequency blockers, this pulse generator
implementation can be effective. In such cases, the filter implementation can be made more effi-
cient by combining the delay tap and the pulse generators. Instead of using inverters inv5, inv6 and
inv7 to delay the input for reset signal generation, the output of the delay block of the next filter tap
can be used, which can lead to energy savings and a more compact implementation. This imple-
mentation, however, limits the tap delay range of the filter because it must satisfy the limitations
of the pulse-generator reset delay.
Either-edge-sensitive pulse generator (2)
An alternative implementation of an either-edge-sensitive pulse generator based on a single-edge-
sensitive one-shot circuit is shown in Fig. 5.31. This implementation bypasses the problem of
having to generate critical reset signals to reset the one-shot in the time between consecutive tokens.
This pulse-generator architecture was implemented in the second chip (all transistors of minimum
length of 65 nm, Wn1 = 0.5 µm, Wp1 = 0.12 µm, Wn2 = 0.3 µm, Wn2 = 0.4 µm). Two single-edge-
sensitive one-shot circuits shown in Fig. 5.29 were triggered on opposite edges of corresponding
per-edge signals. The outputs of the one-shot circuits were combined with a NAND gate. Each
one-shot circuit generates a pulse on every other input token; therefore, it has more than sufficient
time for reset. Since the timing of the reset signals is not important, the transistors responsible for
the reset phase can be minimized. Each one-shot circuit is driven by a skewed inverter to sharpen
the output’s falling edge, which is responsible for the starting edge of the one-shot pulse. In order
to correct for the extra delay that inverter inv1 adds to the bottom path, this inverter, as well as the
output inverters in the charge-pump multiplexer, are skewed. Signal SLOW is used to select the
width of the output pulse. The propagation delay of the charge pump does not add to the delay
between taps (Fig. 5.4) but merely contributes to the latency in the processed signal and is not
limited to the minimum token spacing since it is composed of several granular delay contributors.
Circuit simulation results based on post-layout extracted models are presented for the pulse
generator driven by the charge-pump multiplexer. The two circuits consume a total of 27.5 fJ of
energy in processing a single token. The output pulse width can be set to 85 ps for SLOW =
VDD and 66 ps for SLOW = 0. The standard deviation of the pulse width due to noise for the
two pulse-width settings is 0.32 ps and 0.25 ps, respectively, which are obtained from transient
Figure 5.31: Schematic diagram of a pulse generator with two pulse-width settings and example
signals illustrating the narrow and wide output pulse options.
Process   Temp.   Pulse delay              Pulse width, SLOW = 0    Pulse width, SLOW = VDD
corner    (°C)    Mean (ps)   Dif. (ps)    Mean (ps)   Dif. (ps)    Mean (ps)   Dif. (ps)
TT        27      142         2.2          66.4        2.3          84.7        3.0
FF        0       100         3.4          46.7        1.6          59.1        2.0
SS        120     210         2.5          97.7        3.7          125.0       4.5
SF        27      140         6.6          63.8        1.8          81.0        2.2
FS        27      144         1.6          69.5        2.7          89.3        3.7
Table 5.5: Propagation delay and output pulse width (average values and difference in values for
rising- and falling-edge inputs) for several process corners and temperatures.
noise simulations. The jitter of the propagation delay has a standard deviation of 0.2 ps for both
settings. The propagation delay and pulse-width variations for the multiplexer and pulse generator
due to process and temperature variation are given in Table 5.5. The differences in delay and pulse
width for input edges of opposite polarity are within acceptable limits of a few picoseconds. For
the slow-slow process corner, the pulse width in the wide-pulse setting is 125 ps, which is too
long for signals beyond 4.5 GHz. This can be remedied by switching to a narrow pulse setting.
Alternatively, in a fast-fast process corner, a narrow pulse width of 46.7 ps can be extended
to 59.1 ps, which will result in higher output signal amplitudes of the filter in Fig. 5.4. Similar
adjustments can be made in case of other process corners and operating temperatures. The standard
deviation of the pulse-width variation due to random local device mismatch (typical-typical corner)
is 4.5% and 4.0% for the narrow- and wide-pulse settings, respectively. The pulse delay’s standard
deviation is 4.4 ps.
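The corner-dependent choice of pulse-width setting can be illustrated with the mean pulse widths of Table 5.5. The selection policy below is an assumption made for illustration: it prefers the wide pulse, which gives a larger filter output amplitude, and falls back to the narrow pulse only when the wide one exceeds the 100 ps upper end of the intended pulse-width range. The function and variable names are hypothetical.

```python
# Mean pulse widths (ps) transcribed from Table 5.5: corner -> (narrow, wide).
PULSE_WIDTH_PS = {
    "TT": (66.4, 84.7),
    "FF": (46.7, 59.1),
    "SS": (97.7, 125.0),
    "SF": (63.8, 81.0),
    "FS": (69.5, 89.3),
}


def pick_setting(narrow_ps, wide_ps, max_width_ps=100.0):
    """Prefer the wide pulse unless it exceeds max_width_ps (assumed policy)."""
    return "wide" if wide_ps <= max_width_ps else "narrow"


settings = {corner: pick_setting(*widths) for corner, widths in PULSE_WIDTH_PS.items()}
for corner, choice in settings.items():
    print(corner, choice)
```

Under this policy only the slow-slow corner is switched to the narrow setting, consistent with the remedy described in the text.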
5.3.2 Bidirectional voltage-controlled current-source design
A bi-directional tunable current source must satisfy several design criteria:
1. A tuning range of over three bits.
2. A fast turn-on time.
3. No static power dissipation, with the exception of leakage.
Figure 5.32: Binary-weighted negative current source.
Differential current-source implementations are not considered because such current sources
dissipate static power, which reduces the benefit of activity-dependent dynamic power dissipation
of CT processors. The current pulse must turn on fast enough to be able to be driven by a narrow
input pulse. In order to ensure the integrity of the input pulses, without using power-hungry buffers,
the input capacitance of the current source should be reasonably small and fairly constant. The
tunable current source can be realized as a sum of binary-weighted current sources enabled through
switches, as is shown in Fig. 5.32 for a negative current source. This implementation, however,
requires the pulse generator to drive the gate capacitance of an equivalent nMOS transistor with a
width of 7Wa, where Wa is at least the minimum allowable transistor width, which is a large load
for the pulse generators to drive and would require buffering. The wire capacitance added by the
complicated routing would increase the turn-on time and the energy consumption. A simpler yet
effective realization of the bidirectional current source is shown in Fig. 5.33a, with device sizes
given in Table 5.6. The current source and sink transistors MP2 and MN2 are enabled through
switches MP1 and MN1 by the pulse generators. The switches are placed at the source terminal
of the biased current sources rather than at the output node in order to prevent pulse feedthrough
Figure 5.33: Bi-directional current source and bias generating circuits for (b) the first chip and (c)
the second chip.
to the output and for source degeneration. The device sizes in the second chip implementation
were changed to increase the current range and to decrease the output capacitance to increase the
amplitude of the processor output.
The currents are made tunable via the bias voltages. The biases can be set by (1) mirroring the
reference currents, IN,k and IP,k (Fig. 5.33b), as was done in the first chip implementation to allow
easy control of the coefficients and to enable debugging; (2) using a binary-weighted current
DAC (Fig. 5.33c), as was done in the second chip; or (3) using a power-efficient voltage-output
            MN1                 MN2                 MP1                 MP2
Chip no.    W (µm)   L (µm)     W (µm)   L (µm)     W (µm)   L (µm)     W (µm)   L (µm)
1           0.25     0.065      0.5      0.1        0.5      0.065      1        0.1
2           0.5      0.065      0.5      0.065      1        0.065      1        0.065
Table 5.6: Transistor sizes of the bi-directional current source.
DAC to set the bias voltages directly. The maximum range of currents, defined by the maximum
and minimum achievable currents, is limited approximately to a four-bit range. Current values,
however, can be tuned with finer steps than the LSB corresponding to the maximum current range.
Numerous useful filter responses require a small coefficient range, but can benefit from higher
coefficient-value resolution. In the second bias circuit implementation, the currents are tuned with
six-bit resolution over a three-bit range. The linearity of the current range, however, limits the
coefficient resolution to a three- to four-bit range; six-bit coefficient resolution is useful for testing
purposes. The ratios of the maximum current to the minimum current within the tuning range for
the two implementations are approximately 10 and 16, where the wider range is achieved in the
second chip implementation.
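The quoted current ratios can be related to the bit ranges mentioned in the text. The sketch below assumes that a tuning range of N bits corresponds to a maximum-to-minimum current ratio of 2^N; this convention is an assumption for illustration and is not stated explicitly in the text.

```python
import math


def range_in_bits(max_over_min):
    """Express a max/min current ratio as an equivalent number of bits (assumed log2)."""
    return math.log2(max_over_min)


for chip, ratio in (("chip 1", 10), ("chip 2", 16)):
    print(f"{chip}: ratio {ratio} is about {range_in_bits(ratio):.2f} bits")
```

Under this convention the two ratios fall between three and four bits, in line with the stated design criterion of over three bits and the approximately four-bit limit.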
The output current pulses need not have sharp edges because the shape of the pulses is not
important; rather, what counts is the amount of charge accumulated over the pulse-width window.
Finite transition times in the output current pulses create a smoothing effect, which helps to
attenuate higher quantization harmonics without causing notable in-band attenuation because the
transitions are shorter than the pulse duration, which is the dominant cause of coefficient-caused
high-frequency attenuation, as was discussed in the introduction of Sec. 5.1.
Figure 5.34: (a) Output current and (b) change in output voltage of the bi-directional charge pump
of chip 1 for several values of reference bias current.
5.3.3 Simulation results of the bi-directional charge pump
XNOR-based pulse generator charge pump
Simulated signals of the bi-directional charge pump of chip 1 (Fig. 5.26) are shown in Fig. 5.34
for a single Rm,k token followed by a Fm,k token, a 400 fF capacitive load, and varying bias cur-
rents. The charge pump dissipates 29 fJ of energy to process each input token. The same reference
bias currents are used to bias the positive and negative current sources. When a pulse generator
is triggered by an incoming token, its output pulse turns on the appropriate current source. When
the pulse is complete, however, the current source shown in Fig. 5.33 does not turn off completely
because when the switch devices MN1 and MP1 are turned off, the capacitances at their drain ter-
minals have to be discharged to turn off the current-controlled devices MN2 and MP2, respectively.
This post-turn-off discharging is responsible for the slight slope in the output voltage where the
pulse should theoretically be flat. This non-zero output slope occurs for both positive and negative
voltage excursions and has a slight low-pass effect on the filter’s frequency response.
The charge delivered by the positive and negative charge pumps in general does not match
exactly because the charges are supplied by nMOS-based and pMOS-based current sources, the
currents of which do not match exactly, and are turned on by negative and positive pulses, the
shapes of which do not match exactly, since one pulse is inverted. The charge discrepancy causes
discrepancies in the positive and negative voltage transitions of the filter output, as is seen in
Fig. 5.34b. Instead of an ideal voltage pulse with equal start and finish values, the finish value is
slightly different. As a corrective measure, a coarse coefficient-tuning scheme is implemented to
improve the positive and negative charge matching. The DC control block at the output of the filter
corrects the remaining mismatches in positive and negative charges and prevents the average value
of the output voltage from drifting.
For an example case of a 50 µA reference bias current, the charge pump causes 9.5 mV output
voltage excursion, with a 0.2 mV (2.1%) error between the positive and negative excursions. Noise
in the pulse generator and the bidirectional current source contributes an RMS noise voltage of 36
µV.
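As a rough cross-check of the simulated excursion, the voltage step of an ideal rectangular current pulse on a capacitor is ΔV = I·T/C. The sketch below applies this first-order estimate under two assumptions that the text does not confirm: that the output current amplitude equals the 50 µA reference bias current, and that the pulse width is the 78.5 ps typical-corner mean of Table 5.3. The estimate ignores finite transition times and the post-turn-off discharging, so it is not expected to match the simulated value exactly.

```python
def ideal_voltage_step_mv(current_ua, pulse_width_ps, load_ff):
    """First-order estimate dV = I*T/C for a rectangular current pulse."""
    charge_fc = current_ua * pulse_width_ps * 1e-3  # uA * ps = 1e-18 C = 1e-3 fC
    return charge_fc / load_ff * 1e3                # fC / fF = V, converted to mV


# Assumed inputs: 50 uA output current, 78.5 ps pulse, 400 fF load.
estimate_mv = ideal_voltage_step_mv(50.0, 78.5, 400.0)
print(f"ideal estimate: {estimate_mv:.2f} mV (simulated value in the text: 9.5 mV)")
```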
Charge pump with a variable-pulse-width pulse generator
Simulated output current pulses and output voltage excursions are shown in Fig. 5.35 for the second
charge pump implementation, realized with a variable-pulse-width pulse generator in Fig. 5.31. A
load capacitor of 400 fF is used. The charge-pump output-current amplitude, shown in (a) and (c)
for wide and narrow pulse settings, respectively, can be tuned within a range of 5 µA to 120 µA.
The amplitude of the current pulses is the same for both settings but the pulses in (a) are 84.7 ps
wide while those in (c) are 66.4 ps wide. The accumulated charge and the resulting voltage steps
are, therefore, 25% larger for the wide pulse setting (b) than in the case of the narrow pulse setting
(d).
For an example case of a 50 µA reference current, output voltage excursions of 8 mV and 10.1
mV are produced, with a 0.2 mV (2.5%) and a 0.05 mV (0.5%) difference between positive and
negative excursions, for the cases of narrow and wide output pulses, respectively. Positive and
negative excursions have better matching for wider turn-on pulses because when the pulse width
is lengthened, the output voltage discrepancy caused by a transition time mismatch of positive and
negative pulse-generator pulses makes up a smaller percentage of the total output excursion.
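The relation between pulse width, output excursion and matching error can be checked with simple arithmetic on the values quoted above. The sketch uses only numbers stated in the text; the variable names are illustrative.

```python
narrow_ps, wide_ps = 66.4, 84.7          # pulse widths for the two settings
dv_narrow_mv, dv_wide_mv = 8.0, 10.1     # simulated output excursions
err_narrow_mv, err_wide_mv = 0.2, 0.05   # positive/negative excursion difference

width_ratio = wide_ps / narrow_ps
step_ratio = dv_wide_mv / dv_narrow_mv
rel_err_narrow = 100 * err_narrow_mv / dv_narrow_mv
rel_err_wide = 100 * err_wide_mv / dv_wide_mv

print(f"wide/narrow pulse-width ratio: {width_ratio:.3f}")
print(f"wide/narrow excursion ratio:   {step_ratio:.3f}")
print(f"relative mismatch: {rel_err_narrow:.1f}% (narrow), {rel_err_wide:.1f}% (wide)")
```

The two ratios are close to each other, as expected when the accumulated charge scales with pulse width, and the relative mismatch shrinks for the wider pulse.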
The RMS noise voltage at the output of the charge pump is 40.6 µV. The charge pump con-
sumes 30.8 fJ and 31.5 fJ of energy to process a single token, which corresponds to a 6% and
an 8.6% power consumption increase from the first charge-pump implementation, for narrow and
wide pulse settings, respectively. The benefits of a tunable pulse width, wider current range, and
smaller input-polarity-dependent discrepancies, shown in Table 5.3 and Table 5.5, are well worth
the small increase in the power consumption.
5.4 Variable adder-capacitor design
The total load capacitance that the processor in Fig. 5.4 must drive includes the input capacitance
of system blocks that follow the filter, input capacitance of the output buffer used for testing,
capacitance of the charge pumps at every level and tap, capacitance of the DC control block, and
also the parasitic wire capacitance of the filter grid structure. This capacitance is in the range of
150 fF to 250 fF. A linear load capacitor is also added to the output of the filter in order to allow for
additional output amplitude control and to reduce the effects of the non-linear drain capacitances
of the current sources in the charge pumps. For modest SNDR requirements, the distortion caused
by the non-linear capacitance is negligible compared to the error due to three-bit quantization.
Figure 5.35: (a,c) Output current and (b,d) change in output voltage of the bi-directional charge
pump of chip 2 for several values of reference bias current. (a,b) correspond to long pulses (SLOW
logic-level high) and (c,d) correspond to short pulses (SLOW logic-level low).
A variable capacitance can be realized with binary-weighted capacitors that are enabled through
nMOS transistor switches according to a control word. The switches add a resistance in series
with the load capacitors, which must be minimized in order to extend the RC bandwidth beyond
the gigahertz frequency range of interest. A transistor sized to have an adequately small effective
switch resistance has a nonnegligible source/drain capacitance, Csw, compared to the size of the
adder capacitor C to which it is connected. When the capacitor is disconnected, instead of a zero
capacitance, the effective capacitance of the capacitor and the disabled switch is $C_{sw}/(1 + C_{sw}/C) \approx C_{sw}$.
Instead of minimizing this turn-off capacitance by decreasing the switch size at the cost of higher
series resistance, the switch capacitance can be used to advantage, as is shown in Fig. 5.36 for
a two-bit capacitance range. The switch transistors M0 and M1 disable capacitors C0 and C1 by
disconnecting them from ground rather than from the filter’s output node so the source terminal
capacitances of the switches do not contribute to the total capacitance. Consider a single switch
and capacitor combination, such as M0 and C0. When the switch is enabled, the capacitor is
connected to ground through a small switch resistance. Therefore the effective capacitance is C0.
When the switch is disabled, the effective capacitance is the series combination of C0 and the drain
capacitance Cgd,0, which is sized to equal C/3, such that the effective capacitance can be switched
between C and C/4. Transistor M1 is sized to have a drain capacitance Cgd,1 equal to 2C/3. The
adder capacitor then has a linear range from 3C/4 to 3C. A linear variable capacitance range is
not needed because CADDER is used merely for output amplitude control. This adder capacitor
implementation was used in the CT processors of both chip implementations. The transistors were
of the minimum length, for minimum series resistance, and had widths of 5 µm and 10 µm for
M0 and M1, respectively. The unit capacitance was set to C = 50fF for a capacitor tuning range
from 37 fF to 150 fF. Prior to enabling the filter for processing, the filter’s total output capacitance,
Figure 5.36: Variable adder capacitor implementation.
including the adder capacitor, is precharged to VDD/2 through a complementary switch enabled by
the RESET signal and its complement, as shown in the figure.
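The effective adder capacitance for each control word follows from the series combination of each capacitor with the drain capacitance of its disabled switch. The sketch below enumerates the four settings of the two-bit example. It assumes C0 = C and C1 = 2C (binary weighting, consistent with the stated 3C maximum), uses the drain capacitances C/3 and 2C/3 given in the text, and neglects the switch on-resistance and all other parasitics. The function names are hypothetical.

```python
def series(c_a, c_b):
    """Series combination of two capacitances."""
    return c_a * c_b / (c_a + c_b)


def adder_capacitance_ff(code, unit_ff=50.0):
    """Effective capacitance (fF) for a two-bit control word (bit 0: M0, bit 1: M1)."""
    branches = (
        (unit_ff, unit_ff / 3),          # C0 = C, switch drain capacitance C/3
        (2 * unit_ff, 2 * unit_ff / 3),  # C1 = 2C (assumed), drain capacitance 2C/3
    )
    total = 0.0
    for bit, (cap, c_drain) in enumerate(branches):
        enabled = (code >> bit) & 1
        # An enabled switch grounds the capacitor; a disabled one leaves it
        # in series with the switch drain capacitance.
        total += cap if enabled else series(cap, c_drain)
    return total


for code in range(4):
    print(f"control word {code:02b}: {adder_capacitance_ff(code):.1f} fF")
```

With C = 50 fF the four settings are evenly spaced between 3C/4 and 3C, which matches the stated tuning range.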
5.5 DC-control-block design
The negative and positive charges generated by a bidirectional charge pump cannot be expected
to match exactly since the positive and negative currents of the bidirectional current source can
be mismatched and also the positive and negative pulses of a pulse generator are not perfectly
symmetrical. As a result, the discrepancy between positive and negative charge-pump charges
will accumulate on the filter’s load capacitor, and cause the average value about which the output
swings to drift towards one of the supply voltages. For example, if the nMOS current source has
a higher current value or is enabled by the charge pump pulse for a longer time than the pMOS
current source, the output will tend towards ground potential. In reality, the output will not settle
at ground potential because as the average value of the output voltage drifts down from its mid-rail
value, the current of the nMOS will decrease and that of the pMOS will slightly increase due to
their finite output resistances. The drift in the output voltage is shown in Fig. 5.37. As a result,
when the positive and negative currents are mismatched, the output average will settle to some
equilibrium value such that the currents are equalized, and the output voltage will still have some
Figure 5.37: (a) Output current of a mismatched charge pump and (b) the drifting output voltage
of the charge pump for a 50% positive and negative current mismatch.
Figure 5.38: DC-control-block implementation with a differential sensing input.
room to swing. The reduction in the output swing is undesirable and inefficient; therefore, a DC
control block is implemented to maintain the swing close to the optimum mid-rail value.
To maintain the output voltage centered about mid-rail, a DC control block must sense the
output voltage and feed back a correcting current, which will cancel the oversized charge pump
current. The DC control block’s correcting-current circuit must provide positive and negative
single-ended currents, therefore a single-ended implementation is a natural candidate. An example
implementation using a differential sensing circuit is illustrated in Fig. 5.38. The average value
of the filter’s output, generated by processing the output through a simple lowpass filter LPF, is
compared with the desired value of VDD/2. Correcting currents are generated with transistors Mp
and Mn. Static bias currents are dissipated by the correcting current sources and cancel each other
out in the ideal case of perfect charge-pump matching. The differential sensing circuit also requires
static bias currents. Instead of having a separate sensing circuit, the correcting current circuit can
serve as the sensing circuit as well. This realization is shown in Fig. 5.39a, which also includes
a disabling structure. The correcting currents and sensing circuit are realized with transistors M2
and M3, and a single-pole RC filter is used to obtain the average value of the current. For
high-frequency signals in the band of interest, which are attenuated by the lowpass filter, the DC
control block acts merely as a large output resistor and a capacitor. For low-frequency signals,
the circuit is equivalent to a self-biased inverter because the output is shorted to the input through
switches SW1 and SW2. When the average output voltage is higher than the switching voltage of
this inverter, a negative current is generated as the difference between the drain currents of M2 and
M3 to cancel the excessive positive current from the charge pumps. The average output voltage
settles to the switching threshold of the inverter instead of to VDD/2, which is acceptable since the
exact average value is not important, as long as the DC value allows for a large-enough output
voltage swing. If the currents in the charge pumps of the processor (Fig. 5.4) are mismatched,
resulting in a non-zero average output filter current $I_{filter}$, the DC control block will maintain
the filtered output’s DC value within $I_{filter}/G_m$ volts of its zero-current-mismatch value, where $G_m$ is
the sum of the transconductances of transistors M2 and M3. For the example implementation, if
constant input currents of −100 µA to 100 µA are supplied, the DC control block maintains an
average output voltage within 37 mV of the inverter’s switching voltage.
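The relation between mismatch current and DC offset can be checked with a short calculation. The sketch below assumes the linear relation offset = $I_{filter}/G_m$ holds over the whole ±100 µA range, which is a simplification; the function names are illustrative and the 50 µA case is a hypothetical operating point, not a value from the measurements.

```python
def implied_transconductance(i_mismatch_a, v_offset_v):
    """Transconductance G_m (siemens) implied by offset = I_filter / G_m."""
    return i_mismatch_a / v_offset_v


def dc_offset(i_filter_a, gm_s):
    """DC shift (volts) of the filter output for a net mismatch current."""
    return i_filter_a / gm_s


# Figures quoted in the text: a 100 uA mismatch current is held within 37 mV.
gm = implied_transconductance(100e-6, 37e-3)
print(f"implied G_m: {gm * 1e3:.2f} mS")

# Hypothetical intermediate mismatch, assuming the linear model still applies.
print(f"offset at 50 uA: {dc_offset(50e-6, gm) * 1e3:.1f} mV")
```

Under this linear assumption the quoted numbers correspond to a combined transconductance of a few millisiemens for M2 and M3.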
This inverter-based DC control block circuit can be modeled as two resistors connected be-
tween the supplies with the middle node connected to the filter output. If actual resistors were
used, they would have a similar averaging effect, except they would dissipate more static power;
the biased inverters can offer the same correcting current strength but require approximately a tenth
of the static current of the resistors.
Figure 5.39: (a) DC-control-block implementation with (b) a turn-on timing control circuit.
The response of the closed-loop DC control block is that of an underdamped system. The
stability of the circuit is verified with simulations. In order to speed up the transient settling of the
block when it is enabled, a turn-on routine is developed, the timing signals for which are generated
with the circuit in Fig. 5.39b. When the DC control block is disabled, the gate capacitance of M2
and M3 and also the capacitor in the lowpass filter are precharged to VDD/2. When the block is
enabled, the inverter’s input is disconnected from VDD/2 and is instead connected to its own output in
order to precharge the input capacitance to have a voltage equal to its switching threshold. Since
the capacitance has already been charged to VDD/2, this charging is fast. After some delay to allow
for this precharging, the DC control block is connected to the filter output. Due to the precharging
of the capacitors, the settling response of the DC control block upon being enabled is faster.
Figure 5.40: Output voltage of a charge pump with 50% current mismatch. Output voltage drift
corrected by the DC control circuit.
The correcting effect of the DC control circuit is illustrated in Fig. 5.40. Without the DC
control block, the average value quickly begins to drift and after 50 ns settles to an average value
of 200 mV. With the DC control block enabled, as the output voltage begins drifting towards ground
potential, the DC control block begins to generate a correcting current to cancel the charge pump
mismatch. Within a few nanoseconds, the output voltage average value settles. The implemented
DC control block dissipates an average power of 0.27 mW.
For a fast DC-control-block reaction time, the lowpass filter can be removed. The DC control
block will then operate as a highpass filter because while the response of the self-biased inverter is
lowpass, the filter’s output current increases linearly with frequency. (Consider removing the DC
control block. In order for the load capacitor voltage to be constant for input frequencies within
the band of interest, the RMS current, which is integrated on the load capacitor, must increase with
frequency.) The highpass DC control block has a −3 dB bandwidth equal to $G_m/(2\pi C_{filter})$, where $C_{filter}$
is the total load capacitance at the output of the filter and $G_m$ is the transconductance of the inverter.
The cut-off frequency can be varied by changing the size of devices M2 and M3 and varying the
adder capacitance.
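As a rough illustration of how the cut-off moves with the two tuning knobs, the sketch below evaluates $G_m/(2\pi C_{filter})$ for a few combinations. The transconductance and capacitance values are hypothetical placeholders chosen only to show the trend; they are not taken from the implemented circuit.

```python
import math


def highpass_cutoff_hz(gm_s, c_filter_f):
    """-3 dB frequency G_m / (2 * pi * C_filter) of the unfiltered DC control loop."""
    return gm_s / (2.0 * math.pi * c_filter_f)


# Hypothetical values: inverter transconductance and total load capacitance.
for gm_s in (0.5e-3, 1.0e-3):
    for c_f in (200e-15, 400e-15):
        f_c = highpass_cutoff_hz(gm_s, c_f)
        print(f"G_m = {gm_s * 1e3:.1f} mS, C = {c_f * 1e15:.0f} fF -> {f_c / 1e6:.0f} MHz")
```

Doubling the device size (and thus $G_m$) doubles the cut-off, while doubling the load capacitance halves it.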
5.6 CT gigahertz digital processor implementation
The processor in Fig. 5.4 was realized as a part of a larger system for a possible UWB receiver,
which requires a low dynamic range. To place the CT DSP in context, the example system and
application are described. The full system consists of a variable-gain wideband front-end block and
a clockless flash ADC before the processor, which supply the gigahertz per-edge-encoded signals.
The output of the processor is connected to an energy detector as part of the receiver chain; the
output is also connected to a buffer to enable testing of the CT DSP. The CT digital processor in
this system is intended to provide a tunable frequency response that can be actively adapted to
changes in the environment and signal. Filtering is moved from the front-end analog domain to the
CT digital domain to allow for wider programmability and a very compact, inductorless system
implementation, while CT operation avoids aliasing. In particular, the pulse-radio system is
intended to operate in the lower gigahertz UWB channels below 4.7 GHz, but is designed to be
robust against undesirable blockers in the 2.4-GHz and the 5.2-GHz frequency ranges. In the case
that a strong blocker signal corrupts the received signal, the CT FIR filter can be configured to
maximally attenuate the undesirable signal.
5.6.1 Speed and resolution limitations of a gigahertz CT digital processing
system
The speed of the system, and therefore the operating frequencies of the processor, are limited by
the speed of the ADC and the finite bandwidth of the front-end gain blocks, which are necessary to
provide the gigahertz digital signal. The resolution of the system is likewise limited by the ADC,
to first order, because the ADC cannot generate extremely narrow pulses. To explain these limita-
tions, the ADC, which was designed by Colin Weltin-Wu, is briefly described. The clockless flash
ADC is realized with latch-less single-ended comparators, which can be thought of as buffered
inverters. The input signal is provided through input coupling capacitors, which are precharged to the
threshold voltages. The input AC coupling in the ADC, as well as in the front-end gain block, lim-
its the lowest input frequency. Increasing the resolution in a CT system is equivalent to increasing
the frequency because higher timing resolution is required; both the resolution and the maximum
frequency are limited by the ADC. The resolution and speed limitations are most prominent at the
outermost quantization levels of the ADC shown in Fig. 5.4 when the peaks of the signal surpass
the threshold by a small voltage, which is at most equal to the voltage of a quantization step, and
for a short duration, equal to a small fraction of the input period. As the resolution and input
frequency are increased, the overdrive voltage and triggering time can become insufficient and the
comparators then fail to produce a pulse output. The effective resolution of the system is further
limited by the ADC due to signal-dependent output slew rates of the comparators, which affect the
propagation time. The comparator’s propagation time increases as the corresponding quantization
level is moved away from the mid-rail value because the magnitude of the input’s slope at the
threshold is lower than at the mid-rail threshold. The comparator’s propagation delay can also de-
pend on the direction in which the input crosses a quantization threshold. These signal-dependent
delay variations cause an increase in harmonic distortion and limit the output SNDR and, thus, the
effective resolution.
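The overdrive and slope arguments above can be made concrete for an ideal sinusoid. The sketch below assumes an input $A\sin(2\pi f t)$ centered on mid-rail and a threshold measured from mid-rail; the normalized amplitude, the threshold position and the 3-GHz frequency are hypothetical example values, and the function names are illustrative.

```python
import math


def time_above_threshold(amplitude, threshold, freq_hz):
    """Duration per period for which A*sin(2*pi*f*t) exceeds the threshold."""
    if not 0.0 <= threshold < amplitude:
        raise ValueError("threshold must lie in [0, amplitude)")
    return math.acos(threshold / amplitude) / (math.pi * freq_hz)


def slope_at_threshold(amplitude, threshold, freq_hz):
    """Magnitude of the input slope at the instant it crosses the threshold."""
    return 2.0 * math.pi * freq_hz * math.sqrt(amplitude ** 2 - threshold ** 2)


# Hypothetical case: normalized amplitude, 3-GHz input, outer threshold at 0.875.
A, F = 1.0, 3e9
for v_th in (0.0, 0.875):
    t_on = time_above_threshold(A, v_th, F)
    rel_slope = slope_at_threshold(A, v_th, F) / slope_at_threshold(A, 0.0, F)
    print(f"threshold {v_th}: above for {t_on * 1e12:.1f} ps, relative slope {rel_slope:.2f}")
```

The mid-rail threshold is exceeded for half a period with the full slope, whereas an outer threshold sees a much shorter pulse and a shallower crossing, which is consistent with the longer propagation time described above.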
While a gigahertz-range CT digital system is limited in speed and resolution by the ADC, the
processor also has limiting effects. In order to discuss the limitations of the processor shown in
Fig. 5.4, it will be assumed that ideally-quantized and per-edge-encoded signals are supplied at its
input. The maximum input frequency is limited by the maximum granular propagation delay of the
processing blocks. To clarify, a block composed of numerous digital gates may have a long overall
propagation delay; however, the granular delay is the maximum propagation delay of the individual
gates that comprise it. The processor blocks that have the longest granular delays are the delay cell
and the pulse generator. The delay cell cannot accept any new tokens until the delay operation
is complete and the pulse generator cannot accept new tokens until the output pulse is complete.
In addition to the time it takes these blocks to process the input, additional reset time is required
after delaying or pulse generation operations are complete in order to reset the circuits and have all
node voltages settle to their final values. Since the pulse generator’s pulse width is no longer than
approximately 100 ps, as was described in Sec. 5.3.1, and the tap delay is typically over 100 ps for
signals of interest under 5 GHz, the delay-cell speed limitation is dominant. The granular delay
of the delay cell in Fig. 5.9 is the propagation delay of the current-starved inverter with positive
feedback in Sec. 5.2.3, $T_{TH}$, which is given in eq. 5.8. The additional reset time needed to ensure
proper settling is the time it takes the positive feedback to toggle the current-starved inverter. For
the implementation presented, this time is approximately equal to the propagation delay of the two-
inverter buffer at the output of the current-starved inverter. As a result, the minimum allowed time
between tokens is equal to the tap delay, $T_D$, and the maximum input frequency cannot exceed
$1/T_D$. As was explained in the beginning of this chapter, for bandpass gigahertz applications, the
tap delay in a per-edge CT DSP is typically chosen to be around $1/(2f_C)$, where $f_C$ is the band center
frequency. The filter, in that case, can process inputs at frequencies up to twice the center frequency
without causing significant in-band error. If the input frequency is increased beyond the limit of
$1/T_D \approx 2f_C$, significant half-harmonics of the input are generated, which can fall in-band.
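The timing limits in this paragraph reduce to a few lines of arithmetic. The sketch below is a minimal illustration under the stated rule that consecutive tokens must be at least one tap delay apart; the function names and the example token times are hypothetical, and the 145-ps delay is the value used for the bandpass configuration discussed in the next section.

```python
def tap_delay_for_center(f_center_hz):
    """Tap delay T_D ~ 1/(2*f_C) typically chosen for a bandpass per-edge filter."""
    return 1.0 / (2.0 * f_center_hz)


def max_input_frequency(tap_delay_s):
    """Upper bound 1/T_D on the input frequency."""
    return 1.0 / tap_delay_s


def tokens_respect_delay(token_times_s, tap_delay_s):
    """True if every pair of consecutive tokens is at least T_D apart."""
    return all(later - earlier >= tap_delay_s
               for earlier, later in zip(token_times_s, token_times_s[1:]))


t_d = 145e-12
print(f"f_C ~ {1.0 / (2.0 * t_d) / 1e9:.2f} GHz, f_max ~ {max_input_frequency(t_d) / 1e9:.2f} GHz")
# Hypothetical token arrival times on one per-edge line.
print(tokens_respect_delay([0.0, 200e-12, 400e-12], t_d))
print(tokens_respect_delay([0.0, 100e-12, 400e-12], t_d))
```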
The relevant performance metric for a CT digital system is in-band SNDR, which is a func-
tion of the number of quantization levels used to represent the signals. For this discussion, it is
assumed that the input frequencies cannot exceed 1/TD, so that the processor can keep up with the
token rate. The SNDR of the processor is then affected by thermal noise, delay and coefficient
mismatch, and input-polarity-dependent delay discrepancies. Energy-efficient implementations of
the processor, in which transistor sizes are minimized wherever possible, are most sensitive to
errors due to mismatch and delay discrepancies. These errors limit the processor resolution to a
few bits and the achievable SNDR is around 20 dB. Jitter of the delay cell and pulse generator
and also the noise of the bi-directional current sources are well below the levels needed for 20
dB performance. Mismatches in tap delays and coefficients are static errors, which result in an
increase in signal distortion. The discrepancy in propagating delay of rising- and falling-edge to-
kens causes half-harmonic distortion. Delay equalization can be used to reduce the discrepancy,
but this is made more difficult by process variation. The achievable SNDR is therefore somewhat
limited by the technology because of device mismatch and process variation. Polarity-dependent
delay discrepancy is a fundamental limit of CT digital processors that use edges of both polarities
to encode signals, as in the case of a per-level or a per-edge processor. Even with sophisticated
delay equalization techniques and improved mismatch and process variation properties, even-order
(per-level and per-edge DSPs) and half-harmonic (per-edge DSP) distortion cannot be eliminated
in a single-ended processor implementation because the propagation delays and the shape of rising
and falling transitions cannot be made exactly equal.
5.6.2 Simulated CT digital processor outputs
Fig. 5.41 shows simulated output signals of the implemented gigahertz CT digital FIR filter for
single-tone inputs in bandpass and notch filter configurations. The bandpass response shown
in (a) is first considered. A bandpass filter response is realized with a seven-tap filter configuration
and a tap delay of 145 ps, which places the center frequency at 3.45 GHz. The ADC and filter are
enabled at time t = 0 and an input is applied after 500 ps. 900 ps after the input is applied, the
signal reaches the last tap and the output signal settles to a steady-state response. The maximum
output signal occurs for a 3.5-GHz input signal, which is in the filter’s passband. Signals with
input frequencies of 2.5 GHz and 4 GHz are attenuated by over 15 dB. For a 4.5-GHz signal, the
input frequency is close to a notch in the frequency response, therefore the 4.5 GHz component is
significantly attenuated, while the second harmonic of the signals, introduced by three-bit quanti-
zation, is less attenuated and is noticeable in the quick waves of the blue signal. In the time interval
before the signal has propagated sufficiently through the filter taps, only the 4-GHz component of
the blue signal is noticeable to the eye. As time is increased, the 4-GHz component begins to
disappear because it starts being attenuated by the processor and only the harmonics remain.
Fig. 5.41b shows the output signals of the processor configured as a three-tap notch filter with
a tap delay of 120 ps. The first and last coefficients are equal and the middle coefficient is set
to zero. The notch frequency then occurs at $1/(2(2T_D)) = 2.1$ GHz and the maximum gain occurs at
4.2 GHz. A 2.5-GHz signal, which is close to the notch frequency, is significantly attenuated as
compared to the signals close to the maximum gain frequency. The steady-state response of the
three-tap filter is reached noticeably faster than in the bandpass case because the signals have to
propagate through only three filter taps. The maximum output swing is smaller because only two
taps contribute charge, as opposed to all seven taps in the bandpass case.
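The notch and peak frequencies quoted for the three-tap configuration follow from the ideal FIR transfer function. The sketch below evaluates $|H(f)| = |\sum_k c_k e^{-j2\pi f k T_D}|$ for unit first and last coefficients; it is an idealized model that ignores the charge-pump integration, quantization and per-edge encoding of the actual processor, and the function name is illustrative.

```python
import cmath
import math


def fir_gain(coeffs, tap_delay_s, freq_hz):
    """Magnitude of an ideal tapped-delay-line FIR response at one frequency."""
    w = 2.0 * math.pi * freq_hz * tap_delay_s
    return abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs)))


T_D = 120e-12                  # tap delay of the notch configuration
NOTCH = [1.0, 0.0, 1.0]        # equal outer coefficients, middle coefficient zero

f_notch = 1.0 / (2.0 * (2.0 * T_D))
f_peak = 1.0 / (2.0 * T_D)
print(f"notch at {f_notch / 1e9:.2f} GHz, peak at {f_peak / 1e9:.2f} GHz")
for f in (f_notch, 2.5e9, f_peak):
    print(f"{f / 1e9:.2f} GHz -> |H| = {fir_gain(NOTCH, T_D, f):.3f}")
```

In this idealized model a 2.5-GHz tone sits on the slope next to the notch and is attenuated relative to tones near the maximum-gain frequency, in line with the simulated waveforms.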
5.7 Filter tuning techniques
The tap delays and filter coefficient values of a gigahertz CT digital processor are tuned in order
to achieve the desired frequency response. It is not possible to measure gigahertz digital signals
directly because several harmonics of the fundamental frequency of the digital signal are needed to
represent digital transitions and a broadband impedance match to accommodate a few harmonics
is not feasible. The capacitance of the chip pads and package pins also makes it very difficult to
bring digital gigahertz signals out of the chip. Filter tuning, therefore, has to be done indirectly.
Schemes to tune the tap delays and filter coefficients are presented in the following sections.
5.7.1 Delay-cell tuning scheme
Two delay-tap tuning schemes are used to tune the filter: (1) based on a notch frequency, and (2)
using a reference clock signal. In the first tuning scheme, the filter is configured as a notch filter
by enabling two tap coefficients, which are programmed to have the same value. A notch in the
frequency response occurs at a frequency corresponding to half the inverse of the delay, $f_{notch} = 1/(2T_D)$.
For manual delay tuning, the frequency response can be measured, the notch frequency can
be identified, and the tap delay can be adjusted accordingly, as was done for chip measurements.
Figure 5.41: Simulated filter output voltage for single-tone inputs processed through (a) a 7-tap
bandpass filter and (b) a 3-tap notch filter (with the middle coefficient set to equal zero).
Figure 5.42: Delay-cell tuning scheme block diagram.
The delay can also be calibrated relative to a known time reference, for example a low-frequency
clock, which is typically available in a system for baseband processing. It is not practical to
calibrate each delay cell individually because the added wiring complexity will make the filter structure
less compact, requiring longer interconnections, which are detrimental to the speed and isolation
of the processor blocks. All the delay cells in each per-level filter can be tested at once, as is done
in the first chip implementation, or using replica delay cells, as is done in the second chip. To tune
the delay of each per-level filter, the delay cells can be connected in a ring-oscillator configuration
merely by connecting the output of the last delay cell to the input of the first delay cell through an
inverting buffer. The buffer drives the signals, which must travel through a wire that spans the full
length of the filter, and also provides the correct inverted phase in order to enable oscillations. The tap
delay can be determined by measuring the ring-oscillation frequency relative to a reference clock
frequency. The tuning scheme, which is illustrated in Fig. 5.42, works as follows:
1. Signal ENRING toggles high to indicate that the delay cells of the nth per-edge filter are to be
tuned.
• All per-edge filters other than the one under test are disabled, and the processor is
disconnected from the input.
• The delay cells in the nth per-edge filter are preset so that their outputs alternate between
logic-level high and low. The input of the first delay cell is connected to ground
potential, which is opposite of the preset value of the last delay-cell output.
• The output of the last delay cell is connected to a ten-bit binary counter. The ten-bit
delay counter and another five-bit clock counter are in reset.
2. On the following falling edge of a reference clock signal CLK, the tuning operation starts.
• The switch that connects the output of the last delay cell to the input of the first delay
cell through an inverting buffer is enabled. A ring oscillator configuration is formed
and oscillations start.
• The ten-bit counter, connected to the ring oscillator, is enabled by EN and starts count-
ing the number of ring-oscillator cycles.
• The five-bit counter, connected to CLK, is enabled by EN and starts counting the num-
ber of clock cycles.
3. When the clock counter reaches a threshold value NCLK , a control block causes a DONE
signal to toggle high.
• The current value of the ten-bit delay counter, NRING, is latched for later digital readout.
• The enable signal EN toggles to low, disabling the ring oscillator feedback switch,
stopping the oscillations.
• The delay cells and the binary counters are disabled.
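The counting procedure in steps 1 to 3 can be mimicked with a small behavioral model. The sketch below is a simplification that assumes the ring oscillator starts exactly at the enabling clock edge and that only completed ring cycles are counted; the function name and the example ring period, clock period and threshold are hypothetical.

```python
def run_delay_tuning(ring_period_s, clk_period_s, n_clk, ring_bits=10, clk_bits=5):
    """Return N_RING, the ring cycles counted during N_CLK reference-clock cycles."""
    if not 0 < n_clk < 2 ** clk_bits:
        raise ValueError("N_CLK must fit in the clock counter")
    window_s = n_clk * clk_period_s             # time from EN high until DONE
    n_ring = int(window_s // ring_period_s)     # completed ring-oscillator cycles
    if n_ring >= 2 ** ring_bits:
        raise OverflowError("the ring counter would wrap around")
    return n_ring


# Hypothetical case: 1.8-ns ring period, 20-MHz reference clock, N_CLK = 16.
print(run_delay_tuning(1.8e-9, 50e-9, 16))
```

The overflow check reflects a practical constraint of the scheme: the clock threshold has to be chosen so that the ten-bit delay counter does not wrap during the measurement window.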
The oscillation period of the delay-cell ring oscillator is equal to
$T_{RING} = 6T_{D,R} + 6T_{D,F} + 2T_{BUF} = 12T_D + 2T_{BUF}$, where $T_{D,R}$ and $T_{D,F}$ are the cell delays for rising- and
falling-edge inputs, $T_D$ is their average and $T_{BUF}$ is the propagation delay of the buffer. After $N_{CLK}$ clock
















The actual average delay $T_D$ is in the range of $\frac{N_{CLK}}{f_{CLK}(N_{RING}+1)}$ to $\frac{N_{CLK}}{f_{CLK}N_{RING}}$; the estimate is then within
$\pm\frac{1}{N_{RING}} \times 100\%$ of the actual average delay value. The 10-bit counter bit length was chosen to
allow large values of $N_{RING}$ for measurement accuracy within 2 ps. The 5-bit counter bit length
was chosen to allow for a variable tuning duration set by the clock.
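The effect of the one-count uncertainty can be illustrated numerically. The sketch below evaluates the two bounds quoted above and the resulting relative uncertainty of $1/N_{RING}$; it also estimates the smallest count for which an uncertainty of delay/$N_{RING}$ stays within a target accuracy. The function names and the numeric inputs (a 20-MHz clock, 16 clock cycles, a count of 444, a 145-ps delay) are hypothetical, and treating the accuracy as delay divided by count is an assumption of this sketch.

```python
import math


def count_bounds(n_clk, f_clk_hz, n_ring):
    """Bounds N_CLK/(f_CLK*(N_RING+1)) and N_CLK/(f_CLK*N_RING) from the text."""
    window_s = n_clk / f_clk_hz
    return window_s / (n_ring + 1), window_s / n_ring


def relative_uncertainty(n_ring):
    """Fractional uncertainty of the estimate caused by one counter step."""
    return 1.0 / n_ring


def min_count_for_accuracy(delay_s, accuracy_s):
    """Smallest N_RING for which delay/N_RING does not exceed the accuracy target."""
    return math.ceil(delay_s / accuracy_s)


low, high = count_bounds(16, 20e6, 444)
print(f"bounds: {low * 1e9:.4f} ns to {high * 1e9:.4f} ns")
print(f"relative uncertainty: {relative_uncertainty(444) * 100:.2f} %")
print(f"count needed for 2 ps on a 145-ps delay: {min_count_for_accuracy(145e-12, 2e-12)}")
```

A count of this size fits comfortably in the ten-bit counter, which is consistent with the stated motivation for its length.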
Only the average tap delay can be approximated with this tuning scheme; TD,R and TD,F can
be estimated using a duty-cycle tuning scheme, which will now be described. This scheme is
based on twelve replica delay cells, which are connected in a ring-oscillator configuration; however,
instead of direct connections, the delay cells are interconnected with inverters. An additional
inverter is added between the first and the last delay cells for proper loop phase. All
delay-cell outputs are preset to logic-level high such that the input voltage of every delay cell
except the last one is the inverse of its preset value. The period of oscillation is then
$T_{RING} = 12(T_{D,R} + T_{INV}) + 12(T_{D,F} + T_{INV}) + 2T_{INV}$, where $T_{INV}$ is the propagation delay of an
inverter. The propagation delay mismatch for a rising- and falling-edge input of the inverter is
assumed to be much smaller than that of the delay cell. Due to the inverters inserted between each
pair of delay cells, the positive pulse duration of the ring oscillator period consists only of rising-edge
delays, $T_{RING,P} = 12(T_{D,R} + T_{INV,F}) + T_{INV,R}$, and the negative pulse duration consists of falling-edge
delays, $T_{RING,N} = 12(T_{D,F} + T_{INV,R}) + T_{INV,F}$. A reference ring oscillator is also constructed; its
frequency of oscillation fREF is at least ten times the frequency of the delay-cell ring oscillator,
but the exact value is not important. This high frequency clock will be referred to as the reference
clock and the 12-delay ring oscillator will be referred to as the ring oscillator. The purpose of
the high-frequency clock is to measure the lengths of the ring oscillator pulses in granular steps.
The duty-cycle tuning scheme is illustrated in Fig. 5.43, and is similar to the delay tuning scheme.
Prior to starting the tuning scheme, an eight-bit binary counter for the reference clock and two
ten-bit binary counters, one for positive and one for negative ring oscillator pulses, are reset. When
the duty cycle tuning scheme is enabled, the feedback loops of the ring oscillator and reference
clock are connected and the oscillators are enabled. The eight-bit counter starts counting cycles of
the reference clock. The positive-pulse counter counts the number of reference clock cycles in a
positive ring oscillator pulse and the negative-pulse counter counts the number of reference cycles
in a negative ring oscillator pulse. When the reference clock counter reaches a threshold value,
the current values of the other two counters, $N_P$ for positive pulses and $N_N$ for negative pulses, are
latched. Since the precise reference clock frequency is not known, the exact pulse widths cannot be
estimated from the counter values. Instead, the ratio of the rising-edge to falling-edge delay can be




If the positive pulse count is higher, then the rising-edge delay should be increased and the falling-
edge delay should be decreased (manually or by using a self-calibration scheme) until the measured
duty cycle is close to 50%.
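The pulse-width expressions and the comparison of the two counter values can be sketched as follows. The code computes the positive and negative pulse durations from the expressions given for the twelve-cell replica oscillator and classifies a pair of counts by their duty cycle, defined here as $N_P/(N_P+N_N)$, which is an assumption of this sketch. The function names, delay values, counts and tolerance are hypothetical; the sketch only reports which pulse is longer and leaves the corrective adjustment to the procedure described in the text.

```python
def pulse_widths(t_dr, t_df, t_inv_r, t_inv_f, n_cells=12):
    """Positive and negative pulse durations of the replica ring oscillator."""
    t_pos = n_cells * (t_dr + t_inv_f) + t_inv_r
    t_neg = n_cells * (t_df + t_inv_r) + t_inv_f
    return t_pos, t_neg


def measured_duty_cycle(n_pos, n_neg):
    """Duty cycle estimated from the two latched pulse counters."""
    return n_pos / (n_pos + n_neg)


def classify_duty_cycle(n_pos, n_neg, tolerance=0.01):
    """Report whether the counts are balanced or which pulse is longer."""
    duty = measured_duty_cycle(n_pos, n_neg)
    if abs(duty - 0.5) <= tolerance:
        return "balanced"
    return "positive_longer" if duty > 0.5 else "negative_longer"


# Hypothetical delays: 150-ps rising, 140-ps falling, 10-ps inverter delays.
t_pos, t_neg = pulse_widths(150e-12, 140e-12, 10e-12, 10e-12)
print(f"T_RING,P = {t_pos * 1e9:.2f} ns, T_RING,N = {t_neg * 1e9:.2f} ns")
print(classify_duty_cycle(193, 181))
```

Because only the ratio of the two counts is used, the exact reference-oscillator frequency drops out, as noted above.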
Figure 5.43: Delay-cell duty-cycle tuning-scheme block diagram.
5.7.2 Filter-coefficient tuning scheme
The coefficient values can be set without tuning according to the predetermined controls. For ex-
ample, simulation results can be used to determine the necessary bias current for each coefficient
value within the three-bit accuracy. The exact values of the coefficients are not important (scaling
all coefficients leads merely to a gain error); rather, the ratios of the coefficients matter. The
coefficient controls can be calibrated by measuring the output power for a single-tone input, pro-
cessed through a single-tap filter, which merely reconstructs the input. The filter coefficient tuning
scheme described in this section is used to reduce the mismatches between positive and negative
charges of the bidirectional charge pumps. Direct measurements or estimates of the coefficient val-
ues are not possible because they would require a very-high-speed ADC, which is prohibitive, and
the voltage excursion or the current pulses of the charge pumps cannot be measured by bringing
them out of the chip. Additional calibration circuits, such as a high-speed ADC, would also contribute
significant capacitance to the output of the filter, where the coefficients have to be measured. This
additional capacitance would reduce the output swing and would require higher current values, larger
transistors, and buffers to drive the larger switch transistors, leading to a significant power dissipa-
tion increase just to incorporate a tuning scheme. In order to avoid significant hardware overhead,
a different approach is taken. It is assumed that all the positive charge pumps at all taps are set to
the correct coefficient values; simulation results or the aforementioned calibrations can be used to
realize this. The negative charge pumps are then tuned relative to the positive charge pumps. (This
is preferred over the reverse because positive charge pumps have better matching properties since
they are larger.) The tuning scheme calibrates all the charge pumps at a single tap and makes use
of the existing filter structure and output capacitance. The scheme is illustrated in Fig. 5.44 and
works as follows:
1. Prior to enabling the tuning scheme, the filter is disconnected from the input and instead
connected to ground potential. To test the kth tap coefficient matching, delay cells beyond
the tap under test are disabled. All coefficients but the one under tuning are turned off. The
DC control block is disabled and the load capacitance is precharged to VDD/2. An eight-bit
binary clock counter is reset.
2. When the tuning scheme is enabled, a ring oscillator clock composed of three to five replica
delay cells is enabled. The exact oscillation frequency is not important, but it is designed
to be 1 GHz to 4 GHz. The eight-bit counter starts counting clock cycles. The filter output
is connected to the ring oscillator output. On each edge of the clock, the per-edge signals
at every level toggle. This causes the charge pumps at the kth tap and all levels to trigger.
Theoretically, if the positive and negative charge pumps are perfectly matched, the net pos-
itive charge from all the levels and the net negative charge from all the levels cancel, and
the filter’s output voltage remains at VDD/2. If there is any charge pump mismatch, as will
always be the case, the output voltage will drift towards VDD or ground. Even with close
coefficient matching, the output voltage will eventually drift after a large number of input
edges. A comparator, realized as a chain of inverters, is connected to the filter’s output in
order to resolve the output to two binary values and to determine the direction of the drift.
3. After 256 clock cycles, corresponding to 512 input edges, the comparator output is latched.
The ring oscillator is disabled.
If the comparator output is high, the negative charge pump coefficient should be increased. The
tuning scheme is triggered again and the appropriate corrections are made until the comparator
output toggles. The algorithm should continue until a single LSB change in the negative charge
pump (or some granular change in the negative bias current) makes the comparator output toggle.
Figure 5.44: Coefficient-mismatch tuning-scheme block diagram for the 1st tap (k = 1).
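The iterative correction can be sketched as a simple search loop. The code below is a behavioral illustration only: the comparator is modeled as a hypothetical callable that returns True when the output drifts high after one tuning run, the negative charge pump is represented by an integer code whose range is assumed, and the loop steps the code one LSB at a time until the comparator output toggles.

```python
def calibrate_negative_pump(comparator, code, code_max=7, max_runs=64):
    """Step the negative charge-pump code by one LSB per tuning run until
    the latched comparator output toggles, then return the final code."""
    previous = comparator(code)
    for _ in range(max_runs):
        step = 1 if previous else -1        # output high -> strengthen negative pump
        candidate = code + step
        if not 0 <= candidate <= code_max:
            return code                     # tuning range exhausted
        code = candidate
        current = comparator(code)
        if current != previous:
            return code                     # a single-LSB change flipped the drift
        previous = current
    return code


# Hypothetical drift model: the positive pump delivers 4.3 LSB-equivalents of
# charge, so the output drifts high while the negative code is below that.
def drifts_high(negative_code):
    return 4.3 > negative_code


print(calibrate_negative_pump(drifts_high, code=2))
print(calibrate_negative_pump(drifts_high, code=7))
```

Starting from either side, the loop stops at one of the two codes that straddle the balance point, which corresponds to the single-LSB toggling condition described above.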




This chapter presents the measurement results for the gigahertz CT digital FIR filter implemented
as part of a larger receiver system. The system was fabricated in ST’s 65-nm CMOS technology
and uses a 1.2-V supply voltage. The results presented in this section are based primarily on
the first chip. The second chip exhibited oscillation problems in the single-ended input driver.
Solutions to the oscillation problem involved reducing the gain and bandwidth of the input driver,
which limited the frequency testing range of the processor. Due to the testing limitations of the
second chip, measurement results from the first chip are provided, unless stated otherwise. The
chip photograph of the first implementation is shown in Fig. 6.1, where the processor is seen to
occupy a small area of 0.073 mm². The grid structure of the processor is prominent, showing 14
per-level processors and seven taps.
Figure 6.1: Chip micrograph. (a) Overall view, (b) detail of the CT digital processor.
Figure 6.2: PC board and FPGA.
6.1 Measurement setup
The designed integrated circuit is packaged as a 64-pin QFN chip, which is mounted on a PC board,
as is shown in Fig. 6.2. Input and output RF signals are connected through on-board transmission
lines, which are capacitively coupled to side-mounted SMA connectors. Digital control signals
that are used to program the chip and to read out measured data for the tuning scheme are provided
by an FPGA, also shown. In this first prototype implementation, the coefficient bias reference
currents are provided for each tap by variable resistors. The delay cell is also biased with an
external reference bias current.
Figure 6.3: Measurement setup.
The gigahertz digital processor was tested by providing an analog signal and measuring the
buffered analog output since digital signals at gigahertz frequencies cannot be measured directly.
The measurement setup is shown in Fig. 6.3. Per-edge-encoded digital signals are provided by
a three-bit CT ADC. The ADC input is provided by the front-end block of the receiver, which
for the rest of the discussion will be referred to as a variable-gain input driver. The output of
the filter is measured through an output buffer. Both the input driver and output buffer provide a
50-Ω broadband impedance match. The input driver offers a variable gain, with a maximum gain
of 40 dB. According to simulation results, the output buffer has 5-dB loss with a 1.5-dB in-band
attenuation up to 4.5 GHz.
The input capacitors of the ADC are periodically refreshed to restore the threshold voltage, as
explained in Sec. 5.6. This reset occurs every few hundred nanoseconds to several microseconds
and lasts for several nanoseconds. During the reset phase, the ADC is disabled and consequently
so is the processor, which causes a periodic dead time in the measured output signal. For a sinu-
soidal input, the measured output signal contains not only a tone at the input frequency and the
distortion harmonics, but also a skirt of single-tone components which are offset from the input
tone by multiples of the reset signal frequency and decrease in power as they move away from the
fundamental tone. The powers of the skirt tones that are close to the fundamental frequency are




The speed of the system is limited by the speed of the CT ADC at higher frequencies, as was dis-
cussed in the previous chapter, by input AC coupling between stages at lower frequencies, and also
by the bandwidth of the input driver and output buffer. Measurement results are presented directly
without post-processing or correction, unless otherwise stated. It is not possible to measure the
responses of the driver and buffer circuit to remove their effects because the extra capacitance that
would be added to make the blocks testable would significantly decrease the usable bandwidth,
prohibiting broadband operation. As a result, only the response of the full processing chain, in-
cluding the input driver, ADC, processor and output driver is known. An attempt is made to deduce
the response of the front-end gain from the measured responses with variable system parameters.
Measurement results are presented directly, without normalizing the output power. A full-scale
input power refers to the input power level at which a full-scale ADC input is reached at 1 GHz by
using the maximum input driver gain; this occurs at an input power of −41 dBm. The presented
frequency responses were measured with a vector network analyzer (VNA) and the output spectra
were measured with a spectrum analyzer. Frequency responses will be presented as measured
output power for a 50-Ω load; these are direct power measurements of the buffer output for a full-
scale input power, unless otherwise stated. The maximum usable system bandwidth is 0.8 GHz
to 3.2 GHz, as will be explained in the following sections. The signal band of interest will be
assumed to be limited to this bandwidth.
Figure 6.4: (a) Transfer characteristic of the system configured as a single-tap ADC-DAC and (b)
harmonic distortion ratio for a 1-GHz input signal.
6.2.1 Transfer characteristic of the CT ADC
Prior to studying the measured responses of various filter configurations, the system is character-
ized based on a single-tap filter configuration. The processor, in this case, acts simply as a DAC,
therefore this configuration will be referred to as an ADC-DAC. Measuring the response of the
driver and ADC establishes a baseline for comparison, and allows one to draw conclusions about
the bandwidth and the distortion characteristic of the blocks that precede the processor, without
complicating the effect of the processor. The measurements of this baseline test, however, include
the effects of the processor coefficients and output driver, which can add to the distortion and noise
of the output signal.
The transfer characteristic of the ADC-DAC, including the effect of the input driver and out-
put buffer, is shown in Fig. 6.4a for normalized input and output amplitudes. Output power of
the fundamental frequency is measured for a 1-GHz input tone. The measured characteristic is
comparable to the transfer characteristic of an ideal ADC, also shown. The ideal CT ADC re-
sponse is not a straight line because the output is the amplitude of the fundamental component of
a quantized sinusoid; the contributions due to harmonics are not included. The measured response
appears smoother than the ideal ADC response because the asynchronous comparators are limited
by finite gain and speed. The second-order and third-order harmonic distortion ratios at the output
are shown in Fig. 6.4b for an input power range normalized to the full-scale input power (0 dBFS).
Within a 20 dB input signal range (corresponding to a three-bit range) the zero-order system has
a minimum spurious-free dynamic range (SFDR) of 25 dB, limited by second-order distortion,
which is the dominant distortion component. The third harmonic is somewhat attenuated because
of the lowpass response of the filter’s charge pump, the attenuation due to the distributed RC structure
of the filter grid and the bandwidth of the output buffer. As the input amplitude is increased
beyond the maximum range, a region shown in gray, the output amplitude approaches the maxi-
mum, and the output signal approaches a square wave. The quantization harmonics of the signal
likewise start to increase, since the ADC becomes effectively a 2-level quantizer.
The transfer characteristic of the system for a 1-tap filter configuration is shown in Fig. 6.5
for several input frequencies. A full-scale ADC input corresponds to a 0 dBFS normalized input
power at 1 GHz. The transfer characteristics for a 2-GHz input match those of a 1-GHz input
fairly well; however, as the frequency is increased further, the ADC-DAC response shifts. The shift can
be explained as follows: As the input frequency is increased and the input power is kept constant,
the outermost levels of the ADC fail to trigger because (1) the gain of the input driver decreases
and (2) the peak of the ADC input is beyond the outermost threshold levels by too small a value
and for too short a duration. A loss of two levels out of seven corresponds to a 3 dB decrease in
the output power. In order to trigger the outermost levels, the input amplitude has to be increased
by 3 dB, which translates to a 3-dB rightward shift in the transfer characteristic on a decibel scale.
Measurements show that for a 0dBFS input power (full-scale ADC input at 1 GHz), all seven per-
Figure 6.5: Transfer characteristic of the system configured as an ADC-DAC for several input
frequencies.
level signals trigger for input frequencies of up to 2.5 GHz, as is evident from the 3-dB right shift
in the 2.5-GHz transfer characteristic relative to the transfer characteristic at 1 GHz. For 0dBFS
and frequencies above 4 GHz, only three levels out of seven activate. To trigger the inactive levels
at very high frequencies, the signal power can be increased in order to saturate the ADC input.
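The 3-dB figure quoted above for the loss of two levels out of seven can be checked with a short calculation. The sketch below assumes, as a simplification that is not stated in the text, that the fundamental output amplitude scales in proportion to the number of per-level signals that trigger; the function name is illustrative only.

```python
import math

def level_loss_db(active_levels: int, total_levels: int = 7) -> float:
    """Output-power change (dB) when only `active_levels` of `total_levels` trigger.

    Assumption: the fundamental amplitude is proportional to the number of
    active levels, so the power ratio is (active / total) squared.
    """
    if not 0 < active_levels <= total_levels:
        raise ValueError("active_levels must be in 1..total_levels")
    return 20.0 * math.log10(active_levels / total_levels)

if __name__ == "__main__":
    for n in (7, 5, 3):
        print(f"{n} of 7 levels active: {level_loss_db(n):+.2f} dB")
```

Under this assumption, five active levels out of seven give a change of roughly −2.9 dB, which is consistent with the 3-dB shift described in the text.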
At frequencies above 3 GHz, the maximum output power is 3 dB lower than the maximum
power at low frequencies, as can be seen in Fig. 6.5. This attenuation is caused by high-frequency
roll-off effects of the blocks that follow the ADC, namely the gigahertz processor and the output
buffer. Horizontal shifts in the transfer characteristic on the decibel scale are attributed to the
limitations of the ADC speed and the input driver bandwidth, whereas the vertical shifts are due
to high-frequency attenuation of the processor and output buffer. The lowpass response of the
processor and buffer will be discussed later in the chapter.
Figure 6.6: (a) Measured feedthrough signal with the processor disabled for an input power of
−41 dBm. (b) Group delay of the feedthrough transfer function.
6.2.2 Feedthrough
When the CT processor is disabled, the link between the input and the output is broken. Theoretically,
there should be no power at the buffer output; however, a nonnegligible signal is measured.
The presence of this signal at the output is attributed not to feedthrough from the driver input,
but rather to feedthrough from the outputs of the input driver, which offers over 40 dB of gain, distributed among
several single-ended gain stages. Several mechanisms for this signal feedthrough are identified.
Primarily, they are (1) the feedthrough of a large-swing signal at the output of the input driver
to the supply nodes of the ADC, which are shared by the CT processor, (2) bounce of the ADC
supply voltages caused by the ADC current which varies synchronously with the input signal, and
(3) feedthrough of the large output signal of the input driver directly to the output buffer through
the substrate and the shared ESD ring.
The feedthrough signal, measured with the processor disabled, is shown in Fig. 6.6a. In the
frequency span from 1 GHz to 4 GHz, the feedthrough power increases with frequency, which
suggests the feedthrough path is through a capacitance, for example the $C_{GS}$ and the $C_{GD}$ of the
devices in the ADC’s single-ended comparators, each of which is realized as a self-biased inverter. Such
feedthrough behavior is seen in simulation results; however, the measured feedthrough power is
notably larger than expected. The group delay of the measured feedthrough gain is also shown
in Fig. 6.6b and is fairly constant around -850 ps through the measured frequency range when
the measurement noise is smoothed out. The magnitude of the group delay is very close to the
simulated propagation delay for the CT ADC and the propagation delay of the CT processor’s
charge pump. This supports the theory that the source of the feedthrough signal is at the input of
the ADC.
6.2.3 Effect of feedthrough on the measured frequency response of a one-tap
CT processor
The effect of the feedthrough on the response of a single-tap filter, which functions as a DAC, is
illustrated in Fig. 6.7a. Due to a difference in delay between the source of the feedthrough signal
(ADC input) and the 0th tap of the filter, the reconstructed output and the feedthrough signal add
in or out of phase depending on the input frequency. This causes the filter frequency response to
have an almost periodic ripple in the frequency domain, as can be observed in the filter responses
in (a). If the coefficient polarity is changed, the phase of the signal at the 0th tap changes by 180°;
Figure 6.7: (a) Measured frequency responses of a 1-tap processor for positive and negative co-
efficient signs, and the feedthrough signal. (b) Frequency response with the feedthrough signal
subtracted.
this causes the peaks of the measured filter response, occurring at frequencies where the signal
and feedthrough add in phase, to become valleys, and vice versa. A similar frequency ripple is
observed in simulation results; however, simulations show less high-frequency attenuation in the
system response and less feedthrough power, therefore the simulated frequency responses have a
less noticeable ripple. The difference in frequency between ripple extrema is also larger, which
indicates that the difference in delay between the feedthrough path and the 0th filter tap is actu-
ally shorter than predicted by simulation. When the measured signal power begins to roll off at
higher frequencies and the feedthrough power increases, the feedthrough signal cancels a larger
fraction of that power at ripple minima. The difference between the signal power at the filter out-
put for positive and negative coefficients, seen as the ripple, therefore, increases. To reduce the
effect of feedthrough, the bias current of the filter coefficients can be increased. If the feedthrough
signal, the magnitude and phase of which were also measured, is subtracted from the filter fre-
quency responses, the corrected responses for both coefficient polarities are essentially the same,
as expected. While the effect of the feedthrough cannot be canceled in the chip, removing it in
post processing helps to analyze the processor response. The following discussions will present
both the direct measurement results and results corrected for feedthrough to evaluate the performance
of the filter, isolated from other effects. The response of a single-tap processor with the
feedthrough signal removed in post-processing is shown in Fig. 6.7b. The response appears much
smoother than in (a) because the source of the large ripples is removed. The small amplitude
ripples in the measurement are due to nonideal impedance matching at the input and the output
of the chip and the PCB transmission lines, which can be seen in every measured response. The
feedthrough-corrected single-tap filter response follows a smooth low-pass response (impedance
mismatch effects neglected) that has an upper −3 dB frequency of 3.2 GHz (best upper band edge
of 3.4 GHz from some chips). The lower −3 dB frequency occurs at 0.8 GHz and is due to the AC
coupling of the input driver and ADC. The maximum achievable bandwidth of the system, therefore,
is 0.8 GHz to 3.2 GHz. This bandwidth is not a true representation of the processor because it
is affected by the speed of the ADC and the finite bandwidth of the input driver and output buffer.
Therefore, it is difficult to ascertain the processor’s behavior for high input frequencies. Increasing
the input power to overdrive the ADC and force the per-level signals to trigger is not effective because the
feedthrough power increases and corrupts the response.
The frequency locations of the filter response ripples change when the signal is reconstructed
at different taps of the filter because the delay between the feedthrough and reconstruction paths
changes, as is illustrated in Fig. 6.8. To obtain a response at the kth tap, coefficients at the other
taps are set to zero. The delay taps were tuned to have a delay of 125 ps. As the delay between
Figure 6.8: Normalized frequency response of a single-tap filter configured at different taps for
positive (red) and negative (blue) coefficient signs.
the feedthrough and reconstruction paths increases by pushing signal reconstruction to higher taps,
the spacing between ripples becomes smaller, as can be seen by comparing the frequency response at the
5th filter tap with the minimum-delay response at the 0th tap. When the measured feedthrough signals are
subtracted, as shown in Fig. 6.9, the corrected responses at every tap are essentially the same, as
would be expected.
The response of a single-tap filter can be approximated as the sum of a feedthrough path and a
reconstruction path, the model for which is illustrated in Fig. 6.10. The feedthrough path is realized
as a 15 dB loss, which is obtained by comparing the measured filter and feedthrough responses.
The reconstruction path is passed through a lowpass filter that mimics the system’s high frequency
Figure 6.9: Normalized frequency response of a single-tap filter configured at different taps with
feedthrough signals subtracted.
Figure 6.10: Model of the feedthrough path and the single-tap processor path for signal recon-
struction at the kth tap.
roll-off, and is delayed by 410 ps + (125 ps × k), where k is the tap number at which the signal is
reconstructed. The reconstruction path is inverting for a negative coefficient sign. The frequency
responses obtained using this model are presented in Fig. 6.11 for example taps and are compared
to the measured responses at those taps. The model’s responses match the measured responses in
terms of the general shape and the extrema locations. It can, thus, be concluded that the primary
feedthrough source is at the input of the ADC. The feedthrough signal is 410 ps ahead of the signal
at the 0th tap according to the model fitted to measurement results, and ahead by 350 ps according
to circuit simulation results.
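The two-path model described above can be sketched numerically. The following is a minimal illustration, not the model used to produce Fig. 6.11: the 15 dB feedthrough loss, the 410 ps base delay and the 125 ps tap delay are taken from the text, whereas the first-order low-pass with a 3.2 GHz corner is an assumption chosen only to mimic the high-frequency roll-off, and all names are hypothetical.

```python
import cmath
import math

FEEDTHROUGH_GAIN = 10 ** (-15 / 20)   # 15 dB loss, from the text
BASE_DELAY_S = 410e-12                # delay to the 0th tap, from the text
TAP_DELAY_S = 125e-12                 # tuned tap delay, from the text
F_CORNER_HZ = 3.2e9                   # ASSUMPTION: first-order low-pass corner

def model_response(freq_hz: float, tap: int, sign: int = +1) -> complex:
    """Sum of the feedthrough path and the delayed, low-pass reconstruction path."""
    delay = BASE_DELAY_S + TAP_DELAY_S * tap
    lowpass = 1.0 / (1.0 + 1j * freq_hz / F_CORNER_HZ)
    reconstruction = sign * lowpass * cmath.exp(-2j * math.pi * freq_hz * delay)
    return FEEDTHROUGH_GAIN + reconstruction

def ripple_spacing_hz(tap: int) -> float:
    """Approximate spacing of ripple maxima: 1 / (delay difference between paths)."""
    return 1.0 / (BASE_DELAY_S + TAP_DELAY_S * tap)

if __name__ == "__main__":
    for k in (0, 3, 5):
        print(f"tap {k}: ripple spacing ~ {ripple_spacing_hz(k) / 1e9:.2f} GHz")
```

In this sketch the ripple spacing shrinks as the tap index grows, and changing `sign` swaps the frequencies at which the two paths add and cancel, which is the qualitative behavior described for the measured responses.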
Figure 6.11: Frequency responses for a single-tap processor with the signal reconstructed at the
0th, 3rd and 5th taps, positive (red) and negative (blue) coefficients from (a) measurement results
(b) simulation results based on the feedthrough model.
6.2.4 High-frequency roll-off of the CT processor and the output buffer
The high-frequency roll-off in the measured response is attributed to the bandwidth of the input
driver, the speed of the ADC, the response of the processor and the bandwidth of the output driver.
The high-frequency attenuation of the processor is caused by (1) the anticipated high-frequency
attenuation due to the charge pumps’ finite current pulse duration, which was discussed in Section
5.1, and (2) the parasitic high-frequency attenuation of the filter grid structure due to distributed
RC effects. While the processor’s and buffer’s responses cannot be directly measured, their joint
response can be deduced from the gain of the system in an ADC-DAC configuration. The input
power to the system can be increased beyond the intended power range in order to saturate the input
of the ADC beyond its full-scale range. When the ADC input is fully saturated, all the per-level
signals trigger for frequencies even beyond 6 GHz. An ideally reconstructed output of the ADC is
then approximately a square wave, with a fixed output power at the fundamental frequency. Any
high-frequency attenuation in the measured frequency response is, therefore, attributed only to the
blocks that follow the ADC since the effects of the input driver bandwidth and the ADC speed are
essentially removed. The effects of the ADC speed are not entirely removed, however, because
as the input frequency increases, the transition times of the comparators’ per-level output become
a larger fraction of the input period. As a result, the output of the saturated ADC is not a square
wave because of the longer transition times, relative to the period.
The measured output power versus frequency for a single-tap processor configuration is shown
in Fig. 6.12 for an increasing input power, with the feedthrough signal subtracted. The input powers
are given relative to the input power corresponding to a full-scale ADC input in the frequency range
of 1 GHz to 2.5 GHz, and have units of dBFS (dB relative to the full-scale power). The frequency
response of the system for a full-scale input power is shown in blue. As the input power is increased
beyond 0dBFS, the output power at lower frequencies (1 GHz–2.5 GHz) begins to saturate. For
an increase in input power of over a few decibels beyond the full-scale level, the measured output
power at higher frequencies continues to increase because the increase in the input power makes
up for the attenuation of the input driver and the effective attenuation of the ADC due to not all
levels triggering. As the input power is increased, the frequency limit to which the output power
saturates is pushed higher. The saturation of power at lower frequencies and the increase in power
at higher frequencies cause the bandwidth of the measured response to increase with input power,
as can be noted by comparing the high-frequency roll-off of the red response to the sharper roll-off
of the blue response.
The reconstructed output of an ADC with a saturated input is a square wave, with a maximum
output power that is $20\log_{10}(4/\pi) = 2.1$ dB above the power corresponding to a full-scale output,
where the $4/\pi$ factor is obtained from the fundamental component of a Fourier series expansion of a
square wave. The measured response at the maximum input power shown represents the response
Figure 6.12: Measured frequency response, with feedthrough subtracted, for a single-tap filter for
increasing input powers.
of the processor and output buffer, and has an upper −3-dB frequency of 3.6 GHz and an attenuation
of 5 dB at 4.5 GHz. The measured high-frequency roll-off is sharper than predicted by
the simulation results; this is attributed to the distributed RC attenuation of the metal grid formed
by the filter’s output connections. Due to the magnitude of the structures and the complexity of
numerous other interconnections, the filter simulation model derived with layout extraction tools
has limited accuracy.
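The 2.1-dB figure quoted above for the square-wave output follows from the Fourier series of a square wave. The sketch below checks it two ways, in closed form and by numerically integrating the fundamental sine coefficient of a unit-amplitude square wave; it assumes that the full-scale reference is a sinusoid whose amplitude equals that of the square wave, and the function names are illustrative.

```python
import math

def square_wave_fundamental(n_samples: int = 100_000) -> float:
    """Numerically estimate the fundamental sine amplitude of a unit square wave."""
    total = 0.0
    for i in range(n_samples):
        t = (i + 0.5) / n_samples            # midpoint of each sample over one period
        square = 1.0 if t < 0.5 else -1.0
        total += square * math.sin(2 * math.pi * t)
    return 2.0 * total / n_samples           # Fourier sine coefficient b1

def excess_db() -> float:
    """Level of the square-wave fundamental relative to a unit-amplitude sine, in dB."""
    return 20.0 * math.log10(4.0 / math.pi)

if __name__ == "__main__":
    print(f"numerical b1 = {square_wave_fundamental():.5f}, 4/pi = {4 / math.pi:.5f}")
    print(f"excess level = {excess_db():.2f} dB")
```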
6.2.5 Deducing the gain of the input driver from feedthrough measurements
The source of the dominant feedthrough signal has been identified as the input driver output, which
drives the CT ADC. The feedthrough signal can be used to indirectly determine the input driver’s
frequency response, which cannot be measured. This information is used to help deduce the per-
formance of the processor from the measured response of the entire system. In this feedthrough
study, the CT processor is disabled and therefore does not contribute to the output power. For
a low-power input signal, which does not saturate the output of the input driver, the measured
Figure 6.13: Measured feedthrough power for increasing input frequency. The CT processor is
disabled.
frequency response is the combination of the frequency response of the input driver and the fre-
quency response of the feedthrough path from the ADC input to the buffer output. For a sufficiently
high-power input signal, which saturates the input driver for all input frequencies of interest, the
response of the input driver is merely that of a flat gain. The nonlinearity of this gain can be
neglected because only the measured signal at the input frequency is of interest. The measured
system response is then that of the driver gain, which is approximately flat in the frequency range
of interest, and the response of the feedthrough path (ADC input to buffer output). The shape of
the input-driver gain can be deduced by comparing the measured system response for a low-power input
and the measured response for a high-power input. Only the shape is recovered with this method,
because only the shape of the feedthrough transfer function is known.
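The deduction described above amounts to a subtraction on a decibel scale: the low-power measurement contains the driver response and the feedthrough path, the saturated measurement contains only a flat gain and the feedthrough path, so their difference leaves the driver response up to a constant. The sketch below illustrates this with made-up numbers; the data are hypothetical and are not measurements from this work, and the normalization to the first frequency point is an arbitrary choice.

```python
def deduce_driver_gain_shape(low_power_db, saturated_db):
    """Return the driver gain shape in dB, normalized to its first point.

    low_power_db : feedthrough measured with an unsaturated driver
                   (driver response times feedthrough path), in dB.
    saturated_db : feedthrough measured with a saturated driver
                   (flat gain times feedthrough path), in dB.
    """
    if len(low_power_db) != len(saturated_db):
        raise ValueError("both sweeps must share the same frequency grid")
    shape = [lo - sat for lo, sat in zip(low_power_db, saturated_db)]
    return [s - shape[0] for s in shape]

# Hypothetical example data (NOT measurements from the thesis).
freqs_ghz = [1.0, 2.0, 3.0, 4.0]
feedthrough_path_db = [-30.0, -26.0, -23.0, -21.0]   # rises with frequency
driver_db = [0.0, -1.0, -2.0, -6.0]                   # shape to be recovered
low = [d + p - 20.0 for d, p in zip(driver_db, feedthrough_path_db)]
sat = [p + 5.0 for p in feedthrough_path_db]
recovered = deduce_driver_gain_shape(low, sat)

if __name__ == "__main__":
    for f, g in zip(freqs_ghz, recovered):
        print(f"{f:.1f} GHz: {g:+.1f} dB")
```

The unknown constant offsets (here −20 dB and +5 dB) drop out after normalization, which reflects the statement that only the shape of the gain is recovered.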
The effect of increasing power on signal feedthrough is illustrated in Fig. 6.13. When the input
power is increased, the feedthrough signal also increases until the point where the input driver
begins to saturate. It takes more input power to saturate the driver at higher frequencies, as can
be seen in the measured feedthrough signals. The upper curve, shown in black, corresponds to the
highest input power for which the measured response is nearly saturated. This response, then, is the
approximate feedthrough path response. The frequency response of the driver can be deduced by
comparing the feedthrough signals measured for a fully saturated and a non-saturated driver at all
frequencies. The deduced driver gain is shown in Fig. 6.14. The gain has a slight peak at 1.5 GHz,
a modest gain roll-off from 1 GHz to 3 GHz, and then a sharper roll-off starting approximately
at 3.2 GHz. The high-frequency roll-off behavior agrees with the roll-off observed in the mea-
sured response of the overall system for a single-tap processor configuration. The high-frequency
roll-off of the processor and output driver, which was deduced in the previous section, accounts
for some of the remaining attenuation. These two deduced gains can be combined to estimate the
expected response of the overall system; this calculated response is compared to the actual mea-
sured response of the system in Fig. 6.15. While the two responses for a positive coefficient have
noticeable discrepancies, the responses for a negative coefficient and for the average response with
feedthrough subtracted have better matching. The response derived from the deduced input driver
and processor/buffer gains is not accurate because it does not take into account the response of
the ADC; however, the two deduced responses are helpful in understanding the system’s overall
response.
6.3 Parasitic-coupling-induced increase in half-harmonics at the
0th filter tap
The per-edge encoding of signals causes the generation of half-harmonics of the input frequency,
as was discussed in Sec. 4.2.1. These harmonics are designed to be at least 20 dB below the
Figure 6.14: Deduced input driver gain.
Figure 6.15: Frequency response of the system for a single-tap processor configuration for (a) a
positive coefficient, (b) a negative coefficient, and (c) with feedthrough subtracted. Results shown
for measured responses (solid) and responses derived from the deduced input driver response and
the deduced processor and output buffer response (dashed).
level of a full-scale in-band signal. Measurement results show that the system satisfies this design
goal for processor configurations that make use of the 0th through the 5th tap. When the 6th tap
is enabled, however, the half-harmonic distortion is significantly increased. Simulation results
using extracted models of the filter, and with additional modeling of the filter’s grid structure, do
not show this increased distortion. Measurement results, presented in Fig. 6.16, show that the
signal at the output of the 6th tap causes strong half-harmonic distortion. The results reported in
the figure were measured for a configuration where the coefficient of only the 0th tap was enabled
and all other coefficients were set to zero, but the delay cells at higher taps were enabled. When
tap delays were enabled for the 1st filter tap through to the 5th tap, the response of the filter did
not change and the half-harmonic power, shown in black, was at least 28 dB below the level of the
fundamental component. When the 6th-tap delay cells were enabled, the half-harmonic distortion,
shown in red, noticeably increased and rose to as little as 13 dB below the level of the fundamental
component. The increase in distortion due to the 6th tap is more prominent when more ADC levels
are activated, at higher input power levels. The difference between the distortion power for the two
cases of the 6th tap enabled and disabled increases when the outermost levels of the ADC (0 and
6) are activated.
The reason for this increased distortion is the parasitic coupling to the filter’s output node from
the long wires at the output of the 6th-tap delay cells, which connect the last delay cell in each
row to the first delay cell in that row through a switch to enable a ring oscillator configuration for
delay tuning (refer to Sec. 5.7.1). A simplified diagram of the processor layout is shown
in Fig. 6.17.
Higher-level metal vertical wires at each tap, shown in blue, which connect to the coefficient
outputs at all levels, are joined by horizontal wires at the rising-edge filter at level 7 and the falling-
Figure 6.16: Measured output power of components at $f_{IN}$ and $\tfrac{1}{2} f_{IN}$ for signal reconstruction at
the 0th tap, with coefficients at all other taps disabled but the delay cells at other taps enabled. The
half-harmonic power is shown for the cases of the 6th-tap delay cells enabled and disabled.
edge filter at level 1. These output wires run along the ring oscillator feedback wires, shown
in burgundy, for the entire width of the filter and, therefore, have some mutual coupling. At
other levels, where there is no horizontal output wire, there are parasitic coupling capacitances
where the ring oscillator feedback wires cross above the vertical output wires. The capacitances
between all these lines, obtained through layout extraction, were not significant enough to be the
cause of distortion in simulations. The coupling between these lines, underestimated by extraction,
is considered to be the cause of the increased distortion. The reason this distortion occurs at a
frequency corresponding to the half-harmonic of the input is that, at each filter tap k, all of the
per-edge signals at all levels toggle in the same direction at every input cycle, as is illustrated at
the bottom of Fig. 6.17. With increasing input power, as more ADC levels are activated, more
per-edge signals couple to the filter output, causing the distortion to increase. A large increase in
distortion occurs when the 0th and 6th levels are enabled, because the coupling capacitance from
the 6th-tap output to the filter’s output nodes significantly increases. The increase in distortion can
be prevented by avoiding using the 6th tap, which, unfortunately, reduces the programmability of the
processor from seven-tap to six-tap configurations.
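The reason a coupled per-edge signal appears at half the input frequency can be illustrated with a small spectral calculation. The sketch below assumes an idealized per-edge signal that changes state exactly once per input cycle, so that its period is two input cycles; the waveform, record length and function names are illustrative and do not model the actual coupling path.

```python
import cmath
import math

def toggle_signal(n_cycles: int, samples_per_cycle: int):
    """A signal that flips between 0 and 1 once per input cycle."""
    return [float((i // samples_per_cycle) % 2)
            for i in range(n_cycles * samples_per_cycle)]

def dft_magnitude(x, k: int) -> float:
    """Magnitude of DFT bin k, normalized by the record length."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return abs(acc) / n

# The record spans n_cycles input periods, so bin n_cycles is f_IN
# and bin n_cycles // 2 is f_IN / 2.
n_cycles, spc = 8, 64
x = toggle_signal(n_cycles, spc)
half_harmonic = dft_magnitude(x, n_cycles // 2)
fundamental = dft_magnitude(x, n_cycles)

if __name__ == "__main__":
    print(f"component at f_IN/2: {half_harmonic:.4f}")
    print(f"component at f_IN:   {fundamental:.2e}")
```

For this idealized waveform the energy sits at $\tfrac{1}{2} f_{IN}$ and its odd multiples, with essentially nothing at $f_{IN}$ itself, so any portion of it that couples to the output shows up as half-harmonic distortion.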
Figure 6.17: A simplified diagram of the processor layout and example signals to illustrate the
generation of the strong half-harmonic distortion.
Figure 6.18: Tap-delay values obtained through simulation (dashed) and estimated from measure-
ments using a ring-oscillator-based tuning scheme (solid) for (a) the full tuning range and (b) the
intended delay range.
6.4 Delay cell
The delay of all the delay cells can be changed globally by varying a reference bias current. The
delay cells in each per-edge filter are controlled together with 6-bit control words. The delays are
tuned at each level by configuring all 6 delay cells in a ring oscillator configuration, the output
of which drives a binary counter. The oscillation frequency is determined from the output of the
counter, which is triggered and stopped by a reference low-frequency clock (10 MHz) from an
FPGA. The delay cells tested at several levels have a delay mismatch of less than 2% for the same
control word. The average estimated tap delay for each digital control word, measured using the
ring oscillator tuning scheme, is illustrated in Fig. 6.18, and compared to the delay obtained from
simulations of the delay cell using a layout-extracted model. The measured result varies from the
simulation results by less than 20% in the intended delay range, shown in the figure on the right.
The ring-oscillator-based tuning scheme allows the delay cell in each per-level filter to be
tested. An alternative test scheme can be used to estimate the tap delay based on the notch fre-
quency of a notch-configured filter. With this tuning scheme, the delay cells at each tap are tested
Figure 6.19: Simulated tap delays (dashed) and calculated average tap delays based on a measured
notch frequency (solid) versus reference bias current for a control-word value of 32.
instead. For a notch filter with non-zero coefficients at tap k−1 and tap k, the notch occurs at
a frequency of $1/(2T_{D,k})$, where $T_{D,k}$ is the average delay of the kth tap. For delay values below 140
ps, the notch will occur above 3.5 GHz, which makes it difficult to measure accurately because
of the fact that the level of the feedthrough signal increases with frequency. A three-tap notch
configuration is used instead, where the middle coefficient is set to equal zero; the notch frequency
occurs at a frequency of $1/(4T_D)$, where $T_D$ is the average tap delay. The estimated tap delays based on
the measured notch frequency are shown in Fig. 6.19 and are compared to the tap delays obtained
with simulation, which are overestimated but are within 10% of the measured values. For this plot,
the delay was varied by changing the reference current rather than the digital control word.
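The relation between a measured notch frequency and the average tap delay, as used above, can be written as a pair of small helper functions. This is a sketch under the ideal-filter assumption that the two non-zero coefficients are equal in magnitude and sign; the function names and the `tap_span` parameter are hypothetical.

```python
def tap_delay_from_notch(notch_hz: float, tap_span: int = 1) -> float:
    """Average tap delay (s) from the first notch of a two-coefficient filter.

    tap_span is the number of tap delays between the two non-zero
    coefficients: 1 for adjacent taps (notch at 1/(2*T_D)), 2 for the
    three-tap configuration with a zero middle coefficient (1/(4*T_D)).
    """
    if notch_hz <= 0 or tap_span < 1:
        raise ValueError("notch_hz must be positive and tap_span >= 1")
    return 1.0 / (2.0 * tap_span * notch_hz)

def notch_from_tap_delay(delay_s: float, tap_span: int = 1) -> float:
    """Inverse relation: first notch frequency (Hz) for a given tap delay."""
    if delay_s <= 0 or tap_span < 1:
        raise ValueError("delay_s must be positive and tap_span >= 1")
    return 1.0 / (2.0 * tap_span * delay_s)

if __name__ == "__main__":
    td = 140e-12
    print(f"adjacent taps:  notch at {notch_from_tap_delay(td) / 1e9:.2f} GHz")
    print(f"three-tap case: notch at {notch_from_tap_delay(td, 2) / 1e9:.2f} GHz")
```

For a 140 ps delay, the adjacent-tap notch falls just above 3.5 GHz, whereas the three-tap configuration halves the notch frequency and moves it well inside the usable band.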
6.5 Frequency responses of the gigahertz digital FIR filter
The CT digital processor can be configured as a one-tap to a six-tap FIR filter. The last filter tap
is not used to avoid an increase in the half-harmonic distortion, as was discussed in Sec. 6.3. The
measured frequency responses are given as measured output power for a full-scale input power.
The measured frequency response of the processor configured as a notch filter is shown in
Fig. 6.20; an ideal notch filter frequency response is also shown, with gain adjusted to match the
peak measured output power. The notch filter was realized by enabling the 0th-tap and 3rd-tap
coefficients, while turning the other coefficients off in order to produce a low-frequency notch
without requiring a single long tap delay. The primary notch at 0.95 GHz is caused by a 525 ps
delay between the enabled-coefficient taps. The total delay is realized by cascading three tap delay
cells, which sets the maximum input frequency limit to $3/(525\,\mathrm{ps}) = 5.7$ GHz, which is extended
by a factor of three relative to a configuration where a single granular 500-ps delay is used. At
a frequency corresponding to twice the first notch frequency, signals from the two taps add con-
structively, thus causing a maximum gain in the frequency response. The second notch at 2.85
GHz corresponds to three times the first notch frequency. The notch frequencies correspond to
odd multiples of the first notch frequency and response maxima correspond to even multiples. The
measured frequency response for frequencies below 0.8 GHz differs from the ideal response due
to the input AC coupling between system blocks, which limits the lower system bandwidth. In the
frequency range of 0.8 GHz to 2 GHz, the matching between the measured and ideal responses is near
perfect. As the input frequency is increased above 2 GHz, the high-frequency attenuation of the system
becomes noticeable. At 3.2 GHz, the measured response deviates from the ideal response by over
3 dB, indicating that the −3 dB bandwidth of the system is 0.8 GHz to 3.35 GHz. The
frequency responses of three-, four-, and five-tap notch filters are shown in Fig. 6.21 for different
values of the tap delay. The primary notch frequency can be programmed from 0.75 GHz to 2.4
GHz. The lowest primary notch is realized with a five-tap notch filter and the highest primary notch
at 2.4 GHz is realized with a three-tap notch filter. Higher notch frequencies can be realized with a
two-tap filter configuration or as secondary notches of the higher-order notch filters. The depth of
Figure 6.20: Measured frequency response of a notch filter and a gain-adjusted response of an
ideal notch filter.
Figure 6.21: Measured frequency response of several notch filters (see text).
the notches at higher frequencies is limited by the feedthrough signal. Extremely sharp notches of
over 50 dB attenuation are possible when the filter output signal and the feedthrough signal cancel
each other (an unreliable effect). In the case of constructive addition of the two signals, the worst
case notch depth of −30 dB is achieved within the bandwidth range. For frequencies above 3.3
GHz, the feedthrough signal power increases and limits the notch depth to −20 dB.
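The notch and maximum positions quoted above follow from the response of two taps separated by the total delay. The following Python sketch illustrates this under the assumption of two equal-weight taps and an ideal delay; the 525-ps total delay and the three-cell cascade are taken from the text, while the function names are illustrative only.

```python
import math

TOTAL_DELAY = 525e-12   # total delay between the two enabled taps, in seconds (from the text)
CELLS_PER_TAP = 3       # delay cells cascaded to realize that delay (from the text)


def two_tap_gain(freq_hz, delay_s=TOTAL_DELAY):
    """Magnitude of 1 + exp(-j*2*pi*f*T) for two equal-weight taps (assumed ideal)."""
    return abs(2.0 * math.cos(math.pi * freq_hz * delay_s))


def notch_frequency(k, delay_s=TOTAL_DELAY):
    """k-th notch (k = 0, 1, ...): odd multiples of the first notch at 1/(2T)."""
    return (2 * k + 1) / (2.0 * delay_s)


def max_input_frequency(delay_s=TOTAL_DELAY, cells=CELLS_PER_TAP):
    """Input frequency limit set by the delay of one cell, T / cells."""
    return cells / delay_s


first = notch_frequency(0)
print(f"first notch         : {first / 1e9:.2f} GHz")
print(f"second notch        : {notch_frequency(1) / 1e9:.2f} GHz")
print(f"gain at 2 x notch   : {two_tap_gain(2 * first):.2f}")
print(f"max input frequency : {max_input_frequency() / 1e9:.2f} GHz")
```

In this idealized model the gain vanishes at odd multiples of the first notch frequency and reaches its maximum of 2 at even multiples; the measured notch depth is finite because of the feedthrough signal discussed above.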
All of the following frequency responses were realized using coefficients determined with con-
ventional filter design tools. The response of a five-tap bandpass filter is shown in Fig. 6.22. As
in the case of a notch filter, the measured response matches the ideal gain-adjusted response in
the frequency band of interest. This bandpass response is realized using the coefficients of a high-
pass discrete-time filter, whose Nyquist frequency, fS/2, corresponds to the center frequency of
the measured response. The measured upper passband is realized at the replica highpass band of
the DT response, which is possible due to the lack of aliasing thanks to clockless operation. The
measured frequency response of a six-tap low-pass filter is shown in Fig. 6.23; it matches the ideal
response fairly well in the frequency band of interest. Fig. 6.25 and Fig. 6.26 show the responses of
two different five-tap bandstop FIR filters. Fig. 6.24 shows the response of the CT digital processor
configured as a three-tap amplitude equalizer in the 1-GHz to 3-GHz frequency band.
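The bandpass behavior obtained from highpass discrete-time coefficients can be illustrated by evaluating the FIR sum directly at continuous frequencies. The sketch below is a minimal example; the five coefficients and the 175-ps tap delay are hypothetical values chosen for illustration and are not those of the measured filter.

```python
import cmath
import math

TAP_DELAY = 175e-12                           # hypothetical tap delay T_D, in seconds
HIGHPASS_TAPS = [1.0, -2.0, 3.0, -2.0, 1.0]   # hypothetical five-tap highpass coefficients


def ct_fir_gain(freq_hz, taps=HIGHPASS_TAPS, tap_delay=TAP_DELAY):
    """|H(f)| = |sum_k h_k exp(-j*2*pi*f*k*T_D)|, valid at any real frequency f."""
    h = sum(c * cmath.exp(-2j * math.pi * freq_hz * k * tap_delay)
            for k, c in enumerate(taps))
    return abs(h)


fs = 1.0 / TAP_DELAY      # frequency at which the response repeats
center = fs / 2.0         # Nyquist frequency of the equivalent DT filter
offset = 0.5e9            # offset from the center frequency, in Hz

print(f"gain at DC          : {ct_fir_gain(0.0):.3f}")
print(f"gain at fS/2        : {ct_fir_gain(center):.3f}")
print(f"gain at fS/2 - 0.5G : {ct_fir_gain(center - offset):.3f}")
print(f"gain at fS/2 + 0.5G : {ct_fir_gain(center + offset):.3f}")
```

Because no sampling takes place, frequencies above fS/2 are not folded back; they simply see the mirrored part of the periodic response, so the gain is symmetric about fS/2 and the highpass coefficients yield a bandpass response centered there.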
Frequency responses designed to have a stopband attenuation in the range of 15 dB to 25
dB match the ideal responses well. As the desired stopband attenuation is increased, however,
the measured response can diverge from the ideal because it is limited by the three-bit resolution
and also the feedthrough signal. For higher input frequencies beyond the band of interest, the
divergence of the measured response from the ideal is attributed to the increasing feedthrough
power and the inability of the ADC to trigger the outer per-level signals. Note that the maximum
output power is higher for higher-order filters because more taps contribute to the signal power.
Conventional FIR filter transfer functions that are less sensitive to coefficient variation lead to
well-matched measured responses of the CT processor. When considered in the Z-domain, the
transfer functions with zeros on the unit circle are less sensitive if the variation in the coefficients
causes the zeros to move on the unit circle. For other transfer functions, variation in coefficients
can cause two zeros to leap off of the unit circle and move in opposite directions relative to the
center of the unit circle, which leads to more prominent variations in the stopband. An example
of a coefficient-sensitive stopband is shown in the frequency response of a six-tap lowpass filter
Figure 6.22: Measured frequency response of a bandpass filter, and ideal gain-adjusted response.
Figure 6.23: Measured frequency response of a lowpass filter, and ideal gain-adjusted response.
in Fig. 6.27. The ideal response has a center notch corresponding to fS/2 with two other notches
symmetrically offset from this frequency, and an ideal stopband attenuation of 40 dB. If one of
the coefficients in the ideal frequency response varies by 10%, the stopband attenuation drastically
changes and leads to an attenuation of only 25 dB, as is seen in the red waveform in the figure. The
measured response, shown in blue, has a similar elevation of the stop-band.
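The sensitivity of a deep stopband to a single coefficient error can be reproduced numerically. The sketch below builds a hypothetical six-tap lowpass filter with all zeros on the unit circle (it is not the filter of Fig. 6.27), perturbs one coefficient by 10%, and compares the worst-case stopband attenuation; the zero locations and the stopband edge are assumptions made for illustration.

```python
import cmath
import math


def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lowpass_taps(notch_angles=(0.6 * math.pi, 0.8 * math.pi)):
    """Hypothetical six-tap lowpass: zeros at z = -1 and at exp(+/-j*theta)."""
    taps = [1.0, 1.0]                                  # zero at z = -1 (f_S/2)
    for theta in notch_angles:
        taps = poly_mul(taps, [1.0, -2.0 * math.cos(theta), 1.0])
    return taps


def gain(taps, omega):
    """|H(e^{j*omega})| of an FIR filter."""
    return abs(sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(taps)))


def stopband_attenuation_db(taps, start=0.6 * math.pi, stop=math.pi, points=2001):
    """Worst-case stopband attenuation relative to the DC gain, in dB."""
    worst = max(gain(taps, start + (stop - start) * i / (points - 1))
                for i in range(points))
    return 20.0 * math.log10(gain(taps, 0.0) / worst)


ideal = lowpass_taps()
perturbed = list(ideal)
perturbed[2] *= 1.10                                   # one coefficient off by 10 %

print(f"ideal stopband attenuation     : {stopband_attenuation_db(ideal):.1f} dB")
print(f"perturbed stopband attenuation : {stopband_attenuation_db(perturbed):.1f} dB")
```

The perturbation adds a term of constant magnitude at all frequencies, which fills in the notches and raises the stopband floor, in the same way as the elevated stopband described above.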
Figure 6.24: Measured frequency response of an amplitude equalizer, and ideal gain-adjusted
response.
Figure 6.25: Measured frequency response of a bandstop filter, and ideal gain-adjusted response.
Figure 6.26: Measured frequency response of a bandstop filter, and ideal gain-adjusted response.
Figure 6.27: Measured frequency response of a lowpass filter, and ideal gain-adjusted response
(see text).
6.6 Spectra of processed signals
All spectrum measurements are taken with a 1-MHz bandwidth and 1-MHz spacing between fre-
quency bins in order to accurately capture the error power. All in-band SNDR and SFDR measure-
ments are calculated for error in the 0.8-GHz to 3.2-GHz band of interest.
An example spectrum of a 1-GHz full-scale input signal processed through a single-tap CT
per-edge-encoded digital processor is shown in Fig. 6.28. A skirt of tones around the fundamen-
tal component is due to the windowing effect of the ADC reset time, which was described in the
beginning of this chapter. Since these tones are an artifact of the measurement setup, the power
of the few tones closest to the fundamental tone is not added to the in-band error
power. The output spectrum of a CT digital processor differs from the spectra of an analog proces-
sor and a conventional discrete-time digital processor in that it contains harmonic distortion due
to quantization, unlike the analog processor, but has no aliased quantization error floor, in contrast
to the DT processor. The measured noise floor, which is at approximately −83 dBm for a 1 MHz
measurement bandwidth, is mostly due to the thermal noise of the input driver, but the processor,
ADC and output driver also contribute to the output random noise. The output spectrum contains
Figure 6.28: Measured output spectrum for a 1 GHz full-scale input processed through a single-tap
filter. The frequency response of the filter is shown as a dashed line.
tones at the input frequency and harmonics. Tones at half the input frequency and its harmonics
that are caused by delay discrepancies in the per-edge encoder and processor are present in the
spectrum, but are below the level of the harmonics caused by three-bit quantization. The second
harmonic, which is the dominant distortion spur, is 24.5 dB below the level of the fundamental
component. The worst-case performance of the processor for a full-scale single-tone in-band sig-
nal occurs for a single-tap filter configuration because only one tap contributes to the signal, which
leads to the lowest output power, and also because there is no filtering of noise and harmonics. The
worst case in-band SNDR and SFDR are then 20.3 dB and 24.5 dB, respectively. Measurements
indicate that the system is noise-limited due to the wide bandwidth. The output spectrum of the
same filter configuration is shown in Fig. 6.29 for a two-tone input consisting of equal-amplitude
tones at 2 GHz and 2.1 GHz, which are surrounded by intermodulation products caused by the
nonlinear transfer characteristic of a three-bit quantizer. The signal-to-error ratio is degraded com-
pared to a single-tone input because the input power has decreased by 3 dB while the noise floor is
unchanged. In-band SNDR and SFDR for this case are 18.1 dB and 22 dB, respectively.
Better performance can be achieved using a higher-order processor configuration. While using
Figure 6.29: Measured output spectrum for a two-tone input processed through a single-tap filter
with each tone -6 dB relative to full-scale power. The frequency response of the filter is shown as
a dashed line.
more taps, which generate some noise, causes an increase in the noise floor, the increase in the
output power and the filtering of noise and harmonics lead to a better signal-to-error ratio. The
output spectra of a three-tap notch filter are shown in Fig. 6.30 for single-tone and two-tone inputs.
The shape of the noise floor mimics the shape of the frequency response of the filter. Due to the
filtering of harmonics, the in-band SNDR and SFDR for a single-tone input improve by 6 dB and 8 dB,
respectively. For a two-tone input in (b), the 3.2-GHz tone falls into the passband, whereas the
1.9-GHz tone is close to the frequency notch and is, therefore, attenuated by 24 dB. If the 1.9-GHz
tone corresponds to some undesirable signal, the notch filter successfully attenuates it below the
level of quantization harmonics, and an SNDR of 15.5 dB is reached. Note that the degradation
in the SNDR compared to the case of a single-tone input is because the signal of interest is only
one of two tones and its input power is halved. For the two-tone case in (c), the tone at 2.8 GHz
coincides with the frequency notch and is therefore attenuated by 40 dB.
The frequency spectra of several single-tone inputs processed through an amplitude equalizer
filter are shown in Fig. 6.31, with the spectra overlaid. The output power of the tones closely
Figure 6.30: Measured output spectra of a notch filter for (a) a single-tone full-scale input at 0.8
GHz, (b) a two-tone equal-amplitude input of 1.9 GHz and 3.2 GHz with each tone -6 dB relative
to full-scale power, and (c) a two-tone equal-amplitude input of 1 GHz and 2.8 GHz with each tone
-6 dB relative to full-scale power. The frequency response of the filter is shown as a dashed line.
Figure 6.31: Measured output spectra for full-scale single-tone inputs (overlaid) for a processor
configured as an amplitude equalizer. The frequency response of the filter is shown as a dashed
line.
follows the frequency responses.
The measured spectra for several filter configurations are shown in the following figures for
two-tone inputs. The more conservative case of a two-tone input is preferred to a single-tone input
because few harmonics of the input fall in-band in the latter case. Further, the in-band SNDR for
a full-scale two-tone input can be 3 dB worse, in the case of two in-band tones, and 6 dB worse,
in the case of one in-band tone and one out-of-band tone, than for a full-scale single-tone input.
Several such input scenarios are considered.
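The 3-dB and 6-dB figures above follow from adding tone powers. The short sketch below assumes, as in the figure captions, that each tone of a full-scale two-tone input is 6 dB below the full-scale power, and that the in-band error power is the same as for the single-tone input; the function name is illustrative.

```python
import math


def combined_power_db(tone_powers_db):
    """Total power, in dB relative to full scale, of uncorrelated tones."""
    linear = sum(10.0 ** (p / 10.0) for p in tone_powers_db)
    return 10.0 * math.log10(linear)


TONE_DB = -6.0   # power of each tone relative to full scale (from the figure captions)

both_in_band = combined_power_db([TONE_DB, TONE_DB])   # two in-band tones
one_in_band = combined_power_db([TONE_DB])             # one in-band, one out-of-band tone

print(f"signal power, two in-band tones : {both_in_band:.2f} dB")
print(f"signal power, one in-band tone  : {one_in_band:.2f} dB")
```

With an unchanged error floor, the drop in signal power translates directly into the stated SNDR penalties relative to a full-scale single tone.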
Fig. 6.32 shows the output spectra of a bandstop filter for two-tone inputs for the case that
one tone is in each passband of the filter and the case where one tone is in-band and another is
in the stopband. The spectrum in the former case contains the two fundamental components plus
an intermodulation product, which happens to occur in the stopband of the filter. The powers of
the two output tones are slightly different because of the system’s high-frequency roll-off. The in-
band SNDR and SFDR are 19.15 dB and 28 dB, respectively. For the other case, the undesirable
component at 2 GHz is attenuated by 17 dB, resulting in an in-band SNDR and SFDR of 13.7 dB
Figure 6.32: Measured output spectra for two-tone equal-amplitude inputs (with each tone -6 dB
relative to full-scale power) at (a) 1 GHz and 3 GHz and (b) 0.8 GHz and 2 GHz, processed through
a bandstop filter. The frequency response of the filter is shown as a dashed line.
Figure 6.33: Measured output spectrum for a two-tone equal-amplitude input (with each tone -6
dB relative to full-scale power) with in-band tones at 1.2 GHz and 1.3 GHz, processed through a
lowpass filter. The frequency response of the filter is shown as a dashed line.
and 17 dB, respectively. The output power of the single tone in the band of interest is attenuated
by 3 dB because it is at the band edge.
The measured output spectrum for a lowpass filter is shown in Fig. 6.33 for a two-tone in-band
input. Due to the wide bandwidth of the filter, many intermodulation products and harmonics
caused by quantization are in-band. The dominant in-band spur is 28 dB below the level of the
signal. The intermodulation product at the difference frequency of the two input tones is outside
the band of interest but is not attenuated by the frequency response that is shown. The lower −3 dB
frequency of the system, with the processor configured as an all-pass single-tap filter, is limited by
the input AC coupling of the input driver and the ADC. The lower cutoff frequencies of the processor,
which is caused by the DC control block, and of the output buffer, which is also AC coupled, are
non-dominant. As a result, the low-frequency distortion introduced by the nonlinearity of three-bit
quantization appears unattenuated. The measured in-band SNDR and SFDR for this configuration
and input scenario are 22.9 dB and 28 dB, respectively.
Fig. 6.34 presents the output spectra of a bandpass filter for several two-tone input scenarios:
(a) two in-band tones, (b) one in-band tone and one higher-frequency out-of-band tone, and (c) two
out-of-band tones. For the scenario in (a), the in-band SNDR and SFDR are 22.1 dB and 28 dB,
respectively. The narrowband rise in the noise floor around 1.5 GHz is due to the half-harmonics
of the two input tones. For the scenario in (b), the out-of-band component is attenuated by 15
dB relative to the in-band tone. In the last case (c), the two out-of-band components create inter-
modulation products that fall in-band. The dominant error spur is the third-order intermodulation
product at 3 GHz, which falls in the passband of the filter but is 22 dB below the level of a full-scale
in-band output. Other intermodulation products are attenuated by the bandpass filter.
6.7 Signal-dependent power dissipation
The active power consumption of the CT digital processor consists of a static power dissipation of
270 µW due to the bias currents of the DC control block and an input-activity-dependent dynamic
power dissipation. The power consumed by the bias-generating circuits for the delay cells
and the filter coefficients, which has an average value of 3.3 mW, is excluded from the active power
consumption of the processor. The bias generation was designed for better testability
and direct control of the prototype chip rather than for power efficiency; for this reason,
the bias power is given separately. The active power dissipation of the CT processor, which is
shown in Fig. 6.35 for a six-tap filter configuration, depends on the number of tokens that it has
to process. Measurement results confirm that as the input power is increased, approaching the
full-scale input power, more quantization levels are triggered and the number of tokens generated
increases; the dynamic power dissipation likewise increases, as can be seen in the progressively
Figure 6.34: Measured output spectra for two-tone equal-amplitude inputs (with each tone -6
dB relative to full-scale power) processed through a bandpass filter for (a) two in-band tones at 2.9
GHz and 3 GHz, (b) one in-band tone at 3 GHz and one out-of-band tone at 4 GHz, and (c) two
out-of-band tones at 3.75 GHz and 4.5 GHz. The frequency response of the filter is shown as a
dashed line.
increasing power dissipation curves. As the frequency of the input is increased, the quantization
levels are traversed faster, causing the token rate and, consequently, the power dissipation to also
increase. The processor power consumption automatically adapts to the activity of the input and
drops to its minimum power dissipation of 0.27 mW during intervals of silence. If the DC control
block is disabled, the power dissipation of the circuit is due only to leakage currents.
Beyond 3.2 GHz, some levels of the ADC that supply the digital inputs fail to trigger and
the power dissipation begins to level off. As the frequency is further increased, even fewer ADC
levels trigger and the power dissipation declines. The increase in processor power dissipation
with increasing filter complexity is shown in Fig. 6.36 for full-scale input power. The power
dissipation curves in both figures illustrate that power varies linearly with frequency in the band of
interest. Fig. 6.37 shows the energy consumption of the filter per input cycle, which is obtained by
dividing each power curve in Fig. 6.36 by the input frequency. The energy dissipation is flat in the
frequency band of interest, as is discussed in Sec. 5.1. Each time the number of taps is increased
by one, the energy dissipation increases by 14 E_TAP (from two per-edge tap blocks for each of the 7
levels), where E_TAP is the energy dissipation of each tap per token. The energy per tap is the sum of
the delay-cell energy consumption (11 fJ from simulation) and the charge-pump energy consumption
(29 fJ for the XOR-based charge pump, from simulation). The energy consumption of the processor
with the 6th tap enabled differs from the other curves because of the parasitic coupling between the
outputs of the delay cells at the last tap and the output node, which also causes the generation of strong
half-harmonics. According to measurement results, the energy consumption per tap, with the
exception of the 6th tap, is 35 fJ, which is 12% less than is predicted by simulations using models
extracted from the layout.
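The per-tap energy figures above suggest a simple linear power model for a full-scale in-band input. The sketch below uses only the numbers quoted in the text (7 levels, two per-edge tap blocks per level, 11 fJ and 29 fJ from simulation, 35 fJ measured, 0.27 mW static power). It assumes that all levels trigger, that every token passes through every enabled tap, and that no tap-independent dynamic energy is present; it also ignores the deviation observed for the 6th tap. The function names are illustrative.

```python
LEVELS = 7                   # quantization levels (from the text)
EDGE_BLOCKS_PER_LEVEL = 2    # per-edge tap blocks per level (from the text)
E_DELAY_CELL = 11e-15        # delay-cell energy per token, in joules (simulated)
E_CHARGE_PUMP = 29e-15       # charge-pump energy per token, in joules (simulated)
E_TAP_SIMULATED = E_DELAY_CELL + E_CHARGE_PUMP
E_TAP_MEASURED = 35e-15      # measured energy per tap per token, in joules
P_STATIC = 0.27e-3           # static power of the DC control block, in watts


def tokens_per_cycle():
    """Tokens processed by one tap per input cycle for a full-scale input."""
    return LEVELS * EDGE_BLOCKS_PER_LEVEL


def token_rate(freq_hz):
    """Tokens per second entering the processor for a full-scale input."""
    return tokens_per_cycle() * freq_hz


def energy_per_cycle(taps, e_tap=E_TAP_MEASURED):
    """Dynamic energy per input cycle for a filter with the given number of taps."""
    return taps * tokens_per_cycle() * e_tap


def total_power(freq_hz, taps, e_tap=E_TAP_MEASURED):
    """Static plus dynamic power under the linear model assumed here."""
    return P_STATIC + energy_per_cycle(taps, e_tap) * freq_hz


print(f"token rate at 3.2 GHz : {token_rate(3.2e9) / 1e9:.1f} G/s")
print(f"simulated E_TAP       : {E_TAP_SIMULATED * 1e15:.0f} fJ")
print(f"measured / simulated  : {E_TAP_MEASURED / E_TAP_SIMULATED:.3f}")
print(f"model power, 3 taps at 2 GHz : {total_power(2e9, 3) * 1e3:.2f} mW")
```

Dividing the modeled power by the input frequency gives an energy per cycle that is independent of frequency, which corresponds to the flat curves of Fig. 6.37 in the band of interest.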
Figure 6.35: Power dissipation of the CT digital processor versus input frequency for several
values of the input power, relative to the full-scale input power.
Figure 6.36: Power dissipation of the CT digital processor versus input frequency for a full-scale
input.
Figure 6.37: Filter’s energy consumption per input cycle.
6.8 Performance summary and a comparison to other work
The performance summary of the processor is shown in Table 6.1.
This section presents a comparison of the CT digital processor described in this work to CT
digital processors in prior art and to conventional processors. The presented CT processor extends
the operation of a CT DSP from the voiceband frequency range, achieved by [14, 15], to the gi-
gahertz frequency range. While the proposed processor is based on clockless quantization and
signal-driven processing like CT DSPs in prior art, the gigahertz realization of the processor is
very different from that of low-speed implementations. The gigahertz digital processor can also be
compared to conventional DSP implementations in the same frequency range.
The comparison is shown in Table 6.2. It should be noted that the DT DSP in [58] requires a
clock signal; the power dissipation of the clock is not included and would notably contribute to the







Process                                     ST 65 nm CMOS V
Supply voltage                              1.2 V
Resolution                                  3 bits
Filter type                                 1- to 6-tap FIR
Tap delay range                             95-200 ps
Maximum system bandwidth                    0.8-3.2 GHz
Maximum effective sampling rate             45 Gsample/s
  (3.2 GHz full-scale input)
CT DSP static power∗                        0.3 mW
  (∗excluding test and bias circuits)
CT DSP energy per sample per tap            40 fJ
Bias and test circuit power                 3.3 mW
In-band SFDR for a full-scale 1 GHz input
  1-tap filter (ADC-DAC)                    23.5 dB
  6-tap filter                              31.5 dB
In-band SNDR for a full-scale 1 GHz input
  1-tap filter (ADC-DAC)                    20.3 dB
  6-tap filter                              25.3 dB
Table 6.1: Performance summary.
where P is the total power dissipation, fBW is the maximum bandwidth, ENOB is the effective
number of bits and K is the filter order. The average power dissipation of the CT DSP is stated for
a full-scale input at the average input frequency.
Relative to the CT DSPs in prior art, the presented work achieves a five-orders-of-magnitude
improvement in the maximum input frequency and a reduction in the Figure of Merit by a factor
of over 100. To enable gigahertz processing, however, a low resolution of three bits is used. The
gigahertz discrete-time DSP in [58] uses 8 bits of resolution but has a maximum input frequency
of 1.05 GHz, while the GHz CT DSP achieves a maximum input frequency of 3.2 GHz. For the
CT DSP in this work, signals in the range of 3.2 GHz to 6 GHz (for tap delay of under 160 ps),
which are outside the bandwidth, are not aliased by the CT processor. The in-band half-harmonics
and intermodulation products created by these out-of-band components are at least 20 dB below
Parameter         Li [14]         Schell [15]     Agrawala [58]   This work
Process           0.25 µm CMOS    90 nm CMOS      32 nm CMOS      65 nm CMOS
Supply voltage    2.5 V           1 V             1 V             1.2 V
Clock             no              no              yes             no
Resolution        6 bits          8 bits          8 bits          3 bits
Max. frequency    20 kHz          20 kHz          1.05 GHz        3.2 GHz
Bandwidth         20 Hz-20 kHz    20 Hz-20 kHz    0-1.05 GHz      0.8-3.2 GHz
Area (DSP core)   11.6 mm²        0.55 mm²        0.004 mm²       0.073 mm²
  (estimated)
Max. power        196 mW          3.0 mW          24 mW∗          9.5 mW
Average power     100 mW          1.6 mW          24 mW∗          6.3 mW
Number of taps    16              16              4               7
FOM               2.6 nJ/sample   3.3 pJ/sample   15 fJ/sample    30 fJ/sample
∗ Not including clock generation.
Table 6.2: Comparison of the presented work to CT DSPs in prior art and a state-of-the-art DT
DSP.
the full-scale output in the band of interest. As a result, the CT processor allows input
frequencies beyond its bandwidth, whereas in the DT DSP, components at frequencies above 1.05
GHz will alias in-band.
Chapter 7
Conclusions and Suggestions for Future Work
7.1 Conclusions
Signal-encoding techniques and the resulting architectures for continuous-time digital signal pro-
cessors were investigated in this work. A variable-resolution encoding was developed for low- to
moderate-frequency signals, which eases hardware speed requirements by increasing the quanti-
zation step size according to the magnitude of the input slope. It was shown that an adjustable
resolution causes a reduction in the effective sampling rate and leads to a decrease in the static
and dynamic power dissipation of a CT digital system. A per-edge digital signal encoding and
the corresponding processor architecture were developed for signals in the gigahertz frequency
range; the presented technique bypasses the timing limitation of processing gigahertz digital sig-
nals with conventional encodings. Technology-imposed limitations of speed and resolution in giga-
hertz continuous-time digital processors were discussed. The implementation of a low-resolution
CT DSP for signals below 6 GHz was detailed and measurement results of a prototype chip were
presented.
7.2 Suggestions for future work
7.2.1 First-order DAC reconstruction for low- to moderate-speed CT applications
Reconstruction methods for CT-sampled signals for real-time applications have so far been primar-
ily limited to zero-order hold because higher-order reconstruction requires complex calculations,
which are prohibitive. First-order real-time reconstruction techniques, one of which connects con-
secutive samples with straight lines, require the sampling times to be finely quantized with a high-
speed clock. Another technique, which also requires a high-speed clock, transmits an estimate
of the slope of the signal along with the sample [34]. The output is reconstructed by extrapolating
the signal between samples according to the estimated slope. The latter technique is valid only for
reconstructing an unprocessed signal because the slope information is not valid after digital filtering.
Both techniques rely on a high-speed clock to realize higher-order reconstruction and require
power dissipation that might be too high for some real-time applications.
A first-order-hold (FOH) CT DAC can alternatively be realized without using a high-speed
clock and without adding excessive hardware and power overhead. Instead of connecting samples
with straight lines, the realization of which is quite involved, the reconstructed signals between
tokens can be linearly extrapolated based on the slope of the previous output token, without requir-
ing an estimate of the slope to be transmitted from the CT ADC. The slope can be estimated from
a few or just two previous tokens at the output of the processor. For example, since consecutive
samples can change by a single LSB (or multiples of an LSB, depending on the quantization step in
a variable-resolution system), the slope can be determined by measuring the time between consec-
utive samples. A current source, whose current is sized inversely proportional to the quantization
step, can be used to charge a capacitor, as shown in Fig. 7.1. The current source is enabled when
one output token arrives and disabled when the next token arrives. Two such slope estimators can
be used such that one capacitor is reset when the other is being charged. The charge accumulated
on the capacitor is proportional to the time between tokens and inversely proportional to the quantization
step; it is therefore an indication of the inverse of the slope. The slope can also be estimated
from several past tokens rather than just two tokens. The estimated value is used to vary the gain of
the integrator; the integrator output is reset on every token arrival. The output of the estimator is
then added to or subtracted from the output of a ZOH CT DAC, depending on the direction of the
signal, which is known; the sum is a first-order reconstructed signal.
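The reconstruction described above can be captured in a short behavioral model. The Python sketch below is not a circuit description: it estimates the slope from the two most recent output tokens, restarts a linear ramp at every token, and adds the ramp to the zero-order-hold value. The function name and argument layout are illustrative, and the limit of one quantization step on the ramp is an assumption added here to bound the overshoot when the next token is late.

```python
from bisect import bisect_right


def foh_reconstruct(token_times, token_codes, lsb, t):
    """Behavioral model of a ZOH DAC plus a linear extrapolator.

    token_times : strictly increasing arrival times of the output tokens
    token_codes : integer output code after each token
    lsb         : analog value of one code step
    t           : time at which the reconstructed output is evaluated
    """
    i = bisect_right(token_times, t) - 1           # index of the most recent token
    if i < 0:
        raise ValueError("t precedes the first token")
    zoh = token_codes[i] * lsb                     # zero-order-hold value
    if i == 0:
        return zoh                                 # no slope estimate available yet
    # Signed step and elapsed time between the two most recent tokens.
    step = (token_codes[i] - token_codes[i - 1]) * lsb
    slope = step / (token_times[i] - token_times[i - 1])
    # The ramp restarts at every token arrival (integrator reset).
    ramp = slope * (t - token_times[i])
    # Assumption, not from the text: limit the ramp to one quantization step.
    limit = abs(step)
    ramp = max(-limit, min(limit, ramp))
    return zoh + ramp


times = [0.0, 1.0, 2.0, 3.0]     # tokens produced by a ramp input crossing one level per unit time
codes = [0, 1, 2, 3]
print(foh_reconstruct(times, codes, 0.5, 2.5))   # 1.25, equal to the ramp value 0.5 * 2.5
```

For a ramp input the extrapolated output coincides with the input once two tokens have arrived, whereas the zero-order hold alone would remain at the last quantization level.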
First-order interpolation of CT-sampled signals, however, results in an increase in low-frequency
distortion, as discussed in Sec. 3.2, because for signal segments with negative curvature, the ex-
trapolated output is above the ideally-reconstructed signal and for positive curvature it is below the
signal. The resulting error between the reconstructed signal and the ideally-reconstructed signal
has a positive and negative local mean for negative and positive curvature segments, respectively.
This error is periodic with the input, and therefore contains power in the lower-frequency har-
monics, which have a higher power than in the case of zero-order reconstructed signals. The
higher-frequency harmonics, on the other hand, are significantly reduced compared to those of
ZOH-reconstructed signals. One way to reduce the low-frequency distortion is to add a small off-
set to the estimated slope every other token, such that the quantization error has a mean that is
Figure 7.1: Example implementation of a first-order reconstructing CT DAC comprised of a ZOH
DAC and a linear extrapolator.
closer to zero. The offset should be proportional to the slope but can be static. The alternating
offset reduces the low-frequency distortion, which most often falls in-band, at the cost of a slight
increase in high-frequency distortion. Instead of adding the offset alternating at every other token,
the tokens at which the offset is added can be set according to an optimized pattern or a pseudo-
random pattern. The alternating offset scheme can be designed to spread the increased error power
over a wider band of frequencies to reduce the peak spur power. Alternatively, the curvature of
the signal can be measured to correct the slope estimate. An easy implementation of this involves
several slope estimators described in the previous paragraph, with each estimator enabled at each
consecutive token in a group of tokens. By comparing the output values of multiple estimators the
curvature of the signal can also be determined. The curvature estimate can be used to control the
size of the slope offset or to implement a second-order interpolating DAC.
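The sign of the extrapolation error for segments of negative and positive curvature, which motivates the slope offset discussed above, can be checked numerically. The sketch below is a simplification: it takes the slope from two signal values at arbitrarily chosen times rather than from actual level crossings, and uses a sine wave as the test signal; the function name is illustrative.

```python
import math


def extrapolation_error(signal, t_prev, t_last, t_eval):
    """Linearly extrapolated value minus the true signal value at t_eval.

    The slope is taken from the signal values at t_prev and t_last,
    mimicking a slope estimate based on the two most recent tokens.
    """
    slope = (signal(t_last) - signal(t_prev)) / (t_last - t_prev)
    extrapolated = signal(t_last) + slope * (t_eval - t_last)
    return extrapolated - signal(t_eval)


# sin(t) has negative curvature on (0, pi) and positive curvature on (pi, 2*pi).
concave_error = extrapolation_error(math.sin, 0.2, 0.4, 0.6)
convex_error = extrapolation_error(math.sin, math.pi + 0.2, math.pi + 0.4, math.pi + 0.6)

print(f"error on negative-curvature segment : {concave_error:+.4f}")
print(f"error on positive-curvature segment : {convex_error:+.4f}")
```

The error is positive where the curvature is negative and negative where it is positive, so its local mean alternates with the input period; this is the low-frequency component that the alternating offset or a curvature estimate is intended to reduce.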
7.2.2 Improvement of gigahertz-range CT DSP
Several improvements can be made to the CT GHz DSP. These improvements will, most likely,
come at the cost of increased power dissipation.
To increase the maximum input frequency, the tap delays can be realized as a cascade
of two delay cells. The granularity improvement of the two-cell tap delay allows the maximum
input frequency to be doubled, and helps to ensure that each cell is well settled after each token.
This implementation mitigates the problem of systematic input-polarity-dependent delay mismatch
because the delay discrepancy of one inverting cell is corrected by the following delay cell. The
matching of a delay cell’s delay for rising and falling input is then allowed to be significantly
worse. For tap delays below about 180 ps, however, the delay cell designs would have to be
revised to reduce the delays of each delay cell. To further reduce half-harmonics which can become
problematic for frequencies above 5 GHz, the outputs of the ADC would have to be delay-equalized
because the per-edge encoded ADC already generates delay discrepancies.
The resolution of a CT GHz system, with any digital encoding, not just per-edge encoding, is
primarily limited by device mismatch. Small transistors with large variation are used in order to
avoid very large power consumption of digital gates switching at gigahertz frequencies. The device
sizes of critical transistors, for example the current-starved transistors in the delay cell, the delaying
path transistors in the pulse generator and the current sources in the charge pump can be increased
to optimize the mismatch-power trade-off. Increasing current source transistor sizes, however,
comes at the additional cost of an increase in the non-linear drain capacitance, which contributes
to the filter’s load capacitance. The linear coefficient range can also be improved by adding extra
current source devices in parallel, which can be switched according to the coefficient size; however,
this also comes at the cost of additional drain capacitance, and requires larger switching devices
and larger buffers to drive the switches. As expected, better performance comes with a higher
power bill.
The per-edge-encoded CT DSP generates half-harmonics of the input due to input-polarity-
dependent delay discrepancies, as explained in Sec. 4.2.1. The half-harmonics are filtered by a
frequency response which differs from the frequency response of the filter because not all coeffi-
cients of the filter contribute power to the half-harmonic. Further investigation is needed to develop
a filter design methodology which achieves the desired frequency response for the input signal but
also achieves a beneficial frequency response for the half-harmonics.
Bibliography
[1] Y. Tsividis, “Digital signal processing in continuous time,” Electron. Lett., vol. 39, no. 21,
pp. 1551-1552, Oct. 2003.
[2] F. Marvasti, Nonuniform Sampling: Theory and Practice. New York: Kluwer, 2001.
[3] W. R. Dieter, S. Datta, and W. K. Kai, “Power reduction by varying sampling rate,” in Proc.
Int. Symp. on Low Power Elect. and Design, pp. 227–232, 2005.
[4] R. C. Dorf, M. C. Farren, and C. A. Phillips, “Adaptive sampling frequency for sampled-data
control systems,” IRE Trans. on Automatic Control, vol. 7, no. 1, pp. 38-47, 1962.
[5] J. W. Mark and T. D. Todd, “A nonuniform sampling approach to data compression,” IEEE
Trans. on Commun., vol. COM-29, no. 1, pp. 24-32, 1981.
[6] J. Foster and T.-K. Wan, “Speech coding using time code modulation,” in Proc. IEEE South-
east Conf., pp. 861–863, Apr. 1991.
[7] N. Sayiner, H. N. Sorensen, and T. R. Viswanathan, “A level-crossing sampling scheme for
A/D conversion,” IEEE Trans. on Circuits and Systems II, Analog Digit. Signal Process.,
vol. 43, no. 4, pp. 335-339, Apr. 1996.
[8] H. Inose, T. Aoki, and K. Watanabe, “Asynchronous delta-modulation system,” Electron.
Lett., vol. 2, no. 3, pp. 95-96, Mar. 1966.
[9] P. A. Sharma, “Characteristics of asynchronous delta-modulation and binary-slope-
quantized-P.C.M. systems,” Electron. Eng., vol. 40, pp. 32-37, Jan. 1968.
[10] R. Steele, Delta Modulation Systems. New York: Wiley, 1975.
[11] E. Allier, G. Sicard, L. Fesquet, and M. Renaudin, “A new class of asynchronous A/D con-
verters based on time quantization,” in Proc. IEEE Int. Symp. Asynchronous Circuits Sys.,
pp. 196–205, May 2003.
[12] M. Trakimas and S. Sonkusale, “A 0.8 V asynchronous ADC for energy constrained sensing
applications,” in Digest IEEE Custom Integr. Circuits Conf., pp. 173–176, 2008.
[13] Y. Tsividis, “Digital signal processing in continuous time,” in IEEE ICASSP, pp. 596–592,
2004.
[14] Y. Li, K. Shepard, and Y. Tsividis, “A continuous-time programmable digital FIR filter,”
IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2512-2520, Nov. 2006.
[15] B. Schell and Y. Tsividis, “A continuous-time ADC/DSP/DAC system with no clock and
with activity-dependent power dissipation,” IEEE Journal of Solid-State Circuits, vol. 43, pp.
2472-2481, Nov. 2008.
[16] Z. Zhao, V. Smolyakov, and A. Prodic, “Continuous-time digital signal processing based
controller for high-frequency dc-dc converters,” in Proc. IEEE Appl. Power Electron. Conf,
pp. 882–886, 2007.
[17] B. Schell, Continuous-Time Digital Signal Processing: Analysis and Implementation. Ph.D.
dissertation, Dept. of Electrical Engineering, Columbia Univ., New York, 2008.
[18] H. Pan and A. Abidi, “Spectral spurs due to quantization in Nyquist ADCs,” in IEEE Trans.
on Circuits and Systems, pp. 1422–1439, Aug. 2004.
[19] W. R. Bennett, “Spectra of quantized signals,” Bell Syst. Tech. J., vol. 27, pp. 446-472, 1948.
[20] M. Kurchuk and Y. Tsividis, “Signal-dependent variable-resolution quantization for
continuous-time digital signal processing,” in Proc. IEEE Int. Symp. on Circuits and Systems,
pp. 1109–1112, 2009.
[21] M. Kurchuk and Y. Tsividis, “Signal-dependent variable-resolution clockless A/D conversion
with application to continuous-time digital signal processing,” IEEE Trans. on Circuits and
Systems, vol. 57, no. 5, pp. 982-991, 2010.
[22] S. Haykin, Communication Theory. John Wiley and Sons, New York, 2001.
[23] R. Tomovic and G. Bekey, “Adaptive sampling based on amplitude sensitivity,” IEEE Trans.
on Automatic Control, vol. 11, no. 2, pp. 282-284, April 1966.
[24] K. Guan, S. Kozat, and A. Singer, “Adaptive reference levels in a level-crossing analog-to-
digital converter,” EURASIP Journal on Advances in Signal Processing, vol. 2008, article ID
513706, 2008.
[25] Y. L. Wong, M. Cohen, and P. Abshire, “A 750-MHz 6-b adaptive floating-gate quantizer in
0.35-µm CMOS,” IEEE Trans. on Circuits and Systems, vol. 56, no. 7, pp. 1301-1312, July
2009.
[26] T. Wang, D. Wang, P. Hurst, B. Levy, and S. Lewis, “A level-crossing analog-to-digital con-
verter with triangular dither,” IEEE Trans. on Circuits and Systems, vol. 56, no. 9, pp. 2089-
2099, 2009.
[27] C. Vezyrtzis and Y. Tsividis, “Processing of signals using level-crossing sampling,” in Proc.
IEEE Int. Symp. on Circuits and Systems, pp. 2293–2296, 2009.
[28] M. Miskowicz, “Efficiency of event-based sampling according to error energy criterion,” Sen-
sors, vol. 10, pp. 2242-2261, 2010.
[29] M. Miskowicz, “Send-on-delta concept: an event-based data reporting strategy,” Sensors,
vol. 6, pp. 49-63, 2006.
[30] E. Kofman, “Quantized-state control. A method for discrete event control for continuous
systems,” Latin American Applied Research Journal, vol. 33, no. 4, pp. 399-406, 2003.
[31] D. Ciscato and L. Mariani, “On increasing sampling efficiency by adaptive sampling,” IEEE
Trans. Autom. Control, vol. AC-12, no. 3, p. 318, Jun. 1967.
[32] M. Miskowicz, “Asymptotic effectiveness of the event-based sampling according to the inte-
gral criterion,” Sensors, vol. 7, pp. 16-37, 2007.
[33] M. Kurchuk and Y. Tsividis, “Energy-efficient asynchronous delay element with wide con-
trollability,” in Proc. IEEE Int. Symp. on Circuits and Systems, pp. 3837–3849, 2010.
[34] Y. S. Suh, “Send-on-delta sensor data transmission with a linear predictor,” Sensors, vol. 7,
pp. 537-547, 2007.
[35] D. Kalogiros and V. Stylianakis, “A theoretical model for time code modulation,” Proc. IEEE
ICASSP, vol. 57, pp. 2439-2442, 1999.
[36] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, W. Dunford, and P. Palmer, “A 4
GHz non-resonant clock driver with inductor-assisted energy return to power grid,” IEEE
Trans. on Circuits and Systems, vol. 57, no. 8, pp. 2899-2999, 2010.
[37] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, and M. Schmatz, “A 1.25-5 GHz
clock generator with high-bandwidth supply-rejection using a regulated-replica regulator in
45-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 44, no. 11, pp. 2901-2910, 2009.
[38] E. Roza, Analog FIR-filter with SIGMA DELTA modulator and delay line. US Patent
6795001, 2002.
[39] Y. A. Eken and J. Uyemura, “A 5.9-GHz voltage-controlled ring oscillator in 0.18-µm
CMOS,” IEEE Journal of Solid-State Circuits, vol. 39, no. 1, pp. 230-233, 2004.
[40] M. G. Johnson and E. I. Hudson, “A variable delay line PLL for CPU coprocessor synchro-
nization,” IEEE Journal of Solid-State Circuits, vol. 23, no. 5, pp. 1218-1223, 1988.
[41] M. Saint-Laurent and G. P. Muyshondt, “A digitally controlled oscillator constructed using
adjustable resistors,” in Proc. Southwestern Symp. Mixed-Signal Design, pp. 90–82, 2001.
[42] T. Watanabe, T. Mizuno, and Y. Makino, “An all-digital analog-to-digital converter with
12-µV/LSB using moving-average filtering,” IEEE Journal of Solid-State Circuits, vol. 38, no. 1,
pp. 120-125, 2003.
[43] H. Farkhani, M. Meymandi-Nejad, and M. Sachdev, “A fully digital ADC using a new delay
element with enhanced linearity,” in Proc. IEEE Int. Symp. on Circuits and Systems,
pp. 2406–2409, 2008.
[44] J. Kim, “Area-efficient digitally controlled CMOS feedback delay element with programmable
duty cycle,” IEICE Electronics Express, vol. 6, no. 4, pp. 193-197, 2009.
[45] N. R. Mahapatra, A. Tareen, and S. V. Garimella, “Comparison and analysis of delay ele-
ments,” in Proc. Midwest Symp. on Circ. and Syst., pp. 473–476, 2002.
[46] G. Kim, M. K. Kim, B. S. Chang, and W. Kim, “A low voltage, low-power CMOS delay
element,” IEEE Journal of Solid-State Circuits, vol. 31, no. 7, pp. 966-971, 1996.
[47] B. Schell and Y. Tsividis, “A low power tunable delay element suitable for asynchronous
delay of burst information,” IEEE Journal of Solid-State Circuits, vol. 43, no. 5, pp. 1227-
1234, 2008.
[48] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and phase noise in ring oscillators,” IEEE
Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, 1999.
[49] B. Leung, “A switching-based phase noise model for CMOS ring oscillators based on multi-
ple thresholds crossing,” IEEE Trans. on Circuits and Systems, vol. 57, no. 11, pp. 2858-2869,
2010.
[50] J. McNeill, “Jitter in ring oscillators,” in Proc. IEEE Int. Symp. on Circuits and Systems,
vol. 6, 1994.
[51] B. Kim, T. C. Weigandt, and P. R. Gray, “PLL/DLL system noise analysis for low jitter clock
synthesis design,” in Proc. IEEE Int. Symp. on Circ. and Syst., vol. 4, 1994.
[52] M. Figueiredo and R. Aguiar, “Predicting noise and jitter in CMOS inverters,” in Proc. Re-
search in Microelectronics and Electronics Conference, pp. 21–24, 2007.
[53] A. Strak and H. Tenhunen, “Analysis of timing jitter in inverters induced by power-supply
noise,” in Proc. Design and Test of Integrated Systems in Nanoscale Technology Conference,
p. 53, 2006.
[54] A. I. Kayssi, K. A. Sakallah, and T. M. Burks, “Analytical transient response of CMOS
inverters,” IEEE Trans. on Circuits and Systems, vol. 39, no. 1, pp. 42-45, 1992.
[55] G. F. Lawler, Introduction to Stochastic Processes. Chapman Hall, second ed., 2006.
[56] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus. Springer-Verlag,
second ed., 1991.
[57] N. A. Weiss, A Course in Probability. Addison-Wesley, 2006.
[58] A. Agrawal, S. Mathew, S. Hsu, M. Anders, H. Kaul, F. Sheikh, R. Ramanarayanan, and
S. Srinivasan, “A 320mV-to-1.2V on-die fine-grained reconfigurable fabric for DSP/media
accelerators in 32nm CMOS,” in ISSCC, pp. 328–329, 2010.
