Digital Signal Processing with Signal-Derived Timing: Analysis and Implementation by Chen, Yu
Digital Signal Processing with
Signal-Derived Timing: Analysis and
Implementation
Yu Chen
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy







This work investigates two different digital signal processing (DSP) approaches that rely on
signal-derived timing: continuous-time (CT) DSP and variable-rate DSP. Both approaches enable
designs of energy-efficient signal processing systems by relating their operation rates to the input
activity.
The majority of this thesis focuses on CT-DSP, whose operations are completely digital in CT,
without the use of a clock. The spectral features of CT digital signals are analyzed first,
demonstrating a general pattern of the quantization noise spectrum added in CT amplitude quantization.
The focus then narrows to the system characteristics and architecture of CT digital
infinite-impulse-response (IIR) filters, which have barely been studied in previous work on
this topic. This thesis discusses and addresses a previously unreported stability issue in CT digital
IIR filters in the presence of delay-line mismatches and proposes an innovative method to design
high-order CT digital IIR filters with only two tap delays. Introducing an event detector allows the
operation rate of a CT digital IIR filter to closely track the input activity even though it is a
feedback system. For the first time, the filtered CT digital signal is converted to a synchronous digital
signal. This facilitates integrating the CT digital filter with conventional discrete-time systems and
expands the applications of the former. A computationally efficient interpolation filter is used
to improve the signal accuracy of the synchronous digital output. On the circuit level,
a new delay-cell design is introduced. It ensures low jitter, good matching, robust communication
with adjacent circuits, and event-independent delay.
An integrated circuit (IC) adopting all of these ideas was fabricated in a TSMC 65 nm LP
CMOS process. It is the first IC implementation of a CT digital IIR filter. It can process signals
with a data rate up to 20 MHz. Thanks to the IIR response and the 16-bit resolution used in the
system, the implemented filter achieves a frequency response much more versatile and accurate
than the CT digital filters in prior art. The power consumption of the implemented system adapts
quickly to the input activity, varying from 2.32 mW (full activity) to 40 µW (idle) with no
power-management circuitry.
The second part of the thesis discusses a variable-rate DSP capable of processing samples with
a variable sampling rate. The clock rate in the variable-rate DSP tracks the input sampling rate.
Compared to a fixed-rate DSP, the proposed system has a lower output data rate and hence is
more computationally efficient. A reconstruction filter with a variable cutoff frequency is used to
reconstruct the output. The signal-to-noise ratio remains fixed when the sampling rate changes.
Contents
List of Figures
List of Tables
1 Introduction
1.1 DSPs with signal-derived timing
1.2 Prior art and contributions of this work
1.3 Thesis outline
2 Spectral Analysis of the CT-ADC and -DSP
2.1 Introduction
2.2 CT amplitude quantization
2.2.1 Quantization characteristics
2.2.2 Single-tone input
2.2.3 Two-tone input
2.2.4 Band-limited Gaussian inputs
2.3 Uniform sampling of quantized CT signals
2.3.1 Uniform sampling context and issues
2.3.2 Analysis of aliased quantization error – a staircase model
2.4 CT-ADC with time coding
2.5 Error analysis of a CT-DSP
3 System-Level Considerations of CT Digital IIR Filter Systems
3.1 Introduction
3.2 Preventing instability in CT digital IIR filters
3.2.1 Discussion of the previous CT digital IIR filter design
3.3 Digital signal processor with signal-derived timing
3.3.1 Event detection
3.4 Interpolation filter for synchronization
4 Design of Tunable Digital Delay Cells
4.1 Introduction
4.2 Prior art
4.3 Proposed delay cell
4.3.1 Eliminating energy waste
4.3.2 Ensuring robustness
4.3.3 Sizing the current source for good matching
4.3.4 Identifying, and eliminating, signal-dependent delay
4.4 Measurement results of delay cells
4.5 Conclusion
5 Implementation of a CT Digital IIR Filter System
5.1 Overall system architecture
5.2 Architecture of the CT digital IIR filter
5.2.1 Timing block
5.2.2 Data path
5.2.3 Event detector
5.2.4 Delay cell
5.2.5 Half-delay cell
5.2.6 Grouping block in the CT digital IIR filter
5.2.7 Asynchronous FIFO
5.2.8 Arithmetic blocks
5.3 Interpolation filter
5.3.1 Grouping block in the interpolation filter
5.4 CT-to-DT converter
6 Measurement Results of the CT Digital IIR Filter System
6.1 Description of the measurement setup
6.2 System configuration and calibration
6.2.1 Calibration of delay lines
6.2.2 Timing-dependent delay
6.3 Frequency response
6.3.1 Two-tone test
6.3.2 Effect of the interpolation filter
6.4 Power consumption
6.4.1 Test with a speech signal
6.5 Performance summary and comparison to other work
7 Design Considerations for VR Digital Signal Processing
7.1 Introduction
7.2 VR-DSP
7.2.1 Constant-sampling-rate regions
7.2.2 Sampling-rate-transition regions
7.2.3 An illustrative example
7.2.4 Remarks on required rate
7.3 Reconstruction of VR samples
7.4 Conclusion
8 Suggestions for Future Work
8.1 Improvement to event detection in the CT digital IIR filter
8.2 A CT digital IIR filter with one tap delay
8.2.1 Implementation of two-bit TD delay line
8.2.2 A two-bit TD delay line as the CT digital IIR filter’s timing block
8.3 A complete signal processing chain with a VR sigma–delta modulator and a VR-DSP
Bibliography
Appendix
A Stability Analysis of CT Digital IIR Filter in the Laplace Domain
List of Figures
1.1 An example of a DT-ADC.
1.2 An example of a CT-ADC.
1.3 An example of a VR-ADC.
1.4 Block diagrams of various systems.
1.5 The structure of a CT delay block.
2.1 A CT amplitude quantization operation.
2.2 Quantization characteristics and error functions.
2.3 A quantized sinusoid waveform and its quantization error.
2.4 Power spectrum of the quantization error in Fig. 2.3(c) relative to signal power.
2.5 Power spectrum of the quantization error in Fig. 2.3(c) relative to signal power.
2.6 Quantization error relative to signal power in the two-tone test.
2.7 Power spectra in the band-limited Gaussian input case.
2.8 Staircase-modeled error spectrum before aliasing.
2.9 Theoretical asymptotes and simulation results of SERdB using the staircase model.
2.10 A CT signal processing system composed of a CT-ADC and a CT-DSP.
3.1 An ideal second-order IIR filter.
3.2 A second-order IIR filter with tap delay mismatch.
3.3 A second-order IIR with event grouping.
3.4 Input and output waveforms of a second-order IIR filter with mismatched delay lines.
3.5 A sixth-order CT IIR implemented with cascaded second-order sections.
3.6 A sixth-order CT IIR with an event detector.
3.7 Interpolation filter.
4.1 Delay cells.
4.2 Baseline delay cell design.
4.3 Delay standard deviation over mean.
4.4 Charge distributions of nMOS transistor current source (M2 in Fig. 4.3) when it is fully on (left) and fully off (right).
4.5 Schematic of the modified delay cell.
4.6 Die photo (left) and layout of one delay cell (right).
4.7 Measured delay-cell responses.
5.1 Top-level block diagram of the CT digital IIR filter system.
5.2 The input-derived timing block.
5.3 Architecture of the data path in a pipeline.
5.4 Architecture of the sixth-order CT digital IIR filter with shared delay lines.
5.5 Event detector.
5.6 Circuit implementation of the grouping block in IIR filter.
5.7 Architecture of asynchronous FIFO with one write and two read channels.
5.8 Timing diagrams of write and read operations for the asynchronous FIFO.
5.9 Architecture of one FIR section in interpolation filter.
5.10 Circuit implementation of the grouping block in interpolation filter.
5.11 Architecture of the CT-to-DT converter.
6.1 Die photo of the CT digital IIR filter system.
6.2 Measurement setup.
6.3 PCB boards for test.
6.4 DT to CT conversion.
6.5 Block diagram of the chip under test.
6.6 Example setup of automatic tuning of the delay lines.
6.7 Configuration for delay-line calibration.
6.8 Delay-line measurements after tuning.
6.9 Event delay versus event index.
6.10 Comparison of the CT and DT waveforms at the interpolation filter and the CT-to-DT converter.
6.11 Frequency responses.
6.12 Out-band two-tone test.
6.13 Spectra at input and output of the interpolating filter.
6.14 Power consumption versus input event rate.
6.15 Plot of input speech signal (top) and the instantaneous power consumption (bottom) of the chip.
7.1 A VR-ADC followed by a conventional DSP.
7.2 Variations on a VR-DSP.
7.3 Sample input and output signals around frequency transitions.
7.4 DSP frequency response, noise power spectral density, and reconstruction filter frequency response, for low and high sampling rates.
7.5 Reconstruction using sinc with a variable cutoff frequency at three different instants.
7.6 Comparison of two reconstruction methods.
7.7 Comparison of the reconstructed and the ideal output.
8.1 A possible implementation of event detection with no error power.
8.2 A shared timing block.
8.3 A two-bit TD delay line.
8.4 A two-bit delay line composed of several delay cells for fine time granularity.
8.5 Grouping block in a two-bit delay line.
8.6 Schematic of the grouping block in Fig. 8.5(a).
8.7 A two-bit TD delay line is configured as the CT digital IIR filter’s timing block.
8.8 Two signal-processing chains compared.
A.1 Second-order IIR filters.
A.2 Max modulus of e^{sT_D/100} versus delay mismatch.
List of Tables
4.1 Devices sizes of the delay cell in Fig. 4.2.
4.2 Comparison of different delay cells.
4.3 Jitter of delay cell cascades.
6.1 Power breakdown.
6.2 Summary of the chip.
6.3 Comparison of presented work with state-of-art CT and DT digital filters.
7.1 In-band SNR comparison.
Acknowledgments
I would like to thank Prof. Yannis Tsividis, my advisor, for inspiring my interest in integrated
circuit design, for his patient guidance of my research, and for always answering my questions in a
timely manner. The past six years were a tough journey. But I never felt hopeless, because he could
always point out to me a way out of the difficulties.
I would like to thank my thesis committee members: Profs. Rajit Manohar, Thao Nguyen,
Mingoo Seok and John Wright. Special thanks to Prof. Rajit Manohar, for his guidance on
asynchronous digital circuit design and his amazing tool for design verification. Also to Prof. Thao
Nguyen, for offering a wonderful Advanced Digital Signal Processing course, which inspired my
interest in this topic.
A lot of other people have to be especially mentioned, because things would have been very
different without them. I want to thank Prof. Steven Nowick for teaching me the basics of
asynchronous digital circuit design and Prof. Kinget for offering great circuit design classes and for his
helpful discussions of my circuits. I want to thank my colleague, Sharvil Patil, for the countless
discussions we had and for the joy that resulted from our collaboration over the past few years. I
want to thank Jianxun Zhu for teaching me numerous things, e.g. PCB design, soldering, testing,
etc., and Yang Xu for his insightful discussions. I want to thank Kevin Tien for answering my CAD
tool questions and for other enjoyable conversations. I also want to thank Rabia Yazicigil, Ning
Guo, Josh Kim, Jayanth Kn, Baradwaj Vigraham, Christos Vezyrtzis, Chun-wei Hsu and Karthik
Tripurari for their helpful discussions and assistance.
Finally, I want to thank my family. I thank my parents; their support kept me going. I thank my




Digital signal processors (DSPs¹) are widely used in modern electronic systems because they are
highly programmable and robust to noise. Conventional DSPs process discrete-time (DT) digital
samples generated by a DT analog-to-digital converter (ADC). As shown in Fig. 1.1, a DT-ADC
samples an analog input with a uniform-rate clock and then quantizes the samples to discrete
values. To accommodate the worst-case scenario, the sampling clock frequency must be at least twice
the input’s maximum possible bandwidth. The clock frequency is fixed regardless of the actual
input activity. This implies that both conventional DSPs and DT-ADCs must always operate at a
clock rate chosen for the worst case. On the other hand, many real-world signals have time-varying
activities. For example, speech signals can contain frequency components as high as 10 kHz, but
the signal power is usually concentrated in a narrower band of around 3.5 kHz [1]. Biosignals, such
as those found in electrocardiography (ECG), are sporadic signals containing short intervals with
high activity and long periods of silence. Sampling and processing these signals with a uniform
worst-case clock rate is unnecessarily energy inefficient.

¹ “DSP” is used as an abbreviation for both “digital signal processor” and “digital signal processing.”

Figure 1.1: An example of a DT-ADC. (a) Block diagram. (b) Operation of the DT-ADC. qi are
quantization levels.
Two approaches have been implemented to improve the signal-tracking property of the data
conversion: continuous-time (CT) ADC [2, 3, 6] and variable-rate (VR) ADC [1, 8–11]. A CT-
ADC converts an analog input into a set of CT binary waveforms without sampling. One common
CT-ADC scheme is level-crossing sampling (LCS) [3, 12–14]. This scheme can be modeled as an ideal
quantizer (Fig. 1.2(a)). Whenever the input x(t) crosses one of the quantization levels qi, the
amplitude of the quantized signal xq(t) changes. Because the amplitude of the quantized signal has

















Figure 1.2: An example of a CT-ADC. (a) The block diagram. (b) Operation of the CT-ADC. qi
are quantization levels.
1.2(b)). The time instants at which the amplitude changes are recorded by the pulses on the req
(request) bit. Req is effectively signal-derived timing, which asks the following stage (e.g., a DSP)
to process the data. The combination of the timing req and the amplitude value data forms a CT
digital signal. We call an update in the data value of a CT digital signal an event. Notice from
Fig. 1.2(b) that the event rate tracks the input. When x(t) varies quickly, it crosses quantization
levels more frequently, which generates a high event rate. When x(t) varies slowly, the event rate
decreases accordingly. In the extreme case where x(t) is almost quiescent, as shown on the right of
Fig. 1.2(b), no event is generated for a long time. The nonuniform inter-event intervals in a CT digital
signal are key to the signal representation.
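
The event generation described above can be illustrated with a small behavioral simulation. The following is a minimal sketch, not the circuit used in this work: it assumes a uniform quantizer whose levels sit at integer multiples of a hypothetical spacing `lsb`, and it locates crossings on a fine simulation time step `dt`, which is only a numerical convenience since a real level-crossing sampler has no such grid. The function name, the 0.9 amplitude, and the test frequencies are illustrative assumptions.

```python
import math

def level_crossing_events(x, t_end, dt, lsb):
    """Return (time, level_index) events for a level-crossing sampler model.

    x     : callable input signal x(t)
    t_end : simulated duration in seconds
    dt    : fine step used only to locate crossings (simulation artifact)
    lsb   : assumed spacing between adjacent quantization levels
    """
    events = []
    n_steps = int(round(t_end / dt))
    prev = math.floor(x(0.0) / lsb)       # index of the current quantization bin
    for k in range(1, n_steps + 1):
        t = k * dt
        cur = math.floor(x(t) / lsb)
        if cur != prev:                   # a level was crossed: one event
            events.append((t, cur))
            prev = cur
    return events

# Fast input, slow input, and a quiescent input observed over the same 1 ms window.
fast = level_crossing_events(lambda t: 0.9 * math.sin(2 * math.pi * 1000 * t), 1e-3, 1e-7, 0.25)
slow = level_crossing_events(lambda t: 0.9 * math.sin(2 * math.pi * 100 * t), 1e-3, 1e-7, 0.25)
idle = level_crossing_events(lambda t: 0.1, 1e-3, 1e-7, 0.25)
print(len(fast), len(slow), len(idle))
```

In this sketch the fast input produces more events than the slow one, and the quiescent input produces none, which mirrors the signal-tracking behavior of Fig. 1.2(b).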
A VR-ADC converts an analog input into a DT digital output, but with a nonuniform sampling
rate. Similar to a conventional DT-ADC, it is composed of a sampler followed by a quantizer (Fig.
1.3(a)). The difference between the two lies in the sampling clock. In the conventional DT-ADC
(Fig. 1.1), the clock has a fixed sampling rate. By contrast, the VR-ADC clock’s sampling rate is
variable. Its exact value is determined by the activity detector. The detector takes both x(t) and
a base clock as its inputs and derives a VR clock according to the input activity. The base clock
has a fixed rate, which is always the highest in the system. The operation of the VR-ADC is shown in
Fig. 1.3(b). When x(t) varies quickly, the VR clock operates at its highest rate, the same as the base
clock, and the DT digital samples are generated at the highest rate. When x(t) varies slowly, the VR
clock slows to an integer fraction of the base clock rate, which results in a lower sampling rate. The
VR clock is thus a form of signal-derived timing.
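
As a rough illustration of how a VR clock can be derived from the input and a base clock, the sketch below selects which base-clock ticks become samples. It is not the activity detector of any cited VR-ADC: the slope-based activity estimate, the thresholds `hi` and `lo`, and the divider set {1, 2, 4} are all assumptions made for this example. The only property taken from the text is that every sample lies on the base-clock grid and that the sampling interval is an integer number of base-clock periods.

```python
def vr_sample_indices(x_base, hi=0.05, lo=0.01):
    """Pick the base-clock tick indices at which a VR sample is taken.

    x_base : input evaluated at every base-clock tick (stand-in for analog x(t))
    hi, lo : hypothetical activity thresholds on the tick-to-tick change
    """
    picked = []
    n = 0
    while n < len(x_base):
        picked.append(n)
        # Activity estimate: magnitude of the most recent tick-to-tick change.
        activity = abs(x_base[n] - x_base[n - 1]) if n > 0 else float("inf")
        if activity >= hi:
            divider = 1      # VR clock runs at the base-clock rate
        elif activity >= lo:
            divider = 2      # VR clock at half the base-clock rate
        else:
            divider = 4      # VR clock at a quarter of the base-clock rate
        n += divider
    return picked

busy = vr_sample_indices([0.1 * k for k in range(20)])   # steadily changing input
quiet = vr_sample_indices([0.0] * 20)                    # input with no activity
print(busy)
print(quiet)
```

Note that the quiet input is still sampled at the lowest nonzero rate, consistent with the observation that the lowest VR clock rate is not usually zero.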
Both CT-ADCs and VR-ADCs generate events/samples whose rate tracks the input activity.
In a CT-ADC, an event is generated at the level-crossing and hence is free of quantization error
at those time instants. Its signal-derived timing, req, is not necessarily aligned to any timing grid,
which results in an aliasing-free output. In addition, its event rate can go as low as zero if there is
no activity in the input at all. In contrast, the VR clock is derived both from the input and a base
clock. Samples are quantized in both the amplitude and the time domains and hence suffer from
quantization error and aliasing. In addition, even when the input has no activity, the VR clock’s
lowest rate is not usually zero.
We use the term “event” in our discussions of CT digital signals. A CT digital signal is defined
in the entire time domain; “events,” which are defined as the updates of the signal value, are only




















Figure 1.3: An example of a VR-ADC. (a) The block diagram. (b) Operation of the VR-ADC. qi
are quantization levels.
which the signal value doesn’t change. By contrast, we use the term “sample” in the discussions
of DT digital signals. A DT digital signal contains meaningful information only at the discrete
sampling instants when samples are generated.
Despite their differences, both data converters offer the promise of energy-efficient systems
by relating the event/sample rate to the input activity. This work discusses approaches to design
energy-efficient DSPs in both schemes.
1.1 DSPs with signal-derived timing
Fig. 1.4(a) shows the structure of a conventional DT-DSP. It is composed of arithmetic blocks (i.e.,
the multipliers and the adders) and a tapped delay line. Because the preceding DT-ADC generates
digital samples at a uniform rate and the DT-DSP operates at the same clock rate, the tapped
delay line can simply be built with a chain of D flip-flops (DFFs). Each time the delay line shifts
the digital samples down by one position, it effectively implements a delay of one sample interval
(i.e., one clock cycle). However, this delay line does not work for CT digital signals from CT-ADCs
or VR DT digital signals from VR-ADCs because their event/sample intervals are nonuniform.
New hardware is needed to handle these CT and VR DT digital signals.
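
The clocked tapped delay line can be modeled in a few lines. The sketch below is a behavioral stand-in for a DFF chain feeding multipliers and an adder; the class name and the coefficients are illustrative assumptions. Each call to `clock` represents one clock edge, so the tap spacing is exactly one sample interval, which is why the same structure cannot represent a fixed time delay when the intervals between samples or events vary.

```python
from collections import deque

class TappedDelayLineFIR:
    """Behavioral model of a clocked FIR filter built on a shift register."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        # One register per tap; maxlen makes the oldest sample fall off the end.
        self.regs = deque([0.0] * len(self.coeffs), maxlen=len(self.coeffs))

    def clock(self, sample):
        """Shift in one sample on a clock edge and return the filter output."""
        self.regs.appendleft(sample)
        return sum(c * r for c, r in zip(self.coeffs, self.regs))

fir = TappedDelayLineFIR([1.0, -1.0])           # first-difference filter
outputs = [fir.clock(s) for s in (1.0, 1.0, 2.0)]
print(outputs)
```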
The DSP processing a CT digital signal is called a CT-DSP (Fig. 1.4(b)). It is similar to a DT-
DSP except for the delay line. Because the events can occur at any time in a CT digital signal, not
aligned to any uniform timing grid, it must rely on the CT delay blocks to implement tap delays.
Both req and data are delayed by the CT delay blocks. The original signal-derived timing req



















































Figure 1.4: Block diagrams of various systems. (a) DT-ADC/DT-DSP. (b) CT-ADC/CT-DSP. (c)
VR-ADC/VR-DSP.
tap. Because reqi, i = 1,2, . . . ,K, are delayed versions of req, their activities also track the input,
as do the operations in the CT-DSP.
The signals at the internal nodes and the output of a CT-DSP are also CT digital. An event at
these nodes also means an update of their data values. The completion of the update is indicated
by a rising edge on the timing bit req. For example, in Fig. 1.4(b), when an input event arrives,
it triggers an update at the output of the CT-DSP to reflect the data change at the input. An event
is generated there. Note that an event at the output of the CT-DSP can result in a zero change in
dataout. It is possible that two input events arrive at two different taps of a CT-DSP filter at the
same time, but their amplitude changes happen to cancel each other out. This occurs, for example,
when the two input events have the same amplitude change but see tap coefficients of +1 and −1,
respectively. They trigger a new event at the CT-DSP output, but the value of dataout is unchanged.
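
The cancellation case can be reproduced with an event-driven model of a two-tap CT FIR filter. This is a sketch under simplifying assumptions: events are (time, value) pairs, the value holds until the next event, tap k sees the input delayed by k times a hypothetical `tap_delay`, and coincident events are detected by exact equality of the time stamps, which only works here because the example times are chosen to be exactly representable.

```python
def ct_fir_events(events, coeffs, tap_delay):
    """Return output (time, value) events of an event-driven CT FIR model.

    events    : input (time, value) pairs sorted by time
    coeffs    : tap coefficients; tap k sees the input delayed by k * tap_delay
    tap_delay : tap delay TD in the same time unit as the event times
    """
    arrivals = {}
    for k in range(len(coeffs)):
        for t, v in events:
            arrivals.setdefault(t + k * tap_delay, []).append((k, v))
    tap_values = [0.0] * len(coeffs)
    out = []
    for t in sorted(arrivals):
        for k, v in arrivals[t]:
            tap_values[k] = v             # every tap hit at time t updates
        # One output event per arrival time, even if the sum does not change.
        out.append((t, sum(c * x for c, x in zip(coeffs, tap_values))))
    return out

# Two input steps of equal size, spaced by exactly one tap delay, with taps +1 and -1.
out = ct_fir_events([(0.0, 1.0), (1.0, 2.0)], [1.0, -1.0], tap_delay=1.0)
print(out)
```

At t = 1.0 the first tap sees the second step while the second tap sees the delayed first step, so an output event is produced but the output value stays at 1.0.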
Fig. 1.5 shows one possible implementation of a CT delay block. It is composed of a one-bit
CT delay line delaying req by TD, and an asynchronous FIFO for temporary data storage [15]. The
data of a CT digital signal is stored into the FIFO through the write terminals (W) when it first
enters the delay block. TD later, the pulses on the timing signals arrive at the end of the one-bit CT
delay line – i.e., reqi+1. They read out data through the read terminals (R). As a result, the block
implements a TD delay between datai+1 and datai.
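
The write-then-read behavior of the delay block can be sketched behaviorally as follows. This is not the FIFO circuit of [15]; it is a minimal model, assuming an unbounded FIFO and an ideal delay `t_d` on the timing signal, that shows how data written at the input is read out TD later and how many words the FIFO must hold when several events are in flight. All names are illustrative.

```python
from collections import deque

def ct_delay_block(events, t_d):
    """Delay (time, data) events by t_d using a FIFO and a delayed timing pulse.

    Returns the delayed events and the largest FIFO occupancy observed.
    """
    schedule = []
    for t, data in events:
        schedule.append((t, 0, data))          # 0: write when the event enters
        schedule.append((t + t_d, 1, None))    # 1: read when the delayed req emerges
    # Process in time order; at equal times, writes are handled before reads.
    schedule.sort(key=lambda item: (item[0], item[1]))

    fifo, delayed, max_depth = deque(), [], 0
    for t, kind, data in schedule:
        if kind == 0:
            fifo.append(data)
            max_depth = max(max_depth, len(fifo))
        else:
            delayed.append((t, fifo.popleft()))
    return delayed, max_depth

delayed, depth = ct_delay_block([(0.0, 3), (0.25, 4), (2.0, 5)], t_d=1.0)
print(delayed, depth)
```

In this example two events are inside the delay line at the same time, so the FIFO briefly holds two words; the required depth grows with the number of events that can arrive within one TD.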
The minimum event interval, termed the granularity, is determined by the accompanying CT-
ADC. If the CT-ADC is a level-crossing sampler, the granularity is the minimum interval between
the instants when an input crosses two successive quantization levels. Normally, it is much smaller


















Figure 1.5: The structure of a CT delay block.
close intervals at the same time. In order to properly handle this crowded traffic, the one-bit CT
delay line is built with a chain of cells, each with a smaller delay tg. The value of tg must be no larger than the
granularity of the input CT digital signal.
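
To give a feel for the numbers involved, the sketch below estimates the granularity for a full-scale sinusoid and the resulting minimum number of delay cells per tap. It relies on a standard slope-based approximation that is not taken from this thesis: for x(t) = A sin(2πft) the maximum slope is 2πfA, so with level spacing Δ = 2A/2^N the shortest time between successive level crossings is roughly Δ/(2πfA) = 1/(πf·2^N). The 20 kHz input, 8-bit resolution, and 25 µs tap delay are purely illustrative values.

```python
import math

def lcs_granularity(freq_hz, n_bits):
    """Approximate minimum crossing interval for a full-scale sinusoid.

    Uses delta / (max slope) with delta = 2A / 2**n_bits and max slope = 2*pi*f*A;
    the amplitude A cancels. Valid only when delta is much smaller than A.
    """
    return 1.0 / (math.pi * freq_hz * 2 ** n_bits)

def min_delay_cells(tap_delay_s, granularity_s):
    """Smallest number of cells such that each cell delay tg <= granularity."""
    return math.ceil(tap_delay_s / granularity_s)

g = lcs_granularity(20e3, 8)        # assumed 20 kHz tone, 8-bit level-crossing ADC
cells = min_delay_cells(25e-6, g)   # assumed 25 us tap delay
print(g, cells)
```

Under these assumptions the granularity is about 62 ns, so a single 25 µs tap would need on the order of four hundred cells, which illustrates why delay lines dominate the cost of a CT-DSP.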
A CT-DSP built with the CT delay blocks in Fig. 1.5 can process CT digital signals from
various types of analog-to-digital conversions, e.g., pulse-width modulation [4], delta modulation
[3], sigma–delta modulation, LCS, etc. In this thesis, we will use LCS as an example because of
its signal-tracking feature. A known problem is that the output of an LCS can have a high event
rate when the analog input moves quickly. Recently developed techniques [5–7] can greatly reduce
the event rate without compromising conversion accuracy.
The DSP processing a VR DT digital signal is called a VR-DSP (Fig. 1.4(c)). Although the
samples from a VR-ADC have a nonuniform rate, they are aligned to a timing grid defined by the
highest-rate base clock. Hence, one can still implement the delay line with a clock and conventional
digital circuits (e.g., DFFs). The challenge in designing an energy-efficient VR-DSP is that its
operations should track the input activity, so the operations in a VR-DSP have to rely on the VR
clock rather than the base clock. The key to such a design is to adaptively configure the delay lines
so that the tap delay is independent of the instantaneous rate of the VR clock. More details about
the design of a VR-DSP are provided in Chapter 7.
1.2 Prior art and contributions of this work
The first successful integrated-circuit (IC) implementation of the CT-DSP was published by Schell
et al. in [3]: a 16th-order FIR filter for voice applications. Extensive theoretical analysis of
CT digital signal processing was provided in [16], and [17] improved the frequency capability by
five orders of magnitude and implemented a CT-DSP for GHz-range applications. However, both [3]
and [17] relied on specific coding schemes: delta modulation in [3] and per-edge signal encoding
in [17]. This limits the applicability of their designs. The CT-DSP implemented in [15] has an open
input format. It can accept different types of modulations, both synchronous and asynchronous. For
the first time, this design offered a solution to delay multiple-bit CT digital data in an energy- and
hardware-efficient way. It implemented a kHz-range CT-DSP whose frequency response remained
intact while processing inputs with different sampling rates.
All previous attempts at CT-DSPs resulted only in FIR filters. The only work toward a
CT IIR filter suffered from high noise and large discrepancies between the measured and designed
frequency responses [18]. In addition, the filter order in all previous CT-DSPs is limited because
a large number of power-hungry, area-consuming delay lines, proportional to the filter order, is
needed for their implementation. Finally, their outputs are nonsynchronous and hence incompatible
with DT systems.
The work presented in this thesis solves all of these problems. The contributions of this work
include:
• Spectral analysis of the CT-ADC and -DSP.
• Integrated-circuit implementation of the first CT digital IIR filter.
• A design method using a small, fixed number of delay taps to design high-order CT digital
IIR filters.
• A CT-to-DT converter facilitating integration of the CT digital filter with conventional DT
systems.
• Analysis of the CT digital IIR filter stability issue in the presence of delay-line mismatches.
A practical solution is provided to prevent instability in CT digital IIR filters.
• An event-detection technique that allows a closed-loop CT digital IIR filter’s power
consumption to track the input activity.
• A VR-DSP solution that can process DT digital signals with a variable sampling rate. The
VR-DSP has a frequency response independent of the instantaneous sampling rate, and has
a power consumption tracking the input activity.
1.3 Thesis outline
This thesis discusses the design of both CT-DSPs and VR-DSPs, with the majority of the content
(Chapters 2–6) focusing on the former, while Chapter 7 focuses on the latter.
Chapter 2 analyzes the spectral features of the CT-ADC and CT-DSP. It analyzes the error introduced
in CT amplitude quantization and highlights the spectral features of the quantization error added
to band-limited signals. The effects of synchronization and time coding on the error spectrum are
discussed. Finally, the chapter analyzes the error at the output of a CT-DSP.
Chapters 3–6 discuss the implementation of a CT digital IIR filter system. Chapter 3 describes
the design’s system-level considerations. Chapter 4 discusses the design of digital delay cells.
Chapter 5 gives the circuit implementations of the various blocks. Chapter 6 presents measurement
results.
Chapter 7 discusses the design considerations of the VR-DSP. An illustrative example is pro-
vided. A reconstruction filter with a variable cutoff frequency is also introduced for output recon-
struction.
Chapter 8 offers some suggestions for future work.
Chapter 2
Spectral Analysis of the CT-ADC and -DSP
This chapter analyzes the error introduced in CT amplitude quantization and highlights spectral
features of the quantization error added to band-limited signals. The chapter also discusses the
effects of synchronization and time coding on the error spectrum and analyzes the error at the
output of a CT-DSP. This chapter is based on [19].
2.1 Introduction
The traditional view that quantization amounts to an additive and independent source of white
noise has mainly resulted from the culture of DT signals; to produce a DT signal from a CT input,
both input sampling and quantization are performed. In contrast to this, in a purely CT analog-to-
digital converter (ADC) only quantization occurs [20] (Fig. 2.1). In this chapter, we show how CT







Figure 2.1: A CT amplitude quantization operation.
waveforms. This yields harmonic-based, rather than noise-oriented, spectral results. The resulting
spectrum is discussed in Section 2.2.
While purely CT signal processing has advantages [20], one still needs to consider and evaluate
the effects of an eventual sampling of the signals thus obtained for interface compatibility with
standard data processing (including storage, transmission and off-line computation). Section 2.3
addresses this issue.
As mentioned in [20], one can store the nonuniformly spaced event times tk by finely quantizing
the time axis, which results in “pseudocontinuous” operation. A basic question when doing this is
how to choose the resolution of time quantization such that the accuracy of the resulting signal is
comparable with that of a purely CT system. Section 2.4 discusses this problem. Finally, the results
obtained for CT quantizers are extended to the output of a CT-DSP in Section 2.5.
2.2 CT amplitude quantization
CT amplitude quantization (Fig. 2.1) converts the input x(t) to a piecewise constant signal xq(t)
and allows for further digitizing. Basically, xq(t) is an instantaneous transformation of x(t), which
can be expressed as
xq(t) = Q(x(t)), (2.1)
where Q is the scalar function of quantization. The error signal e(t) = xq(t)− x(t) is then also a
memoryless function of x(t),
e(t) = E(x(t)), (2.2)
where E is the error function of Q defined by
E(x) = Q(x) − x. (2.3)
The above error signal, e(t), is defined with the implicit assumptions that it does not contain a
signal component or a constant, and the quantization does not add delay. In the more general case,
the error must be defined as follows. The mean square error (MSE) is obtained by a minimiza-








Once the three parameters are determined, the error signal can then be defined as
e(t) = xq(t)−ax(t− τ)−b. (2.5)
For simplicity, this chapter assumes that quantizers have a unity gain, zero DC shift and zero time
delay, so that e(t) = xq(t)− x(t). However, the principles presented are also valid for other cases.
After reviewing basic knowledge on the error function, E(x), of standard scalar quantizers, we
will tackle the spectral analysis of e(t) when x(t) is a single tone and then when it is composed of
two tones.
2.2.1 Quantization characteristics
Quantization consists of approximating an input number x by the nearest value from a predeter-
mined finite set of quantum levels. In general, these levels are not necessarily uniformly spaced, as
is the case in signal companding. In this chapter, we concentrate on the simpler but standard case
of uniform spacing. Fig. 2.2(a–b) shows the two typical transfer functions, Q, of uniform scalar
quantizers [21]. They are both staircase functions. The midrise quantizer of Fig. 2.2(a) has a dis-
continuity at x = 0. Meanwhile the midtread quantizer of Fig. 2.2(b) has a quantum level at x = 0
and is preferred in event-driven systems, as it can tolerate small input imperfections around zero
(DC offset, noise, etc.) and maintain a zero output in the absence of a signal. An odd number of
quantum levels is however necessary to achieve symmetry around the origin. With a quantization








whereas it is $2^{N-1}\Delta$ with the midrise quantizer.
























































Figure 2.2: Quantization characteristics and error functions. (a) Characteristic of midrise quantizer.
(b) Characteristic of midtread quantizer. (c) Quantization error of midrise quantizer. (d) Quantiza-
tion error of midtread quantizer.
tooth functions of period ∆. When x is a random variable that is uniformly distributed over a range
of a multiple of ∆ in length, E(x) is known to be uniformly distributed between $-\Delta/2$ and
$\Delta/2$ [22]. In this case, E(x) yields the classical variance value of $\Delta^2/12$.
When the above statistical assumption on the input is not satisfied, one must resort to a deter-
ministic analysis of E(x). As a periodic function, E(x) can be expanded in a Fourier series. With








Removing (−1)k from this equation yields the series corresponding to midrise quantizers. In the
following discussion however, we will focus on the midtread characteristic and choose by default
the error expansion of (2.6).
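As a numerical cross-check, the sketch below compares the error functions of both quantizer types with a truncated sine series. The series used, $\sum_k (-1)^k \frac{\Delta}{\pi k}\sin(2\pi k x/\Delta)$, is the standard expansion of the midtread sawtooth and is assumed here to correspond to (2.6); dropping the $(-1)^k$ factor gives the midrise case. The function names, the choice ∆ = 1 and the number of terms are illustrative assumptions.

```python
import numpy as np

def midtread_error(x, delta=1.0):
    """E(x) = Q(x) - x for a uniform midtread quantizer with step delta."""
    return delta * np.round(x / delta) - x

def midrise_error(x, delta=1.0):
    """E(x) = Q(x) - x for a uniform midrise quantizer with step delta."""
    return delta * (np.floor(x / delta) + 0.5) - x

def sawtooth_series(x, delta=1.0, terms=2000, midtread=True):
    """Truncated sine series of the error sawtooth (assumed form of (2.6))."""
    k = np.arange(1, terms + 1)[:, None]
    sign = (-1.0) ** k if midtread else np.ones_like(k, dtype=float)
    return np.sum(sign * delta / (np.pi * k) * np.sin(2 * np.pi * k * x / delta), axis=0)

x = np.linspace(-2.0, 2.0, 401)
dist_half = np.abs((x % 1.0) - 0.5)   # distance to the nearest half-integer (delta = 1)
tread_ok = dist_half > 0.05           # midtread jumps sit at half-integers
rise_ok = dist_half < 0.45            # midrise jumps sit at integers

err_tread = np.max(np.abs(midtread_error(x) - sawtooth_series(x))[tread_ok])
err_rise = np.max(np.abs(midrise_error(x) - sawtooth_series(x, midtread=False))[rise_ok])
print(err_tread, err_rise)            # both small; the series converges slowly near the jumps
```

Points close to the discontinuities are excluded from the comparison because a truncated series converges slowly there.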
2.2.2 Single-tone input
We now analyze the quantization error signal when x(t) is a single-tone (sinusoidal) input. Fig. 2.3
shows both the quantized output xq(t) = Q(x(t)) and the quantization error signal e(t) = E(x(t))
for a six-bit midtread quantizer with a full-scale single-tone input. As illustrated in part (c) of this
figure, the analysis of e(t) can be divided into three different regions [25].
1. The bell-like pulses formed by the quantization of the sine wave around its peaks and troughs
2. The sawtooth patterns arising from the quantization of the sine wave around its zero-crossings,
where it is close to an ideal ramp
3. The transition parts between the above two regions
Although the amplitude values of a sinusoid are not uniformly distributed [26] (maximal density
values being obtained at the extreme values of the sinusoid, as expected from Fig. 2.3(c)), the prob-






[22]. In the following, “power,”
denoted by P, refers to a mean-square value. With a full-scale sinusoidal input, the total error power
of $\Delta^2/12$ resulting from this model appears to be satisfactorily accurate in practice. Given the power
of a full-scale sinusoidal input of $A_{FS}^2/2$














Figure 2.3: A quantized sinusoid waveform and its quantization error. The quantizer has a six-bit
resolution. (a) A full-scale sinusoidal input; (b) the quantized output waveform (the piecewise-constant
waveform is more clearly shown in the zoomed-in plot); (c) the quantization error signal.
Its amplitude is normalized to the quantization step ∆.









where the approximation assumes $2^N \gg 1$.
In the frequency domain, the bell-like pulses and the sawtooth waveform have very different







Figure 2.4: Power spectrum of the quantization error in Fig. 2.3(c) relative to signal power. The er-
ror signal is obtained by quantizing a full-scale sinusoidal signal with a six-bit midtread quantizer.
The x-axis shows the frequency normalized to fin and is plotted on a logarithmic scale.
Since e(t) is periodic with the same period as the input, its spectrum is composed of harmonics,
with the first harmonic at the input frequency. Since the quantization error function E(x) is odd-
symmetric around the origin and the input signal x(t) has so-called “half-wave symmetry,” the
corresponding error signal, e(t), also has “half-wave symmetry,” and thus the resulting discrete
spectrum contains only odd-order harmonics. The bell-like pulses are periodic in the time domain
at the input frequency and contribute mostly low-order harmonics. [27] observed that small varia-
tions in the amplitude of the input may cause large changes in the bell-like pulses. Thus, the power
of these low-order harmonics is very sensitive to the input amplitude. The fast-varying sawtooth
part contributes mostly to the high-frequency part of the error power. Let the minimum sawtooth
duration, TSAW, be the shortest time for the input to cross a quantization interval, and define the


























$f_{SAW} \approx 2^N \pi f_{in}$ (2.9)
The last approximation assumes $2^N \gg 1$, which is satisfied in most practical cases. In the given
example, N = 6. This leads to fSAW = 201 fin (indicated in Fig. 2.4). The spectrum beyond fSAW
contains the harmonics of the fundamental components below fSAW. A qualitative observation is
in order. In Fig. 2.4, one can observe that the amplitude of the components up to fSAW can roughly
be thought to have a constant envelope, while the amplitude of the components beyond fSAW can
be seen to drop off, on average, with a −20dB/decade slope. The reason for the value of this slope
will be discussed below.
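The value fSAW = 201 fin quoted for the six-bit example can be checked with a few lines of arithmetic. The sketch below assumes that the minimum sawtooth duration equals the quantization step divided by the peak slope of the sinusoid, and that the full-scale amplitude is approximately $2^{N-1}\Delta$; both helper names are hypothetical.

```python
import math

def f_saw_from_slope(amplitude, f_in, delta):
    """1/T_SAW, taking T_SAW as delta divided by the peak slope of the sinusoid."""
    peak_slope = 2 * math.pi * f_in * amplitude
    return peak_slope / delta

def f_saw_single_tone(n_bits, f_in=1.0):
    """Approximation of (2.9): f_SAW ~ 2**N * pi * f_in, valid for 2**N >> 1."""
    return 2 ** n_bits * math.pi * f_in

n_bits, delta = 6, 1.0
a_fs = 2 ** (n_bits - 1) * delta            # assumed full scale, A_FS ~ 2**(N-1) * delta
print(round(f_saw_from_slope(a_fs, 1.0, delta)))   # 201
print(round(f_saw_single_tone(n_bits)))            # 201
```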
Assume the single-tone input has a full-scale amplitude AFS, an arbitrary input frequency fin
and an arbitrary phase shift φ:
x(t) = AFS sin(2π fint +φ). (2.10)














A complete Fourier series expansion of e(t) can then be obtained using Bessel functions and
Jacobi-Anger expansion [24,28,29]. This process is involved and thus is not included in this chap-










)sin [m(2π fint +φ)],
where Jm(.) is the Bessel function of the first kind of order m. This is a sum of all the odd-order







. According to [28],












is negligible for $m > 2^N\pi$. In other words, all the significant
tones of e1(t) have frequencies below $2^N \pi f_{in}$, which is exactly the fSAW defined in equation
(2.9).












sin [m(2π fint +φ)].
This has the same form as e1(t) with a factor of 1/k in front of the summation and a factor of k
inside the argument of the Bessel function. The $(-1)^k$ in front does not affect the amplitude. Using
(2.12) again, we find that the significant tones of ek(t) are present only up to k fSAW. The frequency
range [(k−1) fSAW, k fSAW], which we call the kth fSAW band, contains the significant tones from
the kth and higher-order terms only, that is, those resulting from $\sum_{i=k}^{\infty} e_i(t)$. However,
because of the scaling factor 1/k, the power of the tones within the kth fSAW band is mostly
determined by the lowest-order term, ek(t). Their power is approximately $k^2$ times lower than that
of the tones within the first fSAW band (i.e., [0, fSAW]). This result is better observed in Fig. 2.5,
which is plotted using a linear frequency axis. A staircase-like spectrum is observed. Each step has
the same width, fSAW. Also, the average power in the kth fSAW band is $k^2$ times lower than that in
the first fSAW band. This is consistent with the observed slope of −20 dB/decade indicated on the
logarithmic scale of Fig. 2.4. Fig. 2.5 also presents a zoomed-in plot around fin to show the discrete
nature of the spectrum more clearly. Only harmonics are present, while no component exists at any
in-between frequency.
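The claim that only odd-order harmonics appear can be checked with a short numerical experiment. The sketch below quantizes one period of a full-scale sinusoid with a six-bit midtread quantizer on a dense uniform grid and inspects the FFT of the error. The grid size, the phase of 0.3 rad and the full-scale amplitude of $(2^{N-1} - 1/2)\Delta$ are assumptions made for illustration. The dense grid only approximates the CT signal.

```python
import numpy as np

n_bits = 6
delta = 1.0
a_fs = (2 ** (n_bits - 1) - 0.5) * delta   # assumed midtread full-scale amplitude
samples = 1 << 14                            # dense grid over exactly one input period
t = np.arange(samples) / samples             # time in units of the input period
x = a_fs * np.sin(2 * np.pi * t + 0.3)       # arbitrary phase shift of 0.3 rad
e = delta * np.round(x / delta) - x          # midtread quantization error

# Bin m of the FFT over one period is the mth harmonic of the input frequency.
spectrum = np.abs(np.fft.rfft(e)) / samples
even_max = spectrum[2:200:2].max()
odd_max = spectrum[1:200:2].max()
print(even_max, odd_max)                     # even harmonics vanish to numerical precision
```

Because the number of samples per period is even, the grid preserves the half-wave symmetry of e(t), so the even-order bins contain only rounding noise.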
Power within [0, fSAW]
It was observed in [27] that the error power below fSAW dominates the entire quantization error
power. The reason is that the tones beyond fSAW come from the higher-order terms of equation
(2.11), and thus their power is scaled by $1/k^2$. From (2.11), it can be found numerically that 71%
of the total error power lies in the band [0, fSAW]. It is also interesting to estimate this number using
the staircase model of Fig. 2.5. According to this rough model, the power spectral density in the





Figure 2.5: Power spectrum of the quantization error in Fig. 2.3(c) relative to signal power. The er-
ror signal is obtained by quantizing a full-scale sinusoidal signal with a six-bit midtread quantizer.
The x-axis shows the frequency normalized to fin and is plotted on a linear scale. A zoomed-in plot
shows a narrow spectrum around fin (i.e., [0,20 fin]).
(with a distance of 2 fin), the error power within the kth fSAW band decreases as $1/k^2$ with k. The
ratio of power that lies in [0, fSAW] is then $1/\sum_{k=1}^{\infty} k^{-2} = 1/(\pi^2/6) \approx 61\%$,
comparable to the numerical result above.
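The staircase-model estimate can be reproduced directly. The following sketch evaluates the fraction of power in the first fSAW band both from a long partial sum and from the closed form $6/\pi^2$; the variable names and the number of terms are arbitrary choices.

```python
import math

# Under the staircase model, the power in the kth f_SAW band scales as 1/k**2,
# so the first band holds 1 / sum(1/k**2) of the total error power.
partial_sum = sum(1.0 / k**2 for k in range(1, 100001))
fraction_numeric = 1.0 / partial_sum
fraction_closed = 6.0 / math.pi**2
print(f"{fraction_closed:.3f}")   # 0.608
```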
2.2.3 Two-tone input
Assume now that the input is a two-tone signal, x(t) = Ain1 sin(2π fin1t)+Ain2 sin(2π fin2t), assum-
ing zero phase between the two. The sawtooth frequency, fSAW, can still be derived from (2.7):
fSAW =




By defining the weighted-average frequency
$f_{in,avg} = \dfrac{A_{in1} f_{in1} + A_{in2} f_{in2}}{A_{FS}}$,





which coincides with fSAW of a full-scale single-tone input of frequency fin,avg according to (2.8).
Fig. 2.6(a) plots the spectrum of the quantization error, e(t), resulting from this input in the case
where $A_{in1} = A_{in2} = A_{FS}/2$ and N = 6. The spectrum now contains not only harmonics, but also
intermodulation products. The tones are uniformly spaced by a distance of ∆f = fin2 − fin1 in frequency.




= 201 fin,avg. This value is consistent with the
value of fSAW obtained from the graphical method by finding the transition frequency between the
flat and descending parts of the spectrum. Such agreement will also be observed with more com-
plicated inputs in Section 2.2.4. It can be found numerically that the power below the maximum
sawtooth frequency in this two-tone example is 78% of the total quantization-error power.
The total error power is about ∆
2





However, if a narrow baseband is considered, only a few harmonics and intermodulation compo-
nents fall in band and the resulting in-band SER can be much higher. Fig. 2.6(b) shows an example
baseband, [0,2 fin,avg]. A limited number of tones are evenly distributed within the baseband, with












































































Figure 2.6: Quantization error relative to signal power in the two-tone test. The quantizer has a
six-bit resolution. Full load of the quantizer is achieved by setting the amplitudes of the two input
tones as half of AFS. (a) A wide spectrum of the quantization error plotted on a logarithmic scale.
The frequency axis is normalized to fin,avg. (b) A narrow baseband spectrum of the quantization
error on linear scale. The frequency axis is normalized to fin,avg and shows the range from 0 to 2.
The locations of the two input tones are shown. (c) In-band quantization error power relative to
signal power as a function of the ratio of the upper band-edge frequency to fin,avg.
ative power of all these tones, which is 51 dB. The choice of baseband is rather arbitrary in this
two-tone case, where the input signal does not occupy a full band. We can choose a baseband with
almost any width and find the corresponding in-band SER. Fig. 2.6(c) shows the in-band quantization
error power as a function of the ratio of the baseband’s upper band-edge frequency to fin,avg.
As the band edge moves to higher frequencies, more and more harmonics and intermodulation tones
fall inside the baseband, increasing the resulting in-band error power. Although the curve is
monotonically increasing, its rate of change drops considerably once the band edge is larger than fSAW.
This is because the error tones beyond fSAW contain little power. The horizontal line at the top
shows the total quantization error power relative to the signal power. It is the limiting value of the
relative error power when the baseband is infinitely wide.
2.2.4 Band-limited Gaussian inputs
We now consider a more complex class of signals, namely band-limited signals with a Gaussian
amplitude probability density function. The probability of such a random input overloading the
quantizer is negligible (less than 1 in 10,000) as long as its root-mean-square amplitude, ARMS, is
less than a quarter of the quantizer’s full-scale range AFS [30,31]. The quantization-error spectrum
of this signal has features similar to one- and two-tone inputs: 1) It has a corner frequency, which
we denote by fSAW,Gaussian, and 2) beyond fSAW,Gaussian, the spectrum follows a −20dB/decade
slope. fSAW,Gaussian can be calculated by averaging the fSAW contributed by each input tone [27]:







Figure 2.7: Power spectra in the band-limited Gaussian input case. The root-mean-square ampli-
tude, ARMS, is set as one fourth of AFS. The quantizer has a six-bit resolution. The x-axis shows
the frequency normalized to fBW and is plotted on a logarithmic scale. The red dotted line is the
power spectrum of the input signal. It is generated by low-pass filtering a Gaussian white noise
with a cutoff frequency of 500 Hz. The solid line shows the power spectrum of the quantization
error. Both power values are normalized to Psignal, the total power of the input signal.
Calling fBW the maximum input frequency, we obtain fSAW,Gaussian = πARMS fBW/∆.
Fig. 2.7 shows an example spectrum of the quantization error (solid line). In contrast to the one-
and two-tone cases, the spectrum is continuous. ARMS is chosen to be AFS/4 and thus
fSAW,Gaussian = $\pi 2^{N-3} f_{BW}$. For N = 6, fSAW,Gaussian ≈ 25 fBW. This is consistent with the
plot of the quantization error. Beyond fSAW,Gaussian, a −20 dB/decade slope is apparent. Integrating
the error power over the entire frequency axis, we find that the SER equals 29 dB. In the same figure,
the spectrum of the band-limited Gaussian input is also plotted (dotted line). Its power spectrum is
white up to fBW.
Beyond that, its power drops significantly (due to filtering). In this band-limited Gaussian case, the
signal band is well defined as [0, fBW]. The bandwidth of the quantization error is much wider than
that of the input signal. Integrating the error power within the signal band, we find the in-band SER
to be 45 dB, which is 16 dB higher than the SER calculated over the entire frequency axis.
To summarize Section 2.2, fSAW plays an important role in the spectral characterization of any
CT quantization of band-limited signals. The value of this quantity can be obtained from equation
(2.7). Within [0, fSAW], the error power spectral density is almost flat. Beyond fSAW, the spectral
density drops with a slope of −20 dB/decade. The frequency range [0, fSAW] contains the majority
of the total power of the quantization error [27, 32], roughly in the range of 70% to 80%.
2.3 Uniform sampling of quantized CT signals
2.3.1 Uniform sampling context and issues
While CT amplitude-quantized signals are free from aliasing, the step length in their piecewise-
constant waveforms varies, preventing their simple integration into standard systems operating
under a fixed clock rate. Resolving this issue requires a uniform sampling of the CT outputs. This
inevitably brings back the issue of aliasing. In fact, because CT quantization is an instantaneous
operation, permuting the order of amplitude quantization and time sampling gives the same re-
sult [33]. Indeed, the sample of the amplitude-quantized signal xq(t) = Q(x(t)) at an instant tk is
xq(tk) =Q(x(tk)), which is equal to the quantized version of the sample of x(t) at tk. This makes the
system equivalent to a conventional ADC involving sampling and quantization because the two op-
erations can be interchanged. In spite of this system equivalence, there are still two strong reasons
for using CT systems. 1) The major part of the hardware in a CT system is event-driven and thus
dissipates power more efficiently compared to clocked systems. 2) Using oversampling to improve
the SER is straightforward with CT systems. Only the sampling clock frequency of the output syn-
chronizer needs to be increased. In conventional systems on the other hand, oversampling requires
increasing the clock rate of the entire signal-processing chain.
2.3.2 Analysis of aliased quantization error – a staircase model
Next, we investigate how aliasing degrades signal accuracy when a CT-quantized signal is sam-
pled. It is usually believed that in-band quantization error can be arbitrarily reduced by increasing
sampling frequency. We will see that there is a limit to this error reduction.
Assume a band-limited input, x(t), of maximum frequency fBW is quantized and then uniformly






where es(t) is the CT quantization error signal and $T_s = 1/f_s$. We wish to evaluate the power of es(t)
that lies in the input baseband, [0, fBW]. Denoting by E( f ) the Fourier transform of e(t), we have




δ( f −n fs). (2.13)




















Figure 2.8: Staircase-modeled error spectrum before aliasing. The height represents the power
spectral density and is normalized to the value of the first fSAW band. The dark parts represent
the bands that will be shifted into the baseband after sampling. In this example, fs is assumed
to be a fraction of fSAW; thus, each fSAW band contains multiple dark parts.
around each harmonic of the sampling frequency fs [27]. This convolution operation is visually
explained in Fig. 2.8. As was explained in Section 2.2, $|E(f)|^2$ along the frequency axis can be
reasonably modeled as a staircase function whose steps correspond to the fSAW bands of successive
orders. The heights of the steps represent the power spectral density and decrease as $1/k^2$.
The dark areas represent the frequency bands of width 2 fBW centered around the multiples of fs
(excluding the zero multiple). As (2.13) indicates, after e(t) is sampled at frequency fs, all the dark
parts are shifted into the baseband [− fBW, fBW] and added to its original power.
Assuming the aliased components are independent, we can estimate the total aliased error









where Pbb is the original error power within the baseband before aliasing and Nk is the number of
positive multiples of fs that fall in the kth fSAW band. The factor of two is due to the two-sided
spectrum: In signed frequency, the kth fSAW band includes the interval [(k−1) fSAW,k fSAW] and its
negative counterpart. The total quantization error power in es(t) is
Ptotal = Pal +Pbb. (2.15)
In (2.14), Pbb is solely dependent on the input and the quantizer. Meanwhile, Nk depends on the
ratio $f_s/f_{SAW}$.
When $f_s/f_{SAW} \ll 1$, the aliased error power Pal is dominated by the part coming from the first
fSAW band where power density is the highest. Thus, equation (2.15) can be reduced to
Ptotal ≈ 2N1Pbb +Pbb.
With $N_1 \gg 1$ (which is true in most oversampling scenarios), it can be further reduced to
Ptotal ≈ 2N1Pbb.
The worst value of SERdB, denoted by SERdB,worst, is achieved at Nyquist sampling where the
entire error power is aliased into the baseband. As long as $f_s/f_{SAW} \ll 1$, doubling fs halves N1 and
thus decreases Ptotal approximately by a factor of 2. A 3 dB improvement in SERdB is expected.
When $f_s/f_{SAW} \gg 1$, Pal results from high-order fSAW bands, and is consequently negligible
compared to Pbb. In this case, equation (2.15) reduces to
Ptotal ≈ Pbb.
The error power becomes independent of the sampling frequency and SERdB reaches its highest
value, denoted by SERdB,best. At infinite oversampling, the result converges to the CT-quantization
error power. Ptotal does not go to zero, as one would have assumed by applying the “3 dB reduction
for each doubling of fs” rule.
As this analysis shows, the plot of SERdB versus $f_s/f_{SAW}$ has two asymptotes, as shown in Fig.
2.9. For $f_s/f_{SAW} \ll 1$, the plot asymptotically approaches a straight line of slope 10 dB/decade
(i.e., 3 dB/octave), starting from the lowest point of SERdB,worst. For $f_s/f_{SAW} \gg 1$, SERdB
approaches SERdB,best.
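The two asymptotes can be reproduced with a small numerical sketch of the staircase model. The code assumes that the aliased power has the form $P_{al} \approx 2 P_{bb} \sum_k N_k / k^2$, which is consistent with the approximations $P_{total} \approx 2N_1 P_{bb} + P_{bb}$ and $P_{total} \approx P_{bb}$ given above. It is a reading of the model, not a transcription of (2.14). The function name, the truncation at 2000 bands and the normalization Pbb = 1 are illustrative choices.

```python
import math

def total_error_power(fs_over_fsaw, p_bb=1.0, max_band=2000):
    """Staircase-model estimate of P_total = P_al + P_bb (assumed form of the model).

    Each positive multiple n*fs is assigned to the f_SAW band k that contains it
    and contributes 2 * p_bb / k**2 (two-sided spectrum, step heights of 1/k**2).
    """
    p_al = 0.0
    n = 1
    while True:
        # Index of the band containing n*fs; the small offset guards against
        # floating-point results that land just above an integer.
        k = math.ceil(n * fs_over_fsaw - 1e-12)
        if k > max_band:
            break
        p_al += 2.0 * p_bb / k**2
        n += 1
    return p_al + p_bb

low = total_error_power(0.01)
low_doubled = total_error_power(0.02)
high = total_error_power(10.0)
print(10 * math.log10(low / low_doubled))   # close to 3 dB per doubling of fs
print(10 * math.log10(high))                # close to 0 dB: P_total approaches P_bb
```

In this sketch the low-rate regime includes the contribution of all bands, so the absolute level is somewhat above the $2N_1 P_{bb}$ approximation, while the 3 dB-per-octave trend is preserved.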
Fig. 2.9 also shows the results of MATLAB experiments to verify the staircase model. One-tone
and two-tone inputs are used with a six-bit quantizer. The measured points satisfactorily reproduce
the asymptotic trends in the regions $f_s/f_{SAW} \ll 1$ and $f_s/f_{SAW} \gg 1$, with, however, some
local deviations that can be attributed to three factors. First, the model assumes a flat spectrum
within each fSAW band, which is not exactly true. Second, the assumption implied by the power
additivity in (2.14), i.e., that all the aliased components are independent, is not rigorously
satisfied. Finally, (2.14) assumes that each band [n fs − fBW, n fs + fBW] entirely falls into one
fSAW band of order k. Fig. 2.8 shows that







































































Figure 2.9: Theoretical asymptotes and simulation results of SERdB using the staircase model.
Both cases use a six-bit midtread quantizer. (a) One-tone test with fBW = 20 fin. (b) Two-tone test
with fBW = 20 fin,avg.
2.4 CT-ADC with time coding
The event times tk in CT-ADCs are nonuniformly spaced and can have arbitrary values. Time
coding (a term adopted from [34]) is necessary when one wants to store the nonuniform samples
[20]. A high-frequency clock can be used to quantize the time intervals between any two successive
events, $\Delta_k = t_k - t_{k-1}$. An interesting question is how high the clock frequency should be.
To answer this question, we note that the output of such a CT-ADC with time coding is equivalent to
that of a clocked ADC with the same sampling frequency; thus, the staircase model in Fig. 2.8 applies.
According to the previous result that the aliased error power becomes negligible when
$f_s/f_{SAW} \gg 1$, the time discretization needs to be sufficiently fine so that the in-band SER is
comparable to the CT case with no sampling. Thus, the sampling frequency has to be several times
higher than fSAW to ensure the highest in-band SER.
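A minimal sketch of time coding is given below. It computes the threshold-crossing times of a full-scale sinusoid over the first quarter period, rounds them to the nearest edge of a clock running at four times fSAW, and reports the worst timing error. The factor of four, the 1 kHz input, the threshold placement and all function names are assumptions chosen for illustration, not values taken from this work.

```python
import numpy as np

def rising_crossing_times(amplitude, f_in, delta):
    """Times in the first quarter period at which a sine crosses the decision thresholds.

    Thresholds are assumed to lie at odd multiples of delta/2 (midtread quantizer).
    """
    count = int(np.floor(amplitude / delta - 0.5)) + 1
    thresholds = (np.arange(count) + 0.5) * delta
    return np.arcsin(thresholds / amplitude) / (2 * np.pi * f_in)

def time_code(event_times, f_clk):
    """Round each event time to the nearest clock edge."""
    return np.round(event_times * f_clk) / f_clk

n_bits, delta, f_in = 6, 1.0, 1e3             # hypothetical six-bit quantizer, 1 kHz tone
a_fs = (2 ** (n_bits - 1) - 0.5) * delta      # assumed midtread full-scale amplitude
f_saw = 2 ** n_bits * np.pi * f_in            # approximation of (2.9)
f_clk = 4 * f_saw                             # clock "several times" above f_SAW

events = rising_crossing_times(a_fs, f_in, delta)
coded = time_code(events, f_clk)
worst = np.max(np.abs(coded - events))
print(worst * f_saw)                          # worst timing error in units of 1/f_SAW
```

With rounding to the nearest clock edge, the timing error is bounded by half a clock period, that is, by one eighth of 1/fSAW in this example.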





Figure 2.10: A CT signal processing system composed of a CT-ADC and a CT-DSP.
2.5 Error analysis of a CT-DSP
The error at the output of a CT-DSP is analyzed next. Fig. 2.10 shows a signal processing system
composed of a CT-ADC and a CT-DSP. Mathematically, the CT-ADC (the quantizer) can be repre-
sented as a summer with an additional input e(t), which is the quantization error signal. The output
of the quantizer xq(t) can then be expressed as
xq(t) = x(t)+ e(t).
Since the CT-DSP is a linear time-invariant (LTI) system, both the input and the error are processed
by the same transfer function H( f ) of the DSP. The Fourier transform of the CT-DSP output, Y ( f ),
can be expressed as
Y ( f ) = X( f )H( f )+E( f )H( f ), (2.16)
where X( f ) and E( f ) are the Fourier transforms of x(t) and e(t), respectively. The error at the
output of a CT-DSP is the second term in the equation, whose power spectrum can be represented
as |E( f )|2|H( f )|2.
The CT-DSP consists of an FIR or an IIR filter with a uniform tap delay TD [20]. This results
in a periodic frequency response with a period of 1/TD. While the bandwidth of X( f ) is less than
half this period, |E( f )|2 remains dominant in the whole interval [0, fSAW], which typically covers
several 1/TD periods. Thus, the power spectrum |E( f )|2|H( f )|2 of the filtered quantization error
signal periodically yields regions of high values at least several times outside the input baseband.
Interfacing the output of the CT-DSP with standard (synchronous) data-processing blocks requires
uniform sampling synchronized with the clock of those blocks. To obtain an in-band SER compa-
rable with the nonsampling case, a sampling frequency at least higher than fSAW should be used to
prevent any significant out-of-band error power being aliased into the baseband.
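The periodicity of the frequency response follows from the uniform tap delay and can be illustrated numerically. The sketch below evaluates $H(f) = \sum_k h_k e^{-j 2\pi f k T_D}$ for a hypothetical three-tap FIR filter and compares it with the response shifted by $1/T_D$; the coefficients and the tap delay are made-up values.

```python
import numpy as np

def ct_fir_response(coeffs, tap_delay, freqs):
    """H(f) = sum_k h_k * exp(-j*2*pi*f*k*T_D) for a CT filter with uniform tap delay T_D."""
    k = np.arange(len(coeffs))[:, None]
    h = np.asarray(coeffs, dtype=float)[:, None]
    return np.sum(h * np.exp(-2j * np.pi * freqs * k * tap_delay), axis=0)

t_d = 1e-3                                   # hypothetical tap delay (1 ms)
h = [0.25, 0.5, 0.25]                        # hypothetical low-pass FIR coefficients
f = np.linspace(0.0, 1.0 / t_d, 101)
h_base = ct_fir_response(h, t_d, f)
h_shifted = ct_fir_response(h, t_d, f + 1.0 / t_d)   # shifted by one period, 1/T_D
max_difference = np.max(np.abs(h_base - h_shifted))
print(max_difference)                        # numerically zero
```

Since the filter does not attenuate the error in the repeated passbands, any quantization error that falls there reaches the output unfiltered, which is why the sampling frequency of the output synchronizer matters.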
Chapter 3
System-Level Considerations of CT Digital
IIR Filter Systems
3.1 Introduction
This chapter describes the system-level considerations of a CT digital IIR filter system design. An
illustrative example first explains the instability issue introduced by delay-line mismatches. Such
an issue has been neither observed nor analyzed by any previous work. This chapter provides a
practical solution to solve this issue. Then a new way of designing high-order CT digital IIR filters
is introduced. It allows one to use only two tap delays to implement CT IIR filters with any order
higher than two. Following that, this chapter also discusses an interpolator used in the system in
preparation for CT-to-DT conversion. The interpolator suppresses the noise and distortion power
in the repetitive passbands of the IIR filter’s frequency response by employing multiple sections of
CT digital FIR filters.
3.2 Preventing instability in CT digital IIR filters
Potential instability and its causes. In contrast to DT digital filters, which use a clock to realize
the tap delays, CT digital filters rely on CT delay lines to produce the required timing. In filters
with an order greater than one, multiple delay elements are required, and mismatches between
them are of concern. In an FIR filter, mismatches only cause errors in the frequency response. In
CT digital IIR filters, however, mismatches can cause instability, as will now be explained via an
example.
Consider, as an example, the second-order IIR filter in Fig. 3.1(a), with a unit pulse at its input.
Assume that b1 = −1.5, b2 = 0, so that only the smaller loop is active. This system is unstable;
the upper-loop gain is greater than one, which causes the output to progressively become larger
and larger, without bound. However, when the lower tap coefficient is changed to b2 = −0.6, the
system becomes stable, as can be deduced from its unit pulse response in Fig. 3.1(b). The reason
is that the output from the two paths, which are aligned and spaced apart by TD, combine in such
a way that the output is prevented from growing. The stability of this system can also be verified
with conventional z-domain analysis, which shows that the system poles are inside the unit circle.
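The z-domain check mentioned above can be sketched as follows. The code assumes the sign convention y[n] = x[n] + b1 y[n−1] + b2 y[n−2], chosen because it reproduces the stable and unstable cases described in the text; the actual convention of Fig. 3.1(a) is not visible here, and the function name is hypothetical.

```python
import numpy as np

def feedback_poles(b1, b2):
    """Poles of y[n] = x[n] + b1*y[n-1] + b2*y[n-2] (assumed sign convention)."""
    return np.roots([1.0, -b1, -b2])

upper_loop_only = np.abs(feedback_poles(-1.5, 0.0)).max()
both_loops = np.abs(feedback_poles(-1.5, -0.6)).max()
print(upper_loop_only, both_loops)   # largest pole magnitude in each case
```

Under this convention the largest pole magnitude is 1.5 with b2 = 0 and about 0.77 with b2 = −0.6, in agreement with the unstable and stable behavior described above.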
Unfortunately, when mismatches exist, the above alignments of the signals from the two taps,
which had “saved the show” above, cannot be expected to occur. This is illustrated using the system


















































Figure 3.2: A second-order IIR filter with tap delay mismatch. (a) Block diagram. (b) Unit pulse
response for the same b1 and b2 values as in Fig. 3.1.
being returned from the two taps no longer coincide in time, as shown in Fig. 3.2(b) for a mismatch
of 10% (this value is exaggerated for clarity in the graph). The pulses no longer combine as a group,
which would have yielded the result in Fig. 3.1(b). Each pulse appears isolated and keeps growing.
This instability will be mathematically proved in Appendix A. It can be verified that if each pulse
cluster (shown around multiples of TD) is allowed to coincide by reducing the mismatch to 0, the
response of Fig. 3.1(b) results.
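The effect of the mismatch can be illustrated with a small event-driven simulation. The sketch below propagates a unit pulse through the loop y(t) = x(t) + b1 y(t − d1) + b2 y(t − d2), using integer time ticks so that coinciding pulses add exactly. It assumes the same sign convention as the stability example and models the mismatch as a second delay element that is 10% too long; the tick resolution and the function name are illustrative choices.

```python
import heapq

def pulse_response(b1, b2, d1, d2, t_end):
    """Unit-pulse response of y(t) = x(t) + b1*y(t - d1) + b2*y(t - d2).

    Times are integer ticks so that coinciding pulses add exactly.
    Returns a dictionary {time: amplitude}.
    """
    amp = {0: 1.0}
    queue = [0]
    while queue:
        t = heapq.heappop(queue)      # earliest unprocessed pulse; its amplitude is final
        for delay, coeff in ((d1, b1), (d2, b2)):
            t_next = t + delay
            if coeff == 0.0 or t_next > t_end:
                continue
            if t_next not in amp:
                amp[t_next] = 0.0
                heapq.heappush(queue, t_next)
            amp[t_next] += coeff * amp[t]
    return amp

T_D = 100                                                   # nominal tap delay in ticks
matched = pulse_response(-1.5, -0.6, T_D, 2 * T_D, 40 * T_D)
mismatched = pulse_response(-1.5, -0.6, T_D, T_D + 110, 40 * T_D)

print(abs(matched[40 * T_D]))                      # decays toward zero
print(max(abs(a) for a in mismatched.values()))    # grows without bound as t_end increases
```

In the matched case the pulses returning from the two taps coincide and partially cancel, so the response decays. In the mismatched case, the pulse that has circulated only through the upper loop arrives alone and grows as 1.5 to the power of the number of round trips.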
Solution. Since mismatches prevent the clusters from aligning with each other, we can force the
pulses to coincide by grouping them. This can be achieved by holding the pulses that appear at the



































Figure 3.3: A second-order IIR with event grouping. (a) Block diagram. (b) Timing diagram.
for the delay (tg) caused by this pulse holding and grouping by modifying the upper delay element
to TD − tg, as shown in Fig. 3.3(a). Fig. 3.3(b) illustrates the operation. Assume that Pulse A in the
output passes through the top delay and appears at Tap1 after a delay TD − tg, as Pulse B. This
pulse does two things: 1) it goes through the bottom delay of length T′D (of value to be discussed
below), causing Pulse C to appear at Tap2, and 2) it activates a grouping window D of length tg.
Now consider the output of the grouper/adder. Since, in this example, the only pulse present in
the first window is B, with no other pulse to be added to it, this pulse appears at the output of
the adder, after a delay tg, as Pulse E. The total delay caused by going from Pulse A, through the
loop, to Pulse E, is (TD − tg) + tg = TD, as is marked between Pulses A and E. Similarly, Pulse E
is delayed and appears at Tap1 as Pulse F, activating a new window, G. Now there are two pulses
to be grouped and added together: F and C. The result appears at the end of the second window, as
H. It can be seen from the marked intervals that H follows E with a delay TD.
The value of T′D is in principle not critical, as long as Pulse C falls within Window G, but both
its position and the width of the window can be optimized given information on mismatches. Thus,
if τ represents the expected worst-case delay tolerance, the window should be at least as wide as
tg = 2τ, and T′D should be such that Pulse C nominally falls in the center of the window, which
implies that T′D = TD + tg/2.
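The timing relations of Fig. 3.3(b) can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: it places Pulse A at t = 0, and the function name, the numeric values of TD and τ, and the way the delay error is added to the lower delay element are all assumptions made for illustration.

```python
def grouping_times(TD, tau, mismatch=0.0):
    """Pulse times of Fig. 3.3(b), with Pulse A taken as t = 0 (assumed reference)."""
    tg = 2 * tau                  # window width for a worst-case tolerance tau
    TD_lower = TD + tg / 2        # nominal T'D
    B = TD - tg                   # A delayed by the upper element, TD - tg
    E = B + tg                    # window D closes, E appears at the output
    F = E + (TD - tg)             # E delayed by the upper element, opens window G
    C = B + TD_lower + mismatch   # B delayed by the lower element (with error)
    H = F + tg                    # window G closes, H appears at the output
    return {"tg": tg, "B": B, "E": E, "F": F, "C": C, "H": H}


nominal = grouping_times(TD=100, tau=5)
print(nominal)

# Pulse C stays inside window G = [F, H] for any error within +/- tau.
for err in (-5, 0, 5):
    t = grouping_times(TD=100, tau=5, mismatch=err)
    print(err, t["F"] <= t["C"] <= t["H"])
```

In this example E and H are spaced by exactly TD, and C lands in the middle of window G when the error is zero.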
In addition to the pulses originating on Tap1 and Tap2 as in the above example, we may also
have an input pulse (not shown in Fig. 3.3), which is also allowed to activate a grouping window.
If no other pulse arrives during that window, the input pulse appears at the output after a delay
tg. If, however, a feedback pulse originating on Tap1 arrives during that window, the window is
extended by tg from such arrival, so that the above approach, which is essential for stability, is
not compromised. At the end of this extended window, all pulses, including the input pulse, are
grouped and added together. This can cause a small input-dependent variation of the grouping
delay; but as long as tg is much smaller than TD, the resulting distortion is small.
Since pulses arriving within the same grouping window are combined, the length of
the window tg determines the minimum interval between pulses leaving the grouper. It also
constrains the input-signal granularity, which should be no less than tg; otherwise, two successive
input events can collide in one window, and discarding either event results in distortion. On
the other hand, the input sees a variable window length, which can be longer than tg.
The longest window an input event can see is 2tg, which occurs when a feedback event extends a
grouping window right before the window closes. Thus, even if the input signal granularity is tg,
a collision can still happen. If a collision happens, the earlier input event is replaced by the
newer one to keep the data up to date.












Figure 3.4: Input and output waveforms of a second-order IIR filter with mismatched delay lines.
b1 = −1.5, b2 = −0.6. TD1 = TD, TD2 = 1.1TD. The solid and dashed lines at the output show the
filtered output without and with the delta modulation, respectively.
maintain stability as expected in the presence of delay mismatches. If the input granularity require-
ment is satisfied, the input collision scenario happens very rarely and the resulting distortion is
small.
3.2.1 Discussion of the previous CT digital IIR filter design
To the best of the author’s knowledge, the only attempt at CT digital IIR filter design has been
in [18], which implemented a second-order CT digital IIR filter. Surprisingly, although it did not use
the grouping method proposed above, the resulting IIR filter was still stable. On the other hand, the
filtered output contained a substantial amount of noise. We explain the reasons in this section.
Fig. 3.4 shows typical input and output waveforms of a second-order CT digital IIR filter with
mismatched delay lines. Its input is a quantized sinusoidal waveform with a uniform quantization
step, which is denoted as ∆. The filtered output waveform is shown as the solid line on the right.
Because of the misalignment of the responses at Tap1 and Tap2, the output waveform contains
spikes. Initially, these spikes are spaced by approximately TD. However, both the magnitudes and
the widths of the spike clusters grow over time, which ultimately corrupts the entire output
waveform.
These spikes do not appear at the output of the IIR filter in [18], thanks to the use of a digital
delta modulator in the feedback paths, right after the adder. The design in [18] assumes that the
adder output does not change by more than 1 ∆ at any time (the reason for which can be found
in [18]). Hence, a delta modulator can be used to encode the adder output in only two bits: the
timing bit and UP/DN. The digital delta modulator truncates the output of the adder so that the
magnitude of the least significant bit (LSB) equals 1 ∆. The modulator monitors the LSB and
only generates a feedback event when the value of the LSB changes. The resulting output waveform is
shown as the dashed line on the right of Fig. 3.4. Because the output waveform can only go up or
down by 1 ∆, the growth of the spikes is limited. This explains why the output waveforms
do not increase without bound in [18].
On the other hand, the original filtered output shown by the solid line in Fig. 3.4 shows
that the unmodulated output waveform can change in magnitude by more than 1 ∆. Hence, the
assumption in [18] that the output waveform changes by no more than 1 ∆ is not correct.
Whenever the spikes occur, the digital delta modulator cannot capture the full magnitude change
of the output waveform. The modulator output only changes by up to 1 ∆. More seriously, the
direction of change of the modulator’s output may be opposite to that of the spikes.
An example of such a case is shown near the first spike in Fig. 3.4. This is because the delta
modulator generates its output based on partial information. The correct output should be the one with
the spikes; thus, compared to that, the delta modulator’s output contains a large amount of error.
This explains the high noise power in the measured output in [18].
3.3 Digital signal processor with signal-derived timing
In previous CT digital filters [3,15,17], the number of tap delays is the same as the filter order. We
introduce a new design method that allows one to implement a high-order CT digital IIR filter with
only two tap delays. As shown in Fig. 3.5(a), a sixth-order CT digital IIR filter can, in principle, be
implemented with cascaded second-order sections in direct-II form. Signals at the internal nodes
of a CT-DSP are also CT digital. An event at one of these internal nodes means an update of its data
value. The completion of the update is indicated by a pulse on its timing bit req (see Fig. 1.4(b)).
For example, in Fig. 3.5(a), when an input event arrives, it triggers an update at node p to reflect
the data change at the input. Assume first delay-free arithmetic and perfectly matched delays. The
new event at p also triggers new events at u and v simultaneously. Events at nodes p, u and v
share the same timing. Since the delays in all taps are identical, events at nodes p1, u1 and v1 also
share the same timing. So do events at p2, u2 and v2. Besides the input, the IIR filter only has
three distinguishable timing signals. They are all present in the shaded part in the first section
of the IIR filter. Its timing path can then be separated out and shared by the rest of the system,
which results in the equivalent implementation in Fig. 3.5(b). The timing block is composed of two
tap delays in feedback to generate the timing needed to realize the infinite impulse response. The
signal-derived reqin further derives three more timing signals: reqW, reqR1 and reqR2. They trigger









































































Figure 3.5: A sixth-order CT IIR implemented with cascaded second-order sections. (a) Block
diagram. (b) Implementation with shared timing signals derived from input.
two independent reads (R1 and R2) each. This method of using only four timing signals to control
the entire system can be applied to CT digital IIR filters of any order higher than two.
The TD (2TD) delay in the data path is realized by writing data into asynchronous FIFOs and
reading it out TD (2TD) later. Since there is a one-to-one correspondence between a pulse on the
timing signals and a data word stored in the FIFOs, it is extremely important not to lose any pulses
during the propagation of the timing signals in the system. This is ensured by using four-phase










































Figure 3.6: A sixth-order CT IIR with an event detector.
3.3.1 Event detection
Because the timing block in Fig. 3.5(b) contains closed-loop paths, once an event enters, it keeps
circulating around the loops indefinitely. If there is no change at the outputs of the arithmetic blocks,
these activities generate redundant events that waste energy. To remedy this, an event detector is
introduced to monitor the signals in the feedback loops. As shown in Fig. 3.6, the event detector sits
across the timing block and the data path. It monitors the data to be stored into the FIFOs
(i.e., data1, data2, and data3) and controls the timing bit reqW. When a pulse occurs on reqW, it
tries to write the data associated with this event into the three FIFOs. If the values of data1,
data2, and data3 are all the same as in the previous event, the current event is identified as redundant.
The event detector then prevents the pulse from entering the delay lines in the timing block and from
triggering write operations in the FIFOs, so the event is effectively eliminated from the
system. If any one of the three data values differs from the previous event, the current event
is not redundant.
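The redundancy test described above amounts to comparing the three data words with those of the previous event. A minimal behavioral sketch follows; the class and method names are hypothetical, and it models only the decision, not the asynchronous circuit.

```python
class EventDetector:
    """Behavioral model: an event is redundant when data1, data2 and data3
    all repeat the values seen at the previous event (names are hypothetical)."""

    def __init__(self):
        self._previous = None  # data words of the previous event, if any

    def on_req_w(self, data1, data2, data3):
        """Return True if the reqW pulse should be forwarded to the delay
        lines and FIFO writes, False if the event is redundant."""
        current = (data1, data2, data3)
        redundant = current == self._previous
        self._previous = current
        return not redundant


detector = EventDetector()
events = [(1, 2, 3), (1, 2, 3), (1, 2, 4), (1, 2, 4), (1, 2, 3)]
forwarded = [detector.on_req_w(*e) for e in events]
print(forwarded)
```

Once the data stop changing, every later event is dropped, so the activity in the feedback loops dies out instead of circulating forever.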
3.4 Interpolation filter for synchronization
Although the operations of a CT digital IIR filter track the input activity, the varying event
intervals of the CT digital signal at its output prevent its integration with DT systems. Such integration
becomes possible if a synchronous output is generated by sampling the CT signal with a clock.
Because of the use of tap delays, the frequency response of the CT digital IIR filter is periodic.
The repetitive passbands preserve distortion and noise power. This power stays out of band if the CT
digital signal is the final output, as in [15]. Once sampled, however, it is aliased back in band and
degrades SNDR. A computationally efficient interpolator employing FIR filters is used before sampling
to alleviate the issue [36, 37] (Fig. 3.7). It is composed of four first-order FIR sections. The filter
implements notches at (k+16n)/TD, k = 1, 2, ..., 15, n = 0, 1, 2, ... (Fig. 3.7(b)). These notches
suppress the distortion and noise power, and push the first intact repetitive passband out to 16/TD.
Configuring the

































[Fig. 3.7 labels: sections S1, S2, S3, S4; panels (a) and (b). *Only req is shown in the figure for clarity.]
Figure 3.7: Interpolation filter. (a) Block diagram. (b) Frequency responses of each section.
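One way to see how four first-order sections can clear everything below 16/TD is to evaluate a cascade of two-tap averagers. The sketch below assumes, purely for illustration, that section Si computes (x(t) + x(t − TD/2^i))/2. This structure, the function name and the normalization are assumptions and are not confirmed by the text. Under that assumption the cascade has zeros at k/TD for every integer k that is not a multiple of 16, and unity gain magnitude at 16/TD.

```python
import math


def interpolator_gain(f_times_TD, sections=4):
    """|H(f)| of a cascade of two-tap averagers. Section i is assumed to have
    a tap delay of TD / 2**i, so its gain magnitude is |cos(pi * f * TD / 2**i)|."""
    gain = 1.0
    for i in range(1, sections + 1):
        gain *= abs(math.cos(math.pi * f_times_TD / 2 ** i))
    return gain


# Gain at the image frequencies k/TD, k = 1..15, and at 16/TD.
notch_gains = [interpolator_gain(k) for k in range(1, 16)]
image_gain = interpolator_gain(16)
print(max(notch_gains), image_gain)
```

Each section removes a different subset of the images: the longest delay cancels the odd multiples of 1/TD, the next one cancels 2/TD, 6/TD, 10/TD and 14/TD, and so on, which is why only four sections are needed.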
Chapter 4
Design of Tunable Digital Delay Cells
4.1 Introduction
As introduced in Chapter 1, the tunable digital delay cell is the most important building block
in CT-DSP systems. It is often expected to have the following attributes:
• Wide tuning range;
• Good matching between identically laid out cells;
• Low jitter;
• Signal-independent delay;
• Robust communication, in order to guarantee correct propagation of every input event.
The existing designs [38–40] have shown satisfactory results in terms of tuning range. However,
none of these works attempted to optimize delay-line matching. Poor delay-line matching only results in
an inaccurate frequency response in CT digital FIR filters [3,15,17]. However, it can be disastrous
in CT digital IIR filters, as explained in Section 3.2. This chapter presents a widely tunable delay
cell with good matching properties. It also has low jitter and a robust communication channel to
adjacent circuits. Previously unreported effects that result in signal-dependent delay are discussed
and eliminated.
This chapter is based on [41].
4.2 Prior art
We begin with a discussion of prior art, both because we use some of its features, and in order to
point out its limitations, which are addressed in this work.
We focus on delaying digital information. Pure analog delay cells are possible [42], but the
required constant bias current makes them energy inefficient for our purposes. Delay cells composed
of cascaded inverters are not easily tunable. Delays based on thyristor-like circuits using a current
source to charge a capacitor consume low power and have good tuning range [38–40], and will be
the focus of this chapter.
In CT-DSP applications, a delay line is used to implement a long delay with a fine granularity,
and hence is composed of a chain of smaller-delay cells. This is shown in Fig. 4.1(a). Each cell
passes an event (here defined as an up-going digital pulse edge) to the next stage through an asyn-
chronous four-phase handshaking to avoid glitches [35]. Initially, all req signals are low and all ack
signals are high. When the (k−1)th cell finishes delaying an event, it passes the event to the kth cell





















































Figure 4.1: Delay cells. (a) A delay line composed of several delay cells for fine time granularity;
(b) schematic of one delay cell; (c) timing diagram (phases labeled delay and reset; voltage levels
VDD and V1).
down to acknowledge this to the (k−1)th cell. That cell then quickly resets itself and pulls down
reqi of the kth cell. Following that, acki of the kth cell is pulled up to conclude the event transfer
between the two stages. After the kth cell finishes delaying the event, the same handshaking takes
place between the kth and (k+1)th delay cells.
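The handshake between two adjacent cells can be written out as a sequence of (req, ack) values. The sketch below is a behavioral illustration only: it assumes that the event is passed by raising req, that ack is active low as described above, and it uses hypothetical names.

```python
def four_phase_handshake():
    """Return the (req, ack) values between cell k-1 and cell k for one event,
    starting and ending in the idle state (req low, ack high)."""
    req, ack = 0, 1
    trace = [(req, ack)]      # idle
    req = 1                   # cell k-1 passes the event by raising req
    trace.append((req, ack))
    ack = 0                   # cell k acknowledges by pulling ack down
    trace.append((req, ack))
    req = 0                   # cell k-1 resets itself and pulls req down
    trace.append((req, ack))
    ack = 1                   # cell k pulls ack up, concluding the transfer
    trace.append((req, ack))
    return trace


trace = four_phase_handshake()
print(trace)
```

Because each transition waits for the previous one, no event can be overwritten while it is still being transferred.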
The schematic of one delay cell is shown in Fig. 4.1(b) [38–40]. A rising edge on reqi from
the previous cell sets the latch and moves the cell into the delay phase; acki is pulled down to
acknowledge the previous cell. The voltage on node A jumps to V1 initially, due to capacitive
feedthrough when M1 is turned off. The capacitor C1 is then slowly charged through M2–3. M4–
8 form a transistor-based thyristor [38] to monitor VA. Once VA falls below the threshold of the
thyristor, M4 is turned on to quickly pull VA to GND. The thyristor provides a sharp rising edge
on reqo. When the next delay cell enters the delay phase, it pulls acko down to reset the SR latch.
The delay cell shown enters the reset phase. M3 is turned off and M1 is turned on to discharge C1.
When VA goes back to VDD, the cell is fully reset, where it remains in anticipation of the next input
event.
The time difference between the rising edges of reqi and reqo is the cell delay, denoted by tg.
It is equal to
tg = C(V1 − VTH)/I    (4.1)
where C is the capacitance of C1 and I is the DC current of M2. The delay can be tuned through
I by changing the gate voltage of M2, Vtune. The energy consumed by each delay is dominated by
the charging and discharging of the capacitor, and equals C·VDD².
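Equation (4.1) and the energy expression can be turned into a quick sizing check. The numbers below are hypothetical and chosen only to show orders of magnitude: a 13 fF capacitor, V1 = 1.0 V, VTH = 0.4 V, VDD = 1.2 V and a 25 ns target delay. The function names are likewise assumptions.

```python
import math


def cell_delay(C, V1, VTH, I):
    """Cell delay from Eq. (4.1): tg = C (V1 - VTH) / I."""
    return C * (V1 - VTH) / I


def bias_current(C, V1, VTH, tg):
    """Current needed for a target delay, by rearranging Eq. (4.1)."""
    return C * (V1 - VTH) / tg


def energy_per_delay(C, VDD):
    """Capacitor charging/discharging energy per delay event, C * VDD**2."""
    return C * VDD ** 2


C, V1, VTH, VDD = 13e-15, 1.0, 0.4, 1.2   # hypothetical values
I = bias_current(C, V1, VTH, tg=25e-9)
print(I)                                   # about 312 nA
print(cell_delay(C, V1, VTH, I / 2))       # halving the current doubles the delay
print(energy_per_delay(C, VDD))            # about 18.7 fJ
```

The inverse dependence on I is what makes the cell widely tunable through Vtune.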
Although the design described can be widely programmable, it suffers from three issues. First,
energy is wasted since, after VA crosses VTH, the circuit keeps charging C1 unnecessarily. Second,
the implementation of the input asynchronous channel has a potential hazard: when an event arrives
at the input of the current delay cell, its acki is pulled down to set its SR latch and to reset the
previous delay cell. If the reset delay of the previous cell happens to be longer than the delay of
the SR latch, acki is pulled up and stops the reset operation before that operation is completed.
This error is fatal, because the previous delay cell can no longer accept new events. Third,
delay variations due to device mismatches are neither considered nor minimized in this circuit. In this
work, we introduce a new delay cell which resolves all the above issues.
4.3 Proposed delay cell
4.3.1 Eliminating energy waste
The first version of the proposed delay cell is shown in Fig. 4.2(a) (improvements to it will be
discussed later). At the output, three inverters replace the thyristor in Fig. 4.1. Fig.
4.2(b) shows the timing diagram. At the end of the first delay phase, when VA crosses the threshold of
the first inverter (M5–6), reqo is pulled up. Once the next stage enters the delay phase, it pulls acko
down. This resets the SR latch shown, and turns M3 off and M1 on. As a result, VA is discharged to
VDD from the vicinity of VTH, rather than from GND as in Fig. 4.1; energy is saved. The second half
of the timing diagram will be explained later. VTH is determined by the strengths of M5 and M6,
while the following two inverters (M7–10) provide gain and correct logic polarity. As VA slowly
moves near VTH, it causes a crow-bar current through M5 and M6. Sizing M6 to be weak limits
the current and hence alleviates this issue.
4.3.2 Ensuring robustness
At the input of the delay cell, a C-element [43] is introduced to improve robustness (Fig. 4.2(a)).
This is a memory circuit that outputs a high when both inputs are high; outputs a low when both
inputs are low; and, if the two inputs are different, holds the previous output value. Initially, the
delay cell is in reset phase with Q low and Qb high. The C-element’s output is low. Once a rising
edge occurs on reqi, the C-element’s output is pulled up and acki is pulled down. This sets the SR
latch and resets the previous stage. When the operation in the latch completes, Qb is pulled down;
when the previous stage finishes resetting, reqi is pulled down. Thanks to the use of the C-element,
the order of these two events no longer matters. The C-element goes back to low only when both
events occur. This assures that acki goes back to high only after the reset operation in the previous
stage is completed. The design is insensitive to gate delay variations, as long as the following two
timing assumptions are satisfied: 1. The falling delay of Qb in the SR latch, plus the falling delay
of the C-element, should be shorter than the rising delay of Q plus the delay of node A decreasing
from VDD to VTH, plus a handshaking delay at the reqo/acko interface. This assures that once a delay
cell enters the delay phase, the C-element’s output can go to zero before the delay cell starts the
next reset phase. 2. The rising delay of the NAND gate should be shorter than the rising delay











































Figure 4.2: Baseline delay cell design. (a) Schematic; (b) timing diagram.
terminals of the SR latch are never on simultaneously. Fortunately, both timing assumptions can
be very easily satisfied.
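The behavior of the C-element described in this section can be captured in a few lines. The sketch below is a behavioral model with hypothetical names; mapping its two inputs to reqi and Qb is an assumption based on the description above, not a statement about the schematic.

```python
class CElement:
    """Output goes high when both inputs are high, low when both are low,
    and otherwise holds its previous value."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


# Inputs taken as (reqi, Qb). Reset state: reqi low, Qb high, output low.
c = CElement()
latch_first = [c.update(*x) for x in [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1)]]

c = CElement()
reset_first = [c.update(*x) for x in [(0, 1), (1, 1), (0, 1), (0, 0)]]
print(latch_first, reset_first)
```

In both orderings the output returns low only after both reqi and Qb have fallen, which is the order independence that the text relies on.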
4.3.3 Sizing the current source for good matching
Devices M5 and M6 cannot be made very large, as this affects the parasitic capacitance of node A.
With reasonably small values for them (see Table 4.1 below), simulations show that the variation
of the delay around its nominal value is dominated by the current source (M2) and the capacitor
(C1) in Fig. 4.2. The capacitor size needs to be adequate for good matching and low jitter but not
larger, as this would necessitate a large current for charging it and would waste energy. A pMOS
capacitor working in the depletion region, with a capacitance of 13 fF, was found adequate based
on these considerations and assuming a nominal delay of 25 ns.
A delay cell was designed in a 65 nm LP CMOS process to study the relations between delay
variation and the size and bias of the current source (M2 in Fig. 4.2). Fig. 4.3(a) shows the ratio of
the standard deviation of the delay to its mean value as a function of the transistor’s gate area
AG = WL. One hundred Monte Carlo simulations were run, with fixed Vtune and W/L so that the
transistor is at the same inversion level when its gate area is changed; the behavior seen is as
expected. Fig. 4.3(b) shows the same ratio as a function of W/L, keeping the gate area fixed. The bias
current is provided through a current mirror fed with a constant current. The mismatch is seen to
worsen as W/L increases, because this causes the transistor to move from strong to weak inversion,
where matching for constant current is known to be poor. It is clear from Fig. 4.3 that a small delay
variation requires large AG and small W/L. Since AG = WL, or L² = AG/(W/L), a large L is
needed.

Figure 4.3: Delay standard deviation over mean. (a) As a function of gate area AG (μm²). (b) As a
function of W/L.

Device   M2      M5   M6   M7   M8   M9   M10
W (nm)   700     800  120  120  120  120  120
L (nm)   10,000  200  400  60   60   60   60
Table 4.1: Device sizes of the delay cell in Fig. 4.2.
The same choice benefits the jitter performance. A large AG keeps flicker noise down, while a
transistor in strong inversion has lower white current noise than one in weak inversion, for a given
current.
Based on the above guidelines, we choose M2 to have a length of 10 µm and a width of 0.7 µm.
With Vtune = 630 mV and a capacitor of 13 fF, this achieves a nominal delay of 24.9 ns and σ/µ =
1.2%. Some of the final device sizes are summarized in Table 4.1. The sizes of the switches M1,
M3 and M4 are 200 nm (W)/60 nm (L).
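The relation L² = AG/(W/L) can be checked against the chosen M2 dimensions. The helper below is a small sketch with a hypothetical name; it simply converts a gate area and an aspect ratio back into W and L.

```python
import math


def size_from_area_and_ratio(area_um2, w_over_l):
    """Return (W, L) in micrometres from AG = W*L and the ratio W/L,
    using L**2 = AG / (W/L)."""
    length = math.sqrt(area_um2 / w_over_l)
    width = area_um2 / length
    return width, length


# M2 as chosen in the text: W = 0.7 um, L = 10 um, so AG = 7 um^2 and W/L = 0.07.
W, L = size_from_area_and_ratio(area_um2=7.0, w_over_l=0.07)
print(W, L)

# For the same area, a larger W/L gives a shorter (and wider) device.
W2, L2 = size_from_area_and_ratio(area_um2=7.0, w_over_l=0.7)
print(W2, L2)
```

This makes the trade-off explicit: asking for both a large area and a small W/L inevitably leads to a long transistor.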
4.3.4 Identifying, and eliminating, signal-dependent delay
Unfortunately, the resulting design has signal-dependent delay, as illustrated in Fig. 4.2(b). When
the first event enters the delay cell while the cell is fully reset, VA first jumps up to V1 due to charge
injection from M1 and then decreases to VTH. The cell implements a nominal delay tg. At the end
of the first delay phase, acko is pulled down to reset the delay cell. However, when the second event
enters the delay cell, VA jumps to V2, which is lower than V1 for reasons explained below. It now
takes VA a shorter time to reach VTH, which results in a delay t′g shorter than the nominal value tg.
The difference between V1 and V2 is due to the use of the very long transistor (M2), as explained with
the aid of Fig. 4.4, which shows the charge distributions when the transistor is fully on (left case)
and fully off (right case). In both cases, the gate, drain and body of the transistor are tied to Vtune,
VDD and GND, respectively. When the delay cell is in the delay phase, the transistor is on with an
inversion layer present (left in the figure). When the cell enters the reset phase, the source terminal
is pulled to VDD in order to eliminate the inversion layer; channel electrons exit the device through
both the drain and the source terminals. Because the transistor is very long, the electrons have to
travel a long distance to reach either terminal, which takes time. There is therefore a latency between
the change of the bias voltages and the change of the charge distribution in the transistor. This
phenomenon is a nonquasistatic effect, and its latency is proportional to the square of the
channel length [44]. If the delay cell reenters a new delay phase before the process completes,
some electrons which have not escaped remain in the channel. At the beginning of the second
delay phase (second part of Fig. 4.2(b)), positive charges are injected into node A due to capacitive











Figure 4.4: Charge distributions of the nMOS transistor current source (M2 in Fig. 4.2) when it is
fully on (left) and fully off (right). (Label in figure: Transistor is turning on.)
in the channel, and thus the overshoot V2 is smaller than V1. The exact value of V2 depends on
the amount of left-over electrons in the channel, which depends on the spacing between two input
events. Hence the delay becomes signal-dependent.
A similar effect exists in the reverse direction, when the transistor is turned from off to on. The
latency in this direction becomes part of the delay implemented by the delay cell and hence is not
harmful.
The nonquasistatic effects just described can be alleviated by using a shorter length for M2,
but this compromises matching. A solution is shown in Fig. 4.5. The original transistor M2 in
Fig. 4.2 is split into five transistors with equal lengths (M21-M25). When the delay cell enters the
reset phase, the five transistors are turned off in parallel so that the electrons in the channels travel
a distance which is five times shorter. When they are turned on, the five devices are connected in























Figure 4.5: Schematic of the modified delay cell.
design, we choose each of the five transistors to have a width of 0.5 µm and a length of 1 µm. The
aggregate length of the transistors is smaller than that of M2 in Fig. 4.2, to make sure each transistor
does not suffer from nonquasistatic effects; for this reason, W has also been reduced. With
Vtune = 600 mV, the delay cell achieves a nominal delay of 25.1 ns and σ/µ = 2.3%.
4.4 Measurement results of delay cells
Fig. 4.6 shows the photo of the die, with the area containing 128 delay cells of the type shown in
Fig. 4.5. The layout of one cell is shown on the right. The chip is fabricated in a 65 nm LP CMOS
process. The delay of one delay cell can be tuned by adjusting Vtune. As shown in Fig. 4.7(a), a
wide tunability range, from 5 ns to 10 µs, is achieved. Automatic tuning of a delay cell using an




Figure 4.6: Die photo (left) and layout of one delay cell (right).
relation between the delay and the input event spacing. We feed two successive events into one
delay cell when it is completely reset, and measure the delay of the second event. The curve with
a large variation is for the design in Fig. 4.2, and the flat curve for the design in Fig. 4.5. It is clear
that the solution described in Section 4.3.4 solves the problem of signal-dependent delay.
A comparison of this work to delay cells in prior art [38–40, 45] is summarized in Table 4.2;
the results shown are measured unless indicated otherwise. Table 4.3 compares jitter performance
with multiple delay cells cascaded. The design achieves low jitter and good matching, and provides
a robust communication interface. The price to pay for these improvements is a larger area and
higher power consumption. These penalties can be alleviated by simplifying the delay cell design
in applications where four-phase handshaking is not necessary.
Figure 4.7: Measured delay-cell responses. (a) One cell (Fig. 4.5) versus Vtune. (b) One cell versus
input event spacing for the cells of Fig. 4.2 and Fig. 4.5.
Parameter      Vezyrtzis   Chang   Schell   Kurchuk*   This work
VDD            1 V         2 V     1 V      1.2 V      1.2 V
…              …           0.3-300 ns   5 ns-10 μs
Matching       -           -       -        12.4%      2.3%*†
Energy/delay   50 fJ       60 fJ   50 fJ    20 fJ      83 fJ‡
Area           -           -       36 μm²   …
†Standard deviation over mean value; 100 Monte Carlo simulations; Vtune = 625 mV.
‡Simulation; measured chip power consistent with simulations.
Table 4.2: Comparison of different delay cells.

                        [7]     [10]*   This work
Number of delay cells   10      4       20
Jitter (rms/mean)       0.33%   0.3%    0.065%
*Simulation results.
Table 4.3: Jitter of delay cell cascades.
4.5 Conclusion
Design considerations for digital delay cells with good matching, low jitter, and robust communica-
tion interfaces have been presented. A design of a widely tunable delay cell following the guidance
presented has been described, achieving a 5X smaller jitter than prior art, and a delay mismatch of
2.3%. Nonquasistatic effects on signal-dependent delay have been addressed and prevented in one
of the designs presented.
Chapter 5
Implementation of a CT Digital IIR Filter System
5.1 Overall system architecture
Fig. 5.1 shows the block diagram of the complete system. The CT digital IIR filter implements the
desired transfer function. An interpolation filter is used in preparation for CT-to-DT conversion.
It is composed of CT digital FIR filters which suppress the noise and distortion power in the
repetitive passbands of the IIR filter’s frequency response. The CT-to-DT converter converts the
filtered signal into a synchronous digital output compatible with conventional DT systems.
A four-phase bundled data protocol [35] is used for communications between and within blocks
to assure robust communications between stages. Rather than using a pulse, a rising edge on req



































Figure 5.1: Top-level block diagram of the CT digital IIR filter system.
accepted by the following block. For example, when the req between the IIR filter and the
interpolation filter is pulled high, a new event has occurred at that node and a request is sent to the
interpolation filter. Once the interpolation filter receives the event, it acknowledges the IIR filter
by pulling ack down. Now that the event has safely propagated to the following stage, the request
can be released: req is pulled down and ack is pulled high. This rising edge on ack concludes the
four-phase handshake, which allows the next event to occur at this node. Although asynchronous
digital techniques are used, a CT digital filter is different from an asynchronous digital filter. The
time intervals between events, which carry critical information in CT digital signals, are
preserved during their propagation in the former. An asynchronous digital filter is not capable of doing
so. Except for the seven-bit input at datain, a 16-bit word length is used everywhere to minimize












































































































































Figure 5.2: The input-derived timing block. (a) Architecture. (b) State diagram of the grouping
block. (State legend: S1: holding only an input event; S2: holding an event from reqR1; S3: holding
a pair of events from reqR1 and reqR2. Transition labels read: triggering event / output transition.)
5.2 Architecture of the CT digital IIR filter
Section 3.3 introduces a method which allows one to design a high-order CT digital IIR filter with
only two tap delays. It separates the system into a shared timing block and a data path. This chapter
discusses the practical implementations of each part in detail.
5.2.1 Timing block
Fig. 5.2(a) shows the architecture of the timing block with delay-line mismatch taken into account.
The grouping block and an extra half-delay cell in the second tap delay implement the grouping
solution introduced in Section 3.2. The length of the grouping window sets the minimum
interval between events entering the feedback paths to tg. To handle this traffic, the two tap delays
are each composed of a cascade of smaller delay cells, each with a delay of tg. The grouping block
takes reqin, reqR1 and reqR2 as its inputs and generates a four-bit output, reqgrp, each bit of which
triggers different FIFO operations in the data path.
Fig. 5.2(b) illustrates the grouping algorithm as a state diagram. Initially the block is in the idle
state, S0. When an input event arrives, a rising edge on reqin (denoted by reqin+ in Fig. 5.2(b))¹
moves the block into S1, meaning that it holds only an input event. Meanwhile, a grouping
window is activated. If no other events arrive during it, the window closes (denoted by win−)
after tg and the block moves back to S0. At the same time, reqgrp.in is pulled up (denoted by
reqgrp.in+). If a feedback event arrives at reqR1 during the window, it pulls up reqR1 (denoted by
reqR1+) and moves the block into S2, meaning that it holds a feedback event from reqR1.
The transition from S1 to S2 extends the grouping window by tg from the arrival of the feedback
event. If no feedback event arrives at reqR2 during the window, the block moves back to S0 once
the window closes and pulls reqgrp.R1 up (denoted by reqgrp.R1+). Otherwise, the event arriving
at reqR2 pulls up reqR2 (denoted by reqR2+) and moves the block from S2 to S3. The new state
means that the block holds a pair of feedback events from reqR1 and reqR2. The transition from S2
to S3 does not affect the window. When the window closes, reqgrp.R1R2 is pulled up (denoted
by reqgrp.R1R2+) and the block moves back to S0. It is possible that an input event arrives when
the grouping block is in S2 or S3. If so, that event is held along with the feedback events without
affecting the grouping window or the block state.
The grouping solution introduced in Section 3.2 requires that the event at reqR2 arrive during
S2, within the window triggered or extended by the previous event at reqR1. This is ensured by
designing the nominal value of the second tap delay to be tg/2 longer than the first tap delay,
and their mismatches to be smaller than tg/2. On the other hand, we retain the grouper’s capacity
to handle reqR2 events arriving at other times. This is reserved for the event detection function
introduced in Section 3.3.1. When an event arrives at reqR2 during S0 or S1, it immediately pulls
reqgrp.R2 up (denoted by reqgrp.R2+) without affecting the block state or any ongoing grouping
window.
1 To keep the discussion simple, we only show the rising transitions of req signals. In a practical implementation,
four-phase handshaking is completed soon after each transition, which finally pulls the req signals back to zero so
that the block is ready to accept another event.
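The state transitions described above can be sketched as a small event-driven model. The Python sketch below is illustrative only: the function name group_events, the (time, source) event tuples, and the handling of a reqR1 event that arrives in S0 (assumed here to open a window and move the block to S2) are assumptions rather than details taken from Fig. 5.2(b). Times are in nanoseconds.

```python
def group_events(events, tg):
    """Behavioral sketch of the grouping block.

    events: iterable of (time, source) with source in {"in", "R1", "R2"}.
    Returns a list of (time, bit), where bit names the reqgrp bit that is
    pulled up: "in", "R1", "R1R2" or "R2".
    """
    state, close_at = "S0", None
    out = []

    def close_window():
        nonlocal state, close_at
        # The bit raised at the end of the window depends on the state.
        bit = {"S1": "in", "S2": "R1", "S3": "R1R2"}[state]
        out.append((close_at, bit))
        state, close_at = "S0", None

    for t, src in sorted(events):
        # Close any window that expired before this event arrived.
        if close_at is not None and t >= close_at:
            close_window()
        if src == "in":
            if state == "S0":
                state, close_at = "S1", t + tg
            # In S1, S2 or S3 the input event is held with the current group.
        elif src == "R1":
            if state in ("S0", "S1"):  # S0 -> S2 is an assumption
                state, close_at = "S2", t + tg  # window (re)starts at R1
        elif src == "R2":
            if state == "S2":
                state = "S3"  # window is not affected
            elif state in ("S0", "S1"):
                out.append((t, "R2"))  # immediate, no state change
    if close_at is not None:
        close_window()
    return out
```

For example, with tg = 25, an input at 0 ns followed by feedback events at 10 ns (reqR1) and 22 ns (reqR2) produces a single grouped output at 35 ns, because the reqR1 event restarts the window.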
5.2.2 Data path
The architecture of the data path with nonzero arithmetic and FIFO delays is shown in Fig. 5.3.
Because each adder (except the first one) uses the result of the preceding adder as an input, the
data path is implemented as an asynchronous pipeline to maintain a high throughput. Four adders
of the sixth-order IIR filter are spread over four stages (DFF4 to DFF7) so that they can operate
concurrently. The multipliers preceding each adder are placed in the same stage as that adder
so as not to increase the number of stages in the pipeline. A small number of stages requires fewer
control signals and thus simplifies the design.
The pipeline starts when an event arrives at reqin, reqR1 or reqR2, activating the grouping win-
dow in the timing block and holding the data in DFF1, DFF2 or DFF3, respectively. The read
operations in FIFO1 always start tg before the feedback event actually arrives at reqR1 or reqR2.
This allows a tg timing margin for read operations in FIFO1. At the end of the window, one of
the bits in reqgrp is pulled up, moving any held data in DFF1–3 into DFF4 and triggering new
[Figure 5.2(b) legend: S1: holding only an input event; S2: holding an event from reqR1; S3: holding a pair of events from reqR1 and reqR2. Label explanation: triggering event / output transition.]
Figure 5.3: Architecture of the data path in a pipeline.
read out from FIFO2 to prepare for the arithmetic operations in the next stage. If reqgrp.in is high, the
grouping window holds only an input event and hence no data is read from FIFO2. If reqgrp.R1 is
high, a feedback event from reqR1 is held in the grouper, so one data word is read out from R1 of FIFO2. If
reqgrp.R1R2 is high, data is read out from both R1 and R2 of FIFO2. After tdelay, reqgrp propagates
to DFF5, latching the result from the previous stage into DFF5. Meanwhile, FIFO3 repeats the
operation that just occurred in FIFO2. After FIFO3, no further FIFO operations are needed and the
four bits in reqgrp can safely be merged into one with combinational logic. New data is generated
at dataout after 5tdelay. Three intermediate results of the feedback paths are stored in the FIFOs at
the same time, upon the assertion of reqout. The value of tdelay must be longer than the interstage
propagation delay in the pipeline.
Notice that reqgrp goes through a tg-based delay line in the timing block (Fig. 5.2), and a
tdelay-based delay line in the data path (Fig. 5.3). These two delay lines have different design
requirements. Because tg determines the length of a grouping window, according to our conclusions
in Section 3.2, tg needs to be at least twice the worst-case mismatch between two delay blocks.
The lower bound of tdelay, on the other hand, is constrained by the propagation delays of the
arithmetic operations. The minimum value of tdelay decreases with advances in CMOS process technology,
while the choice of tg cannot benefit from them. However, if tdelay is chosen to be the same as
tg, we can combine the two delay lines and save hardware. The nominal value of tg is 25 ns in
this design. We fabricated the prototype in a TSMC 65 nm LP CMOS process, in which 25 ns is much longer
than the propagation delay of any stage in the data path. Hence, choosing tdelay to be tg gives the
data path enough timing margin for its operations. In addition, combining the two delay lines also
allows the use of the event detector introduced in Section 3.3.1, which is discussed in the next
section. Fig. 5.4 shows the implementation with the two delay lines combined.
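The two constraints and the resulting output latency can be checked with a few lines of arithmetic. In the sketch below, only tg = 25 ns and the 5·tdelay latency come from the text; the function name check_timing, the worst-case mismatch and the per-stage propagation delays are hypothetical placeholders, not measured values.

```python
def check_timing(tg_ns, worst_mismatch_ns, stage_delays_ns, stages_to_output=5):
    """Check the delay-line constraints when tdelay is chosen equal to tg."""
    tdelay_ns = tg_ns  # the two delay lines are shared
    return {
        # Section 3.2: tg must be at least twice the worst-case mismatch.
        "window_ok": tg_ns >= 2 * worst_mismatch_ns,
        # Every pipeline stage must settle within one tdelay.
        "pipeline_ok": max(stage_delays_ns) < tdelay_ns,
        # New data appears at dataout after 5 * tdelay.
        "latency_ns": stages_to_output * tdelay_ns,
    }


# Hypothetical mismatch and stage delays, for illustration only.
result = check_timing(tg_ns=25, worst_mismatch_ns=10, stage_delays_ns=[8, 12, 15, 9])
```

With tg = 25 ns the output latency evaluates to 125 ns, and both checks pass for the placeholder numbers used here.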
5.2.3 Event detector
In Section 3.3.1, we introduced the event detector to monitor the feedback signals in the IIR filter
and to eliminate redundant events. It is inserted into the feedback path, right before the FIFOs into
which feedback data are written (Fig. 5.4). Its implementation is shown in Fig. 5.5. When an event
enters the event detector, it triggers the left delay cells. The data on in1, in2 and in3 are saved in
DFF9 and compared with the data of the previous event stored in DFF10. The comparison delay
should be shorter than tg/2 so that the result can be correctly latched. After a tg delay, reqEDI
is pulled up. If any of the three data values is different from the previous ones, CMP is low and
Figure 5.4: Architecture of the sixth-order CT digital IIR filter with shared delay lines.
are the same, the event is identified as redundant. CMP is set high to prevent the event from entering
the right stage. The redundant event is effectively eliminated from the filter. A self-acknowledge is
conducted in this case to keep the asynchronous pipeline working.
By default, the comparators in the event detector compare all 16 bits of the inputs.
In many low-resolution applications, it is not necessary to maintain 16-bit resolution. The
comparator is therefore designed so that its resolution can be programmed from nine to 16 bits, giving eight
different options. When a lower resolution is used, data is more easily classified as redundant and
the IIR filter is more power efficient. The price is a higher quantization noise power in the signal.
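The redundancy test with programmable resolution can be sketched as follows. This is a behavioral illustration, not the circuit: the function name is_redundant is hypothetical, and it assumes that a lower resolution means comparing only the most significant bits of each 16-bit word, which the text does not state explicitly.

```python
def is_redundant(new, prev, resolution_bits=16, word_bits=16):
    """Return True if all data words match their previous values.

    new, prev: sequences of unsigned integers (e.g. the words on in1, in2, in3).
    resolution_bits: number of most significant bits compared (9 to 16).
    """
    if not 9 <= resolution_bits <= word_bits:
        raise ValueError("resolution must be between 9 and 16 bits")
    shift = word_bits - resolution_bits  # least significant bits ignored
    return all((n >> shift) == (p >> shift) for n, p in zip(new, prev))
```

Two events that differ only in the least significant bit are distinct at 16-bit resolution but redundant at 15-bit resolution, which is how a lower setting removes more events from the feedback path.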
Figure 5.5: Event detector. (a) Architecture. (b) Timing diagram.
always arrive within the window activated by the previous event from Tap1. Notice from Fig.
5.4 that the event detector is only inserted in the first tap delay. It is possible that after an event is
eliminated from the first tap delay, its paired event in the second tap delay may arrive at the grouper
outside of any window, which violates the requirement. The solution is to discard this
left-alone event. When the left-alone event from Tap2 arrives at the grouping block, reqgrp.R2 in
Fig. 5.4 is pulled up immediately, without triggering a grouping window. reqgrp.R2 and its delayed
version are only responsible for reading the data associated with the event from R2 of FIFO2 and
FIFO3. These timing signals do not trigger DFF4 and DFF5. Thus, the data does not participate in
any arithmetic operations.
5.2.4 Delay cell
The delay cell of the CT digital IIR filter system uses the design in Fig. 4.5 in Chapter 4.
5.2.5 Half-delay cell
The implementation of a half-delay cell is exactly the same as that of the delay cell in Fig. 4.5,
except that M21–25 are twice as wide. When the capacitor is charging, twice as much current flows
through the capacitor, which results in a delay of approximately tg/2.
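A first-order way to see the halving is sketched below. It assumes, as a simplification not stated in the text, that the cell delay is set by a constant current I charging a capacitor C up to a fixed threshold V_th; the symbols C, I and V_th are introduced here for illustration only.

```latex
t_g \approx \frac{C\,V_{th}}{I}
\quad\Longrightarrow\quad
t_{\mathrm{half}} \approx \frac{C\,V_{th}}{2I} = \frac{t_g}{2}
```

Under this assumption, doubling the width of the current-setting transistors doubles I and therefore halves the delay, provided the threshold and the capacitance stay the same.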
5.2.6 Grouping block in the CT digital IIR filter
Fig. 5.6 shows the schematic of the grouping block in the IIR filter. It is composed of three
channels, with interconnections between them. The feedforward and R1 channels are essentially two
delay cells. The two current-source transistors, M4 and M8, are each split into five short transistors
in series to avoid non-quasi-static effects. However, they are drawn as two single transistors to keep
the figure simple. The state of the block is stored in three registers: SR1, SR2 and DFF1. In S0, the Q
outputs of all latches and DFFs are low; the outputs of all C-elements are low; all req signals are low and
all ack signals are high. When an input event arrives, it triggers a delay operation in the feedforward
channel by pulling up Q1. The block enters S1. tg later, Input A of the arbiter [47] is pulled up.
[Figure 5.6 state encoding: S1: Q1Q2Q3 = 100; S2: Q1Q2Q3 = d10 (d = do not care); S3: Q1Q2Q3 = d11.]
Figure 5.6: Circuit implementation of the grouping block in IIR filter.
After the following stage accepts the event, ackgrp.in is pulled down to reset the feedforward
channel. The block moves back to S0. If an event arrives at reqR1 during S1, the grouper enters S2. Q2
is pulled up to prevent reqgrp.in from being pulled up through the arbiter. Meanwhile, it triggers a delay
operation in the R1 channel. tg later, Inv2 is pulled up. While Q3 stays low, reqgrp.R1 is pulled up.
However, if an event arrives at reqR2 during S2, the block enters S3 and Q3 is pulled up. Instead of
reqgrp.R1, reqgrp.R1R2 is pulled up upon the completion of the delay operation in the R1 channel. In
either case, an acknowledge signal is received soon afterward to reset the block back to S0. If an event
arrives at reqR2 while Q2 is low (i.e., the complement of Q2 is high), DFF2 outputs a high, which pulls reqgrp.R2 up
immediately. Soon after that, ackgrp.R2 is pulled low to reset DFF2. The transition of DFF2 from
reset to set to reset again completes very quickly without affecting the state of the grouper.
5.2.7 Asynchronous FIFO
The architecture of the asynchronous FIFO with one write (W) and two read (R1 and R2) channels
is shown in Fig. 5.7. It is composed of 128 (4 columns by 32 rows) 16-bit-word cells. The SRAM
cell is based on the fully static design in [48], but contains two read channels. Fig. 5.8 shows the
timing diagram of both write and read operations. They are triggered by the timing signals reqW,
reqR1 and reqR2, which are not synchronized to a clock. A write operation starts when reqW is
pulled up. A bundled-data encoding scheme [35] ensures that dataW is ready slightly before the
timing signal. The signal wen is pulled up to latch dataW into DFF1 and to start writing the cells pointed to by
the address addrW. After tg/2, wen is pulled down and addrW increases by one to prepare for the
next write operation. The write delay should be shorter than tg/2 so that the write completes before
addrW changes. A read operation starts when either reqR1 or reqR2 is pulled up. Fig. 5.8(b)
illustrates an example in which reqR1 is pulled up. The signal ren1 is then pulled up to read the SRAM cells
pointed to by the address addrR1. After a read delay shorter than tg/2, the data is ready on busR1. tg/2
after reqR1 is pulled up, ren1 is pulled down, which latches the new data into DFF3. Meanwhile,
addrR1 increases by one to prepare for the next read operation. The three addresses,
addrW, addrR1 and addrR2, are internal signals in the three address decoders. They increase by
one at the falling edges of wen, ren1 and ren2, respectively. The three addresses are decoded
into 32-bit row-enable and four-bit column-enable signals to select the desired cells. Because the
Figure 5.7: Architecture of asynchronous FIFO with one write and two read channels.
three decoders operate independently, the asynchronous FIFO supports one write and two read
operations simultaneously.
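The pointer behavior of the FIFO can be sketched with a small behavioral model. This is not the circuit: the class name AsyncFifoModel is hypothetical, the row/column mapping of an address and the wrap-around of the pointers after 128 words are assumptions, and all timing (the tg/2 enable pulses) is abstracted away so that each call stands for one complete operation.

```python
class AsyncFifoModel:
    """Behavioral sketch: 128 sixteen-bit words, one write and two read pointers."""

    ROWS, COLS = 32, 4

    def __init__(self):
        self.cells = [0] * (self.ROWS * self.COLS)
        # Independent addresses, one per channel (addrW, addrR1, addrR2).
        self.addr = {"W": 0, "R1": 0, "R2": 0}

    @classmethod
    def decode(cls, addr):
        """One-hot row-enable (32 bits) and column-enable (4 bits) words.

        The mapping row = addr // 4, col = addr % 4 is an assumption.
        """
        row, col = divmod(addr, cls.COLS)
        return 1 << row, 1 << col

    def _advance(self, name):
        # Corresponds to the increment at the falling edge of wen/ren1/ren2.
        self.addr[name] = (self.addr[name] + 1) % len(self.cells)

    def write(self, word):
        self.cells[self.addr["W"]] = word & 0xFFFF
        self._advance("W")

    def read(self, channel):
        word = self.cells[self.addr[channel]]
        self._advance(channel)
        return word
```

Because each channel keeps its own address, a read on R1 does not disturb R2: after writing 10, 20 and 30, the first read on either channel returns 10.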
Figure 5.8: Timing diagrams of write and read operations for the asynchronous FIFO.
5.2.8 Arithmetic blocks
The CT digital IIR filter’s multipliers and adders are synthesized using standard digital design
tools: Verilog, Synopsys® Design Compiler and Cadence® Encounter. This is the first time that
synthesis tools are used to implement the arithmetic blocks of a CT digital filter, which saves
significant time and design effort. The only design constraint is that the propagation delay of each
stage in the pipeline should be smaller than tg. In this design, the nominal value of tg is 25 ns. It is
relatively easy to meet this requirement in the 65 nm TSMC LP CMOS process.
5.3 Interpolation filter
Introduced in Section 3.4, the interpolation filter is composed of multiple first-order CT digital
FIR filters. The architecture of a first-order FIR section is shown in Fig. 5.9. To keep the design
modular, its delay line is also composed of a cascade of tg delay cells. It requires that the minimum
interval of events both coming from its previous stage and going to its next stage be no smaller than
tg. On the other hand, the adder may see two events arriving at its two inputs arbitrarily closely. To
meet the granularity requirement at the output of the adder, we use a grouping block similar to, yet
simpler than, the one in Section 3.2 to combine these close events. When an event arrives at either
input of the grouping block, it triggers a grouping window with a length of tg. Other events arriving
within the window are grouped with the leading event and moved to the next stage together at the
end of the window.
[Figure 5.9 note: only req is shown in the figure for clarity.]
Figure 5.9: Architecture of one FIR section in interpolation filter.
delay line and stores datain into the FIFO. It also activates a window in the FIR grouping block,
and holds datain in DFF1. At the end of the window, data in DFF1 and DFF2 are moved into the
next stage for arithmetic operations. The rising edge on reqout identifies the validity of dataout. It
also moves the output into the next FIR section. Similar grouping and arithmetic operations are
triggered once an event arrives at the end of the delay line. In a first-order filter, only one FIFO
read channel is required.
5.3.1 Grouping block in the interpolation filter
The grouping block in the interpolation filter is simply a delay cell with a slightly different input
interface. Shown in Fig. 5.10, it has two req/ack input communication channels. Whenever an event
from either channel arrives, it sets the SR latch and triggers a delay operation. The corresponding
ack is then pulled down to reset its previous stage. tg later, the delay operation completes and reqo
is pulled up. The time difference between the rising edges of the input req signal and reqo is the
Figure 5.10: Circuit implementation of the grouping block in interpolation filter.
edged by pulling the other ack down. At the end of the window, both events are moved to the next
stage.
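The windowing behavior of this simpler grouping block can be sketched as follows. The function name group_by_window is hypothetical, events from both input channels are merged into one list of arrival times, and the output time of each group is taken as the leading event time plus tg, as described above. Times are in nanoseconds.

```python
def group_by_window(times, tg):
    """Group event times into windows of length tg opened by a leading event.

    Returns a list of (output_time, grouped_times).
    """
    groups = []
    window_end = None
    for t in sorted(times):
        if window_end is None or t >= window_end:
            groups.append([t])      # leading event opens a new window
            window_end = t + tg
        else:
            groups[-1].append(t)    # grouped with the leading event
    return [(g[0] + tg, g) for g in groups]
```

With tg = 25, events at 0, 3, 40, 50 and 90 ns leave the block as three groups at 25, 65 and 115 ns, so successive outputs in this example are at least tg apart.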
5.4 CT-to-DT converter
Directly sampling the CT digital signal with a clocked DFF converts it into a synchronous digital
signal. Traditionally, two DFFs are used in series to minimize the possibility of the final output
being in a metastable state [49]. Although the synchronous output is then in a stable state
with high confidence, it is not necessarily in the correct state. If any bit of a data word settles to a wrong
state, we call the amplitude difference between the wrong sample and the corresponding correct
sample a sampling error. This introduces error power at the synchronous output.
We can rely on a characteristic of the CT digital signal to alleviate this issue. Notice that a CT
digital signal has a high data rate. The minimum space between two events is tg. On the other hand,
the tap delay of an IIR filter defines the baseband as [0,1/(2TD)]. So the maximum input frequency
Figure 5.11: Architecture of the CT-to-DT converter.
As a result, the amount of amplitude change between two successive events in a CT digital signal
is limited to a certain range, which can be estimated by




In this design, the output resolution is 16 bits, AFS = 2^15 LSB, TD is 1 µs, and tg is 25 ns. This
results in a ∆V smaller than 2^11.3 LSB. The parameters in the above equation are unchanged if the
event detector is programmed to have a resolution lower than 16 bits, as is the absolute value of
∆V. Knowing that, we can convert the binary CT digital signal into a thermometer signal before
sampling. Because ∆V never reaches full scale, only a subset of all the thermometer bits can change
at any time. The three most significant bits are hence immune to sampling error. Fig. 5.11 shows
the implementation of the CT-to-DT converter. To reach a compromise between accuracy and
hardware cost, only the eight most significant bits of the CT signal are converted into thermometer
code. After sampling, the thermometer code is converted back into binary.
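The quoted bound can be reproduced numerically under one assumption: that ∆V is estimated as the maximum slope of a full-scale sinusoid at the baseband edge 1/(2TD), multiplied by the minimum event spacing tg. That form is an inference from the quoted numbers, and the function names below are hypothetical. The sketch also shows a plain binary/thermometer round trip for an eight-bit value.

```python
import math


def max_step_lsb(a_fs_lsb, td_s, tg_s):
    """Assumed bound: dV <= 2*pi*f_max*A_FS*tg, with f_max = 1/(2*TD)."""
    f_max = 1.0 / (2.0 * td_s)
    return 2.0 * math.pi * f_max * a_fs_lsb * tg_s


def binary_to_thermometer(value, bits):
    """Thermometer code with 2**bits - 1 elements; the lowest 'value' are 1."""
    return [1 if i < value else 0 for i in range(2 ** bits - 1)]


def thermometer_to_binary(code):
    """Count the ones to recover the binary value."""
    return sum(code)


dv = max_step_lsb(a_fs_lsb=2 ** 15, td_s=1e-6, tg_s=25e-9)
dv_log2 = math.log2(dv)
```

With AFS = 2^15 LSB, TD = 1 µs and tg = 25 ns, this evaluates to roughly 2^11.3 LSB, in line with the figure quoted in the text.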
Chapter 6
Measurement Results of the CT Digital IIR
Filter System
This chapter presents measurement results of the CT digital IIR filter system. Unless otherwise
noted, the nominal test condition uses a supply voltage of 1.2 V. The results are collected at room
temperature. The test chip is fabricated in a 65 nm TSMC LP CMOS process. The die photo is
shown in Fig. 6.1. The design occupies an active area of 0.64 mm2.
6.1 Description of the measurement setup
The setup for measurements is shown in Fig. 6.2. An analog input is converted into a CT digital
signal either by a CT-ADC or by a DT-ADC followed by a DT-to-CT conversion (which will be
Figure 6.2: Measurement setup.
is acquired by a digital data acquisition device with a 500 MHz sampling frequency. After that, the
acquired digital data are postprocessed in a computer.
The setup is composed of four PCBs. Fig. 6.3 shows a photograph of it. The board on the right
implements an LCS CT-ADC [3], which converts an analog input into a CT digital signal. The chip
under test (CUT) sits on the left board. On the same board, there is a DT-ADC. The two small
boards on the bottom generate the configuration bits which will be stored in the CUT’s scan chain.
They also generate the system-reset signal.
Ideally, we would use a CT digital signal generated from the CT-ADC as the system input for all
measurements. However, because it is built with discrete components with long propagation delays,
the CT-ADC can only accept a full-scale input up to 5 kHz. To fully measure the performance of
the CT digital IIR filter, we need a better signal source. The DT-ADC serves this purpose. We use
a Texas Instruments® ADC08L060 for the DT-ADC. It accepts a maximum sampling frequency of
60 MHz. Although the original output of a DT-ADC is DT digital samples, we can convert it into
a CT digital signal with a zero-order hold block. The converted output is still composed of binary
waveforms, but they are defined in CT: a CT digital signal. Actually, it is not necessary to introduce
Figure 6.3: PCB boards for test.
an explicit zero-order hold block. The digital output of a DT-ADC naturally holds its value between
two clock edges, as shown in Fig. 6.4. The timing bit of a CT digital signal, req, is derived from
the clock. Feeding the clock into a one-shot circuit results in the required req. Whenever
new data is generated at the output of the DT-ADC, a pulse is also generated on req. The delay ∆1
determines the pulse width. The delay ∆2 determines the delay between a rising edge on the clock
and the rising edge on req. It should be long enough so that when req is pulled up, the data bits are
Figure 6.4: DT to CT conversion. (a) Schematic. (b) Operations.
6.2 System configuration and calibration
Fig. 6.5 shows that the CT digital IIR filter system is composed of three blocks: a sixth-order CT
digital IIR filter, an interpolation filter and a CT-to-DT converter. All three blocks contain delay
elements that require bias voltages. In this design, the tg delay cells in the IIR filter’s second tap
delay share a bias voltage Vb2; the half-delay cells in the second tap delay and all FIFOs share a
bias voltage Vbhalf. All other delay elements, including the grouping block, the IIR filter’s first tap
delay, the delay cells in the interpolation filter and the CT-to-DT converter, share the same bias
voltage Vb1. Normally, these three bias voltages are connected together. By changing Vb1, Vb2 and
Vbhalf, we can change the delays of these delay elements.
The system can be configured through the scan chain. The system has two modes: normal
operation and test mode. In normal operation, we can configure the tap coefficients and the number
of delay cells in the two tap delays to change the CT digital IIR filter’s frequency response. The
Figure 6.5: Block diagram of the chip under test.
number of FIR sections (0–4) in the interpolation filter and the number of delay cells in the tap
delay of each FIR section can also be configured. We can change the notch locations in the fre-
quency response of the interpolation filter through these configurations. Finally, we can choose
either to use the binary-to-thermometer conversion or not in the CT-to-DT converter. A sel signal
from outside the chip can enable or disable the CT-to-DT converter and choose the system output.
In the test mode, we can select any delay element in the system and measure its delay value.
6.2.1 Calibration of delay lines
We designed the nominal values of the tap delay, TD, and the smaller delay cell, tg, to be 1 µs and
25 ns, respectively. To avoid systematic mismatches between the two tap delays in the CT digital
IIR filter, we include 39 delay cells in the first tap delay and 40 delay cells in the second tap delay.
Using one fewer cell in the first tap delay compensates for the delay introduced by the grouping block,
which is also 25 ns.
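The nominal delays implied by these cell counts can be tallied as a quick check. The sketch below combines the numbers in this section with the extra half-delay cell in the second tap delay described in Section 5.2.1; the function name tap_delays_ns is hypothetical and all values are nominal design values, not measurements.

```python
TG_NS = 25.0  # nominal delay of one delay cell


def tap_delays_ns(cells_tap1=39, cells_tap2=40,
                  grouping_ns=TG_NS, half_cell_ns=TG_NS / 2):
    """Nominal loop delay (grouping block + first tap) and second tap delay."""
    loop = grouping_ns + cells_tap1 * TG_NS
    tap2 = cells_tap2 * TG_NS + half_cell_ns
    return loop, tap2


loop_ns, tap2_ns = tap_delays_ns()
```

The loop delay comes to 1000 ns and the second tap delay to 1012.5 ns, that is, tg/2 longer than the first, which places it in the middle of the (1 µs, 1.025 µs) range required by the grouping scheme.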
The delay lines in the CT digital IIR filter system can be tuned using an automatic tuning
Figure 6.6: Example setup of automatic tuning of the delay lines.
mechanism [18, 40]. Fig. 6.6 shows an example of the automatic tuning setup. The bias voltages
of all delay elements in the IIR filter, the interpolation filter and the CT-to-DT converter are tied to
the same node, the voltage of which is provided by a delay-locked loop (DLL). The DLL contains a
TD delay line configured as an oscillator. The cycle time of the oscillation is compared to a 1 MHz
reference clock. When the oscillation cycle time equals 1 µs, the DLL locks and the voltage on its
tune terminal is the desired bias voltage for the other delay lines. Although we did not implement
an automatic tuning system on our chip, below we describe a manual tuning that emulates this
automatic tuning.
We configure the delay elements in the CT digital IIR filter as shown in Fig. 6.7. The grouping
block and the first tap delay are configured as a closed loop, acting like the oscillator in the DLL
in Fig. 6.6. The second tap delay is configured as an open loop. It takes the output from Tap1 as
its input. At the beginning of the calibration, we feed a single pulse into the system through reqin.
Since the system contains a close loop, the pulse keeps going through the loop with a cycle time of



































Figure 6.7: Configuration for delay-line calibration.
need to make sure that the delay of the second tap delay is within (1 µs, 1.025 µs). This condition
guarantees that the events arriving at Tap2 are always within the grouping windows triggered by
their paired events arriving at Tap1 (please refer to Section 3.2 for the details). Since pulses are
very narrow and difficult to read out of the chip, reqin, Tap1 and Tap2 connect to a toggling flip-flop
(TFF), which converts pulses to edges.
The delays are measured with the aid of an oscilloscope. Fig. 6.8 shows an example of the
waveforms captured by the oscilloscope. The top, middle and bottom curves are the measurements
at probe1, probe2 and probe3 in Fig. 6.7, respectively. The loop delay is the timing distance be-
tween any two successive edges on the middle curve, which is calculated by the oscilloscope’s
internal clock. The value is displayed on the right-most column in Fig. 6.8. The delay of the sec-
ond tap delay is the timing distance between an edge on the middle curve and the following edge
with the same direction on the bottom curve. By tuning the three bias voltages (Vb1, Vb2 and Vbhalf),
Figure 6.8: Delay-line measurements after tuning.
we can tune the delays to the desired values. Normally, all three bias voltages are connected to-
gether. We change the shared bias voltage to adjust the absolute delay values of the delay lines.
The system relies on the matching of the delay lines to meet the relative timing requirement. The
reason for keeping the three tuning voltages separate is to have the flexibility to tune the two delay
lines in the IIR filter separately in case matching is poorer than expected. In the measurement,
we find that when the bias voltages are connected together and equal to 0.627 V, the loop delay
and the delay of the second tap delay are 999.6 ns and 1009.6 ns, respectively. This satisfies the relative timing
requirement between the two delay lines. Thanks to the event grouping, as long as the relative
timing requirement is fulfilled, the frequency response of the IIR filter is insensitive to mismatches
between the two tap delays in the IIR filter. The response is determined only by the loop delay in
Fig. 6.7 and the tap coefficients. It is impossible to tune the loop delay to exactly 1 µs. However,
this small difference (0.04%) has a negligible effect on the filter’s frequency response.
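The two figures quoted above can be verified directly from the measured delays. The sketch below uses only the measured values from the text (999.6 ns, 1009.6 ns), the 1 µs target and tg = 25 ns; the variable names are illustrative, and the relative timing requirement is read here as "the second tap delay exceeds the loop delay by less than tg".

```python
loop_ns = 999.6     # measured loop delay
tap2_ns = 1009.6    # measured delay of the second tap delay
target_ns = 1000.0  # desired loop delay
tg_ns = 25.0        # grouping window length

# Relative error of the loop delay with respect to the 1 us target.
loop_error_pct = abs(target_ns - loop_ns) / target_ns * 100.0

# Offset of the second tap delay relative to the loop delay.
offset_ns = tap2_ns - loop_ns
timing_ok = 0.0 < offset_ns < tg_ns
```

The loop-delay error evaluates to 0.04%, and the 10 ns offset lies inside the 25 ns grouping window.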
The delay lines in the interpolation filter also share the same bias voltage as the delay lines in
the IIR filter. The exact frequency locations of the notches implemented by the interpolation filter
depend on the absolute values of the delay lines in the interpolation filter. Their accuracy relies on
the matching of the delay lines. On the other hand, the exact locations of these notches have a
negligible effect on the in-band (i.e., [0, 500 kHz]) frequency response of the entire system. This
is because all the notches are far from the in-band range. The in-band frequency response of the
interpolation filter is always close to 0 dB.
6.2.2 Timing-dependent delay
The results reported in this chapter were collected in two measurements. The first measurement
was carried out from July to August 2016; the second in December 2016. A timing-dependent
delay was observed in the second measurement. After a delay line was calibrated using the above
procedure, its exact delay varied with time. For example, we calibrated
the second delay line in the IIR filter to have an initial delay value of 1.016 µs. Then we fed dense
input events into the delay line. The input events had a uniform event spacing of 40 ns, which is
slightly larger than the granularity of the delay line. Fig. 6.9 plots the measured delay value versus
the index of the event. As we can see, later events see shorter delays than earlier events. The delay
value finally settles at around 1.000 µs. This phenomenon was only observed
Figure 6.9: Event delay versus event index.
were not able to find the reason for it; it is possible that some change occurred between
the two sets of measurements. To tackle this timing-dependent delay problem in the measurement,
during calibration we set the values of the delay lines to be slightly longer than the desired values.
This ensured that the delay lines had the desired delay values during normal operation.
6.3 Frequency response
The CT digital IIR filter’s frequency response is measured after the above calibrations are com-
pleted. The CT digital input signal is generated from the DT-ADC with the DT-to-CT conversion.
The input event rate, determined by the ADC’s sampling clock frequency, is 20 MHz. A full-
scale analog signal is supplied at the input of the DT-ADC. Its frequency is swept from 10 kHz to
980 kHz. Two different digital outputs are collected: one from the output of the interpolation filter,
the other from the output of the CT-to-DT converter (see Fig. 6.5). The signal at the output of the
interpolation filter is CT digital. A data-acquisition device with a 500 MHz sampling frequency
is used to acquire the signal. Because the sampling frequency is much higher than the bandwidth
where the majority of the signal power is located (980 kHz), the acquired DT digital signal is a very
good representation of the original CT digital signal (the reason is given in Section 2.3). As an ex-
ample, in Fig. 6.10(a), we show an acquired output signal when the analog input signal is 55 kHz.
We do postprocessing in the computer to convert the digital data to their corresponding amplitude
values. The output of the CT-to-DT converter is a DT digital signal, which is synchronized with a
clock of 1 MHz (this clock is different from the sampling frequency of the DT-ADC). An example
of the acquired DT output is shown in Fig. 6.10(b). Both responses are found by FFT analysis with
a Hann window [50]. The frequency responses of the CT digital IIR system when the IIR filter is
configured as low-pass, high-pass, bandpass and bandstop are plotted in Fig. 6.11. Once the output
is converted into a synchronous digital signal, the frequency response can only be measured up to
half of the clock frequency, i.e., 500 kHz. In the low-pass and bandpass cases, the first repetitive
passband around 1/TD is suppressed by the notches formed in the interpolation filter.
In the low-pass and high-pass frequency responses, the stopband rejection is better than 80 dB;
this is a 45 dB improvement over previous work [3,15], due to both the IIR response and the larger
number of internal bits retained. With the low-pass configuration, the SNDR for full-scale (1.6 Vpp)
input signals in the passband is 54 dB or above.
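The way a tone level can be read from windowed samples is sketched below. This is a stand-in for the FFT analysis mentioned above, not the script used for the measurements: it evaluates a single Hann-windowed DFT bin with the Python standard library, and the function names, the synthetic 55 kHz test tone and the 1 MHz sample rate are illustrative assumptions. A gain at one frequency would then be the output level minus the input level.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def tone_level_db(samples, fs, f0):
    """Amplitude (in dB) of the component at f0, from a Hann-windowed DFT bin."""
    w = hann(len(samples))
    acc = sum(x * wi * cmath.exp(-2j * math.pi * f0 * i / fs)
              for i, (x, wi) in enumerate(zip(samples, w)))
    amplitude = 2.0 * abs(acc) / sum(w)  # correct for the window's gain
    return 20.0 * math.log10(amplitude)


# Synthetic check: a 55 kHz tone of amplitude 0.5 sampled at 1 MHz.
fs, f0, n = 1e6, 55e3, 1000
x = [0.5 * math.sin(2.0 * math.pi * f0 * i / fs) for i in range(n)]
level_db = tone_level_db(x, fs, f0)
```

For this coherently sampled test tone the estimate returns the tone amplitude, about −6 dB relative to unity.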
Figure 6.10: Comparison of the CT and DT waveforms at the interpolation filter and the CT-to-DT
converter. (a) An example CT digital waveform acquired at the output of the interpolation filter.
The zoomed-in plot shows more details of the waveform. (b) An example DT digital waveform
acquired at the output of the CT-to-DT converter. Both waveforms are postprocessed by computer.
6.3.1 Two-tone test
The linearity of the designed filter is also tested with full-scale two-tone inputs. The in-band and
out-of-band IM3 components are −59 dB and −58 dB, respectively. Fig. 6.12 shows the input and
output spectra of the filter with two out-of-band full-scale tones at 970 kHz and 980 kHz. These two
tones are in the repetitive passband of the IIR filter’s frequency response, but are attenuated by
the notches implemented in the interpolation filter. The output spectrum clearly demonstrates the
alias-free feature of the CT digital filter. No components are aliased back into the baseband. By
[Figure 6.11 legend, each panel: response as designed (low pass, high pass, bandpass, bandstop); measured at output of interpolation filter; measured at output of CT-to-DT converter. Horizontal axis: Frequency (kHz).]
Figure 6.11: Frequency responses. (a) Low pass. (b) High pass. (c) Bandpass. (d) Bandstop.
contrast, if the same input is fed into a DT digital filter with the same frequency response, full
aliasing will be observed.
6.3.2 Effect of the interpolation filter
The effect of the interpolation filter is better illustrated in Fig. 6.13. The gray curve shows the
spectrum at the input of the interpolation filter (i.e., the output of the CT digital IIR filter) when
the IIR filter is configured for low-pass operation as in Fig. 6.11(a). The testing input is a
full-scale 5 kHz sine wave sampled at 20 MHz. With such a high oversampling ratio, the resulting
DT digital signal has a spectrum very similar to that of a CT digital signal (see Section 2.3). Noise
and distortion power are preserved at the repeating passbands, which are around 1 MHz, 2 MHz, . . . .
The black curve shows the spectrum at the output of the interpolation filter. Clearly, the noise
and distortion power are suppressed by the notches formed in the interpolation filter. Conducting
FFT analysis on both signals, we find the SNDRs at the interpolator input and output from 0 to
250 MHz (the sampling frequency of the data acquisition device is 500 MHz) are 47.9 dB and
56.5 dB respectively. The interpolation filter increases the SNDR by 9.6 dB.
Figure 6.12: Out-band two-tone test. (a) Spectrum of the filter input. (b) Spectrum of the filter
output, before CT-to-DT converter.
6.4 Power consumption
We measured the CT digital IIR filter system’s power consumption under different input event
rates, with the IIR filter configured as the low-pass filter shown in Fig. 6.11(a). The input event
rate is varied by sweeping the sampling clock frequency of the DT-ADC from DC to 20 MHz. The
power consumption of the system versus the input event rate is shown in Fig. 6.14. When the event
detector is on, it eliminates redundant events in the feedback loops, and power decreases strongly
when input activity decreases. When the event detector is off, redundant events keep going around
the feedback loops and maintain a high event rate regardless of the input. On the other hand, when
the input rate is low and the event detector is off, the data values in the system are less likely
to change. Hence, the power is still weakly dependent on the input. The power consumption of the
system when the input event rate is 20 Msps is broken down in Table 6.1.
Figure 6.13: Spectra at input and output of the interpolating filter (gray: input spectrum; black:
output spectrum).
6.4.1 Test with a speech signal
We also tested the CT digital IIR system with a speech signal. The speech signal is converted to
a CT digital signal by the LCS CT-ADC on the separate board. Events on the CT digital signal
























Figure 6.14: Power consumption versus input event rate, with the event detector on and off.
digital IIR filter is configured as a low-pass filter with the same frequency response as it is in Fig.
6.11(a). The event detector is turned on, and 14 bits are used for data comparisons in the event
detector. The speech signal and the system’s instantaneous power consumption are shown in Fig.
6.15. When the input is active, the power can rise as high as 600 µW. When the input is static, the DSP
settles to a quiescent state with redundant events eliminated by the event detector, dropping the
power to the leakage level (40 µW). The average power is 83 µW.
Unlike CT digital FIR filters, whose power consumption is determined only by
the input activity [3, 15, 17], the power consumption of a CT digital IIR filter is determined by
both the input activity and the activity in the filter’s feedback loops. This explains why the power
























Table 6.1: Power breakdown (dynamic power at 20 MHz input data rate).
Block                                        Power (mW)
Total                                        2.32
IIR filter                                   1.62
    Delay line                               0.41
    SRAM                                     0.82
    Arithmetic blocks                        0.40
Interpolation filter & CT-to-DT converter    0.70
6.5 Performance summary and comparison to other work
The parameters of the CT digital IIR system are summarized in Table 6.2. The delay line is only
a small portion of the system, both in area and power. The delay line area per tap is one third that
in [3].
Table 6.3 compares the presented work to CT and DT digital processors in prior art [3, 15, 17,






where fmax is the maximum input event rate, N is the filter order, and ENOB = (SNDR − 1.76)/6.02. One
issue with FoM1 is that it treats the filter order for FIR and IIR filters in the same way. However, it
is well known that to satisfy a given filtering requirement, IIR filters generally require much lower
order than FIR filters [54]. Hence, treating their order in the same way results in a figure of merit































































When the filter is an FIR filter, K = 1 and FoM2 = FoM1; if it is instead an IIR filter, K is the ratio
of the orders of an FIR filter and an IIR filter when the two satisfy the same filtering requirement.
For example, to implement the low-pass frequency response plotted in Fig. 6.11(a), which has
TD = 1 µs, a passband edge at 50 kHz, a stopband edge at 88 kHz and stopband rejection of more than
80 dB, the minimum orders required by an FIR filter and an IIR filter are 67 and 6, respectively.
Hence, K = 67/6 ≈ 11.2.
Table 6.3 shows that the FoM2 of our system beats all the CT digital designs in prior art,
including the state-of-the-art design in Ref. [17]. In addition, for the first time, it implements an
IIR architecture, signal-derived timing, and synchronous output. The system’s 16-bit resolution is
much higher than the others [3, 15, 17, 51]. Its FoM2 also compares well with those of DT digital
filters, but it avoids the clock and hence is free of aliasing. It features an agile power adaptation
to input activity, which varies from 2.32 mW (full activity) to 40 µW (idle), with more than a 50×
range, with no power-down circuitry.
Table 6.2: Summary of the chip.
Process/supply voltage           65 nm/1.2 V
Input resolution                 7 bits
Filter coefficient resolution    10 bits
DSP arithmetic/output res.       16 bits
TD/tg                            1 µs/25 ns
Max. FIFO depth                  128
Max. input data rate             20 MHz
SNDR*                            57.7 dB
Core area                        0.64 mm2
  IIR filter                     0.38 mm2
    Delay line                   0.05 mm2
  Interpolation filter           0.18 mm2
  CT-to-DT converter             0.08 mm2
*Measured at output of the CT-to-DT converter, when the input is a full-scale 5 kHz 7b sine wave
and the IIR filter is low pass with TD = 1 µs and a cutoff frequency of 50 kHz.
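As a quick check of the arithmetic behind these figures of merit, the following sketch evaluates the ENOB expression and the order ratio K for the numbers quoted above. The function name enob is chosen here for illustration only, and the FoM expressions themselves are not reproduced.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


# Order ratio for the low-pass example in the text:
# a 67th-order FIR filter versus a 6th-order IIR filter.
K = 67 / 6

print(f"ENOB at 54 dB SNDR: {enob(54.0):.2f} bits")
print(f"K = {K:.1f}")
```

The 54 dB input is the lower bound on the passband SNDR reported for the low-pass configuration; any other measured SNDR can be substituted.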
Parameter                  O'hAnnaidh  Tohidian   Agarwal       Schell     Kurchuk    Vezyrtzis  This work
                           Analog      Analog     Sync/2.1 GHz  Async.     Async.     Async.     Sync/1 MHz*
Type/order                 FIR/15th    IIR/7th    FIR&IIR/3rd   FIR/15th   FIR/5th    FIR/15th   IIR/6th
Process                    45 nm       65 nm      32 nm         90 nm      65 nm      130 nm     65 nm
Supply                     1.1 V       1.2 V      1 V           1 V        1.2 V      1 V        1.2 V
DSP arithmetic resolution  6 bits      8 bits     8 bits        8 bits     4 bits     8 bits     16 bits
Stop-band rejection        22 dB       >100 dB    NA            35 dB      15 dB      25 dB      80 dB
SNDR                       33 dB       41 dB      50 dB         47-62 dB   20-22 dB   40-51 dB   >54 dB
Max. input rate            3.2 Gsps    800 Msps   2.1 Gsps      16.7 Msps  45 Gsps    10 Msps    20 Msps






















1.2 fJ/  
12 fJ 
0.1 fJ/  
 4.3 fJ 
1.6 fJ/ 
 76 fJ 















Yes Yes Yes No No No No 
*For CT-to-DT converter only.  **Not including clock generation and anti-aliasing filter.
†Varying with signal activity; without power down circuitry.
Table 6.3: Comparison of presented work with state-of-the-art CT and DT digital filters.
Chapter 7
Design Considerations for VR Digital Signal
Processing
7.1 Introduction
As already mentioned in Chapter 1, conventional fixed-rate ADC and DSP systems have to operate
at the Nyquist frequency (i.e., twice the input bandwidth) or faster to avoid aliasing. However,
for most real-world signals, the frequency content varies within the observation
interval [8]. In practice, a spectrum is obtained by applying an FFT to a finite-length interval of a
signal. We define the width from zero to the maximum frequency that has significant power as
the “local bandwidth.” For example, speech signals contain frequencies up to 10 kHz, but most of
the time their “local bandwidth” is up to 3.5 kHz [1]. Extensive studies have been done on the ADC
side to take advantage of a varying “local bandwidth”; we list only a few here [1, 8–11]. The
VR-ADCs implemented in these works acquire samples with a sampling rate that changes according
to the “local bandwidth.” Thus, a good amount of power is saved due to a low average sampling
rate, while information is still accurately preserved.
On the other hand, the question as to how to process VR samples remains open. [9] devel-
ops an algorithm to extract features from an adaptively-sampled ECG signal. However, the tech-
niques used in this specialized application do not apply in the general case. [10] introduces an
integrated solution to do VR sampling and filtering. However, it requires a complicated algorithm
to determine the clock rate, which adds a large power overhead. Hence, there is a need for an
energy-efficient DSP which can process samples with a variable sampling rate. Its clock rate has
to track the input sampling rate to maintain a low average activity, while its transfer function and
signal-to-noise ratio should be independent of the sampling rate.
A conventional DSP does not satisfy the above requirements. Fig. 7.1 shows a VR-ADC fol-
lowed by a conventional DSP. Since the focus of this work is on the DSP side, the details of the
ADC are omitted for simplicity. Only a sampler with a VR clock fs is shown. The DSP is im-
plemented as a Kth order FIR filter with tap coefficients h[k], k = 0, . . . ,K. Its tap delay Ts and
the sampling period are completely linked, and in fact they are equal, and this is at the root of
the problem: depending on the input “local bandwidth,” fs varies and thus the frequency response
scales with the sampling frequency.



































Figure 7.2: Variations on a VR-DSP. (a) The basic model. (b) In a constant-sampling-rate region
with fs = fDSP = 1/T0. (c) In a constant-sampling-rate region with fs = fDSP = M/T0.
7.2 VR-DSP
The solution to the problem just mentioned is to decouple the tap delay from the sampling clock.
CT-DSPs implemented in [15] and the previous several chapters of this thesis have this feature.
This work discusses a way to do this in DT. We introduce a VR-DSP shown in Fig. 7.2(a). The
clock rate of the VR-DSP is denoted by fDSP. It is equal to the sampling clock fs for most of the
time, with some exceptions which will be discussed in Section 7.2.2. Its tap delay T0 is independent





We now consider several special cases.
7.2.1 Constant-sampling-rate regions
A constant-sampling-rate region in a VR-DSP is defined as the case in which the input sampling
rate has been constant for a long time. In that case, all the samples stored in the delay line of the
DSP have the same timing distance. In this region, fDSP is equal to fs in Fig. 7.2(a). Fig. 7.2(b)
shows the case where these quantities are selected to be 1/T0.¹ The operation in this case becomes
the same as that of a conventional DSP. Because T0 is equal to one clock cycle, the VR-DSP
configures its delay line such that each tap contains one delay cell. The delay cell realizes a delay
equal to one clock cycle. The output y[n] is computed from
\[ y[n] = \sum_{k=0}^{K} h[k]\, x[n-k]. \tag{7.2} \]
The transfer function corresponding to this equation is identical to (7.1).
¹ How the sampling frequency is determined according to the local bandwidth is not a focus of this
paper. Readers interested in it can refer to [1, 8–10].
Fig. 7.2(c) shows the case when a fast sampling clock is used. Assume fs = fDSP = M/T0,
where M can be any integer larger than 1. Because T0 is equal to M clock cycles, a VR-DSP
configures its delay line such that each tap is composed of M delay cells. Each cell realizes a delay
equal to one sampling cycle. Since one input sample is generated in each cycle, each delay cell












which is again identical to (7.1). The key for a VR-DSP to implement a transfer function inde-
pendent of fs is that it can adaptively configure its delay line so that a constant tap delay T0 is
kept.
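The following Python sketch illustrates this point numerically. It assumes that the transfer function referred to as (7.1) has the usual FIR form H(f) = Σ h[k] exp(−j2πf·k·T0); the coefficient values, the function names and the test frequency are placeholders chosen for illustration, not values from this work.

```python
import numpy as np


def vr_fir(x, h, cells_per_tap):
    """FIR filter whose taps are cells_per_tap samples apart:
    y[n] = sum_k h[k] * x[n - k * cells_per_tap], with zero initial state."""
    x = np.asarray(x)
    h = np.asarray(h)
    y = np.zeros(len(x), dtype=np.result_type(x.dtype, h.dtype, np.float64))
    for k, hk in enumerate(h):
        delay = k * cells_per_tap
        if delay < len(x):
            y[delay:] += hk * x[:len(x) - delay]
    return y


def measured_response(f0, T0, M, h, n_samples=400):
    """Steady-state complex gain at frequency f0 when fs = fDSP = M / T0."""
    n = np.arange(n_samples)
    x = np.exp(2j * np.pi * f0 * n * T0 / M)  # complex tone sampled at M / T0
    y = vr_fir(x, h, M)                       # each tap spans M delay cells
    return y[-1] / x[-1]                      # last sample is past the start-up


h = [0.1, 0.2, 0.4, 0.2, 0.1]   # placeholder coefficients
T0 = 1e-3                       # tap delay in seconds (placeholder)
f0 = 130.0                      # test frequency in Hz (placeholder)

slow = measured_response(f0, T0, 1, h)
fast = measured_response(f0, T0, 4, h)
print(abs(slow), abs(fast))
```

Under these assumptions the two gains agree, because the delay seen by each tap is k·T0 in both configurations; only the number of delay cells used to realize it changes.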
7.2.2 Sampling-rate-transition regions
A sampling-rate-transition region starts when fs changes. The time intervals between samples be-
fore and after this change are different. Since both intervals will coexist in the delay line for a cer-
tain amount of time, the VR-DSP needs to properly configure its delay line to maintain a constant
tap delay T0, as well as to avoid distortion. In the following discussion, we assume the sampling
rate switches between two options 1/T0 and M/T0 for simplicity. The principles presented are also
valid for more sophisticated scenarios.
A fast-rate-to-slow-rate transition
Consider a VR-DSP operating in a constant-sampling-rate region with fs = M/T0. A fast-rate-to-
slow-rate transition starts when the DSP detects the sampling rate decreasing to 1/T0. However, the
DSP clock rate fDSP is not allowed to change immediately: because the minimum timing distance
between two successive samples in the delay line is T0/M, the clock has to maintain a T0/M
cycle time to preserve this information. If a slower clock were used, the inherent timing
information in the sample sequence would be distorted. In fact, this transition period is the only time
when fDSP differs from fs. Since the timing distance between the following samples is T0, which is
equal to M clock cycles, any two consecutive samples have to be spaced by M delay cells. This leaves
vacant spots in between. In each of the M−1 cycles following the instant at which fs switches, a dummy
sample equal to the previous input sample is used to fill the vacant spot at the input of the delay
line. In the Mth cycle, a new input sample is generated, so no dummy filling is needed. The
operations in these M cycles then repeat K−1 times. In the end, the delay line of the VR-DSP
contains K new input samples, with M−1 dummy samples inserted between any two successive
ones. Because all these K input samples are spaced by T0, the VR-DSP decreases fDSP to 1/T0.
Following this, it reconfigures the delay line to have one delay cell in each tap and discards all the
dummy samples. This does not affect the timing distance between the samples because each delay
cell now realizes a delay of T0. From then on, the VR-DSP enters a constant-sampling-rate region with
fs = fDSP = 1/T0.
A slow-rate-to-fast-rate transition
A slow-rate-to-fast-rate transition occurs when a VR-DSP detects that fs increases from 1/T0 to
M/T0. The procedure is just the opposite of the fast-rate-to-slow-rate transition. Originally, each
tap contains one delay cell and stores one sample. As soon as the change is detected, each tap is
reconfigured to contain M delay cells. At the same moment, each existing sample is copied M−1
times to fill up the vacant spots in the tap it belongs to. In the meantime, fDSP increases to M/T0 to
be consistent with fs. From then on, the VR-DSP operates with a fast clock rate of M/T0.
To summarize the operations in sampling-rate-transition regions, we emphasize that as long as
there are any samples acquired with a fast sampling rate in the delay line, the VR-DSP operates at
the same fast rate to preserve the timing information in the samples. A VR-DSP with more than
two clock rate options would have to operate with the highest rate at which the samples in the delay
line are acquired.
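The bookkeeping in the two transitions can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the hardware implementation: samples are held in plain lists ordered oldest first, and the function names are hypothetical.

```python
def fast_to_slow_stream(prev_sample, slow_samples, M):
    """Samples entering the delay line at the fast clock (M / T0) during a
    fast-to-slow transition: M - 1 dummy copies of the previous input
    sample, then the new sample that arrives in the Mth cycle."""
    stream = []
    held = prev_sample
    for sample in slow_samples:
        stream.extend([held] * (M - 1))  # dummy samples fill the vacant spots
        stream.append(sample)            # real sample in the Mth cycle
        held = sample
    return stream


def discard_dummies(stream, M):
    """Keep only the real samples once the clock drops to 1 / T0."""
    return stream[M - 1::M]


def slow_to_fast_expand(taps, M):
    """Slow-to-fast transition: each stored sample is copied M - 1 times so
    that every tap again spans M delay cells."""
    return [sample for sample in taps for _ in range(M)]


stream = fast_to_slow_stream(prev_sample=5, slow_samples=[7, 9], M=3)
print(stream)
print(discard_dummies(stream, 3))
print(slow_to_fast_expand([1, 2], 3))
```

In both directions the dummy samples are zero-order-hold copies, which is why they preserve the timing distance between real samples but do not carry new information about the input.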
7.2.3 An illustrative example
As an illustration, we consider an 80 ms input signal. As indicated at the top of Fig. 7.3, before
20 ms and after 40 ms, the input frequency is 4.1 kHz and fs = 50 kHz. The signal in between has a
frequency of 16.4 kHz and fs = 200 kHz. It is not necessary to use such high sampling frequencies
in VR-DSPs; we choose this combination for illustration purposes because, within a short duration, a
large number of samples are acquired, allowing the details to become visible. The waveforms around the
two transition instants are shown in Fig. 7.3(a) and (b). Because we know the exact instants when
the input “local bandwidth” switches, we can make them coincident with the sampling frequency
switching instants to simplify our discussion. The details of how the sampling rate is
switched in the input ADC are not the subject of this paper; the reader is referred to [1, 8–10]. The
VR-DSP is implemented as a 10th-order low-pass FIR filter with a cutoff frequency at 18 kHz.
Several filtered outputs are shown in Fig. 7.3(c)–(f). Ideal outputs are shown by solid lines; these
are generated by feeding the input signal into a CT filter with the same transfer function. Panels (c)
and (d) plot the output samples in the slow-rate-to-fast-rate and fast-rate-to-slow-rate transitions,
respectively. In the parts far from the transition instants, the output samples from the VR-DSP
match the ideal curve very well. Deviations from the ideal curve are observed in the transition
parts; this error is introduced by dummy samples. Dummy samples are not directly acquired from
the input signal, but are reconstructed from the previous input sample, as discussed above. This
introduces an error between the sample sequence and the input signal.
We also consider a fixed-rate DSP with a clock rate of 200 kHz for comparison. Its output
samples in the two transient periods are plotted in (e) and (f). Although a fixed-rate DSP generates
better results in the transient regions, a VR-DSP has a much lower average output sample rate.
In this case, the average sample rate of the VR-DSP is 2.3× lower than that of the fixed-rate DSP,
which implies significant savings of computations and thus of power consumption. In general, the
average sample rate of a VR-DSP is input dependent. A shorter duration of the high-frequency part
of the input results in a lower average sample rate. For example, if the length of the 16.4 kHz part
in the example above shrinks from 20 ms to 5 ms, the output rate becomes 3.3× lower than that of the
fixed-rate DSP.
Figure 7.3: Sample input and output signals around frequency transitions. (a) Low to high fre-
quency. (b) High to low frequency. In (c)–(f), the solid line shows the ideal waveform for reference;
the dots represent the output samples from different DSPs. (c) A slow-rate-to-fast-rate transition in
a VR-DSP. Both fs and input frequency change at t = 20 ms. (d) A fast-rate-to-slow-rate transition
in the VR-DSP. Both fs and input frequency change at t = 40 ms. However, the output rate does not
change until the end of the transition, which is at about t = 40.2 ms. (e) A slow-input-to-fast-input
transition in a fixed-rate DSP. (f) A fast-to-slow transition in a fixed-rate DSP.
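The rate savings quoted in this example follow from counting samples. The sketch below reproduces that count; the function name is hypothetical, and the second call assumes that the low-rate portion of the signal stays at 60 ms when the high-rate portion shrinks to 5 ms, which is our reading of the example rather than something stated in the text.

```python
def average_rate_ratio(t_slow, t_fast, fs_slow, fs_fast):
    """Ratio of the sample count of a fixed-rate DSP running at fs_fast to
    that of a VR-DSP using fs_slow for t_slow seconds and fs_fast for
    t_fast seconds (transition overhead is ignored)."""
    vr_samples = t_slow * fs_slow + t_fast * fs_fast
    fixed_samples = (t_slow + t_fast) * fs_fast
    return fixed_samples / vr_samples


# 60 ms at 50 kHz plus 20 ms at 200 kHz, as in the example.
print(average_rate_ratio(60e-3, 20e-3, 50e3, 200e3))

# Assumed variant: high-rate portion shortened to 5 ms, low-rate portion unchanged.
print(average_rate_ratio(60e-3, 5e-3, 50e3, 200e3))
```

The first ratio is 16/7, consistent with the 2.3× figure; the second is about 3.25 under the stated assumption, close to the roughly 3.3× quoted.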
7.2.4 Remarks on required rate
A VR-DSP with a transfer function described in equation (7.1) has a repetitive frequency response,
with a period of 1/T0. Within each frequency response period, the frequency responses in the two
half periods are symmetrical. Thus, the maximum input bandwidth that the DSP can process is
1/(2T0). On the other hand, since T0 is the unit delay in (7.1), fs = 1/T0 is the lowest sampling rate
one can use in the ADC. Any sampling rate lower than 1/T0 will cause the repetition of the fre-
quency response to start earlier than 1/T0. Fig. 7.4(a) shows the case when the “local bandwidth”
of an input is low. Because of the frequency response constraint, the minimum sampling rate has
to be 1/T0. Fig. 7.4(b) shows the case when the “local bandwidth” reaches the maximum, i.e.,
1/(2T0); this demands a higher sampling rate, M/T0, with M an integer larger than 1. This implies
an oversampling ratio of M. When quantization noise is considered, a sampling-frequency-depen-
dent noise floor is introduced. Assume the samples have a fixed resolution. Quantization introduces
a noise with a total power of V_LSB^2/12, where V_LSB corresponds to one least significant bit. Con-
sider Fig. 7.4(a), when the sampling frequency is low; the noise floor is high because the total noise
power is distributed within a narrow frequency interval [0, 1/(2T0)]. When the sampling frequency
is high, as shown in Fig. 7.4(b), the noise floor is low as the same amount of power is distributed
over a wider range [0, M/(2T0)]. On the other hand, an input with a low “local bandwidth” can be
reconstructed by a reconstruction filter with a low cutoff frequency as shown in Fig. 7.4(a). Corre-
spondingly, when the “local bandwidth” of the input is high, a filter with a high cutoff frequency
should be used for reconstruction. By linking the cutoff frequency of the reconstruction filter with
the sampling frequency of the VR-DSP, we can make sure that a fixed amount of quantization noise
power is included in the passband of the reconstruction filter.
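The last statement can be checked with a short calculation. The sketch assumes that the quantization noise is white, i.e., that its total power V_LSB^2/12 is spread uniformly over [0, fs/2]; the function name and the choice of a cutoff at a fixed fraction of fs are illustrative.

```python
def inband_quantization_noise(v_lsb, fs, f_cut):
    """Noise power inside [0, f_cut], assuming white quantization noise of
    total power v_lsb**2 / 12 spread uniformly over [0, fs / 2]."""
    total_power = v_lsb ** 2 / 12.0
    return total_power * f_cut / (fs / 2.0)


T0 = 1 / 50e3  # tap delay (placeholder value)
for M in (1, 4):
    fs = M / T0
    # Cutoff tied to the sampling frequency (here a fixed fraction of fs).
    print(M, inband_quantization_noise(1.0, fs, f_cut=fs / 8))
```

Because the noise density falls as 1/fs while the cutoff grows in proportion to fs, the in-band noise power is the same for both sampling rates in this model.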
7.3 Reconstruction of VR samples
Signal reconstruction is needed in applications where a CT output is required. A CT signal or a
uniform-rate sample sequence which is reconstructed from a VR sample sequence allows the use
of the Fourier transform. This enables one to do frequency-domain analysis on the signal. In a













































Figure 7.4: DSP frequency response, noise power spectral density, and reconstruction filter fre-
quency response, for low and high sampling rates.
convolved with a uniform-rate sample sequence to generate the reconstructed output. One of the
commonly used reconstruction filters is a brick-wall low-pass filter, whose impulse response is a
sinc function [56]. An important feature of this function is when it reconstructs an output at one
of the sampling instants, its zero-crossing points are coincident with the positions of all the other
samples. Thus, the reconstructed output at that point is exactly equal to the sample value. However,
this feature is no longer valid when reconstructing a VR sample sequence, whose intersample
distance is nonuniform. This introduces error in the reconstructed output at the sampling instants.
In order to solve this problem, we modify the sinc function to track the sampling frequency at
the reconstruction instant as shown in Fig. 7.5. The stems on the time axis are the time stamps
of the samples. Magnitude information is not shown for simplicity. Initially the sampling rate is
M/T0. After t = ts, the sampling rate switches to 1/T0. The left and right sinc curves are used to











Figure 7.5: Reconstruction using sinc with a variable cutoff frequency at three different instants.
in each region, has an impulse response given by:
\[ h_r(t) = \operatorname{sinc}(t f_s) \tag{7.4} \]
In this case, the cutoff frequency of the reconstruction filter is always equal to fs/2. As shown in
Fig. 7.5, whether the sampling frequency is high or low, the distances between two zero-crossing
points in these sinc functions are equal to the intersample distances. In practice, only a finite num-
ber of samples in the vicinity of the reconstruction instant can participate in the operation. Thus, the
sinc function has to be truncated. However, jumps are observed at t close to the switching instant
ts, due to the sudden change of fs. To reconstruct an output around this transition region, we use a
modified approach. We scale the two parts of a sinc function on the two sides of t = ts with different
factors so that the distances between two zero-crossing points are the same as the intersample
intervals on each side of t = ts. The “sinc” function in the middle of Fig. 7.5 illustrates this.
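One way to read the piecewise-scaled sinc is as an ordinary sinc evaluated on a warped time axis that advances by one unit per sample interval on each side of ts. The Python sketch below follows that reading; it is an interpretation under stated assumptions rather than the exact procedure used here, and the function names and the half_width parameter are hypothetical.

```python
import numpy as np


def warped_index(t, ts, fs_before, fs_after):
    """Piecewise-linear 'sample count' coordinate that advances by 1 per
    sample interval on each side of the switching instant ts."""
    t = np.asarray(t, dtype=float)
    return np.where(t < ts, (t - ts) * fs_before, (t - ts) * fs_after)


def reconstruct(t_eval, t_samples, values, ts, fs_before, fs_after, half_width=50):
    """Truncated sinc reconstruction on the warped axis. Zero crossings of
    each kernel fall on the other sample instants on both sides of ts."""
    values = np.asarray(values, dtype=float)
    u_samples = warped_index(t_samples, ts, fs_before, fs_after)
    u_eval = warped_index(np.atleast_1d(t_eval), ts, fs_before, fs_after)
    out = np.empty(len(u_eval))
    for i, u in enumerate(u_eval):
        near = np.abs(u_samples - u) <= half_width  # truncation window
        # np.sinc is the normalized sinc, sin(pi x) / (pi x).
        out[i] = np.sum(values[near] * np.sinc(u - u_samples[near]))
    return out


fs_before, fs_after = 200e3, 50e3
t_samples = np.concatenate([np.arange(-20, 0) / fs_before,
                            np.arange(0, 20) / fs_after])
values = np.cos(2 * np.pi * 4.1e3 * t_samples)
rebuilt = reconstruct(t_samples, t_samples, values, 0.0, fs_before, fs_after)
print(np.max(np.abs(rebuilt - values)))
```

With half_width=50, at most 101 samples contribute to each output point. In this model the reconstructed output coincides with the sample values at every sampling instant, including those next to the switching instant.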
To show the results of the reconstruction process, we reconstruct the output samples from the
VR-DSP in Section 7.2.3. We compare the reconstructed results both with and without the
transition-region modification. In both cases, 101 samples participate in the reconstruction. The
results are shown in Fig. 7.6. The solid line shows the output with the modification, while the dashed
line shows the output without it. The dashed line has a large error around the sampling-rate
switching instant at 20 ms.
A zoomed-in plot around that instant is provided to clearly show the discontinuity in the output.
On the other hand, the reconstructed output shown as the solid line coincides with the samples at
all the sampling instants.
Next we show the reconstructed output against the ideal output in Fig. 7.7. In the constant-
sampling-rate regions close to the two ends, the two waveforms match very well. In the sampling-
rate-transition regions, an error is obvious. It is introduced by the dummy samples in the VR-DSP
operation.
Finally, we consider quantization noise and assume the resolution of the input samples is eight
bits. We built a reconstruction filter using the method described above but chose its cutoff frequency
to be fs/8. In the high-sampling-rate case in Section 7.2.3, the cutoff frequency is coincident with
the edge of the baseband. We used this filter to reconstruct the output of the VR-DSP. The baseband
SNRs in the constant-sampling-rate regions are summarized in Table 7.1. We also reconstructed the
samples from the fixed-rate DSP and listed its SNR for reference. As shown in the table, the SNR
of the VR-DSP is almost constant for different fs and is very close to the value in the fixed-rate
DSP case.
Although the sampling rate in the above example varies only once, the method introduced also
applies to cases with multiple changes, provided that successive changes are separated by at least
one transient interval.



































































































































Figure 7.6: Comparison of two reconstruction methods: output without and with the
transition-region modification.






























































Figure 7.7: Comparison of the reconstructed and the ideal output.
Table 7.1: In-band SNR comparison
                  VR-DSP                 Fixed-Rate DSP
Sampling Freq.    200 kHz    50 kHz      200 kHz    200 kHz
Input Freq.       12.8 kHz   4.1 kHz     12.8 kHz   4.1 kHz
SNR               57.8 dB    57.5 dB     57.7 dB    58.7 dB
7.4 Conclusion
A VR-DSP that can process samples at a variable rate according to the input local bandwidth is
described. We also provide a method for accurately reconstructing the output from the output sam-
ples. Design considerations and limitations of these systems are discussed in the paper. Simulation
results confirm that a VR-DSP can process signals with a significantly lower average rate compared
to fixed-rate DSPs, while maintaining a comparable SNR in the output.
Chapter 8
Suggestions for Future Work
This chapter suggests some avenues for improvement to the work presented in this thesis.
8.1 Improvement to event detection in the CT digital IIR filter
As we discussed in Sections 3.3.1 and 5.2.3, an event detector is used to monitor the traffic in the
feedback paths of a CT digital IIR filter to eliminate redundant events. However, introducing a
detector occasionally breaks the assumption of the event-grouping method, that an event arriving
at the end of the second tap delay can always be grouped with its paired event arriving at the end of
the first tap delay. An event may arrive at the end of the second tap delay while its paired event in
the first tap delay is eliminated by the event detector. In this case, our current solution is to throw
away the event at the second tap delay. However, this adds error power to the signals. An improvement
to the event detection that avoids introducing this error is desired. A possible solution is shown in













































Figure 8.1: A possible implementation of event detection with no error power.
to one each in the original case in Fig. 3.6. The new event detector monitors all the input data which
will be written into the six FIFOs and controls the inputs of the two delay lines in the timing block.
If the event detector detects that an event trying to write to the top three FIFOs is redundant and its
paired event trying to write to the bottom three FIFOs is also redundant, the detector eliminates both
events from the loop. As long as either of this pair of events is not redundant, both are preserved.
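The pairing rule reduces to a single logical condition, sketched below in Python. How redundancy itself is decided is outside this sketch; the two boolean inputs and the function name are hypothetical stand-ins for the detector's internal comparison results.

```python
def keep_event_pair(top_redundant, bottom_redundant):
    """Return True if the paired events must both be preserved, and False
    if both may be eliminated from the loop. The pair is dropped only when
    the event for the top FIFOs and its partner for the bottom FIFOs are
    both redundant."""
    return not (top_redundant and bottom_redundant)


for top in (False, True):
    for bottom in (False, True):
        print(top, bottom, keep_event_pair(top, bottom))
```

Treating the pair as a unit is what prevents the situation described above, in which one event of a pair survives while its partner has been eliminated.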
8.2 A CT digital IIR filter with one tap delay
In Chapter 3, we introduced a method that allows one to design a high-order CT digital IIR filter
using two tap delays and showed that mismatches between the two delay lines can cause the IIR
filter to be unstable. Although an event-grouping method was introduced to address this issue, it
still requires the matching of the two delay lines to be relatively good. In this section, we improve
the method so that one can design a high-order CT digital IIR filter using only one tap delay. A
CT digital IIR filter based on this new method is completely free of the matching issue. Consider






















Figure 8.2: A shared timing block. (a) Shared timing block. (b) Operations.
M block is used to replace the grouper in the original plot. M stands for a merging block with
zero delay. When an event (here denoted as a pulse on the timing signals) arrives at any of the
merging block’s three inputs (reqin, reqTap1 or reqTap2), the block generates a pulse at its output,
reqout, immediately. Fig. 8.2(b) shows the operations of the timing block after Pulse A enters. The
merging block immediately generates Pulse B at reqout. Pulse B enters the top delay line and results
in Pulse C at reqTap1 after a TD delay. Pulse C does two things: 1) It enters the merging block and
generates Pulse D at reqout immediately. 2) It also enters the bottom delay line and generates Pulse
E at reqTap2 after a TD delay. When Pulse E is generated at reqTap2, Pulse F is generated at reqTap1
at the same time. Pulse F is the outcome of Pulse D passing through the top delay line. Pulses E
and F both enter the merging block and result in Pulse G. Every TD after that, a pulse is generated
at reqout, reqTap1 and reqTap2 at the same time.
From Fig. 8.2, it is clear that, except for Pulse C, all the pulses at reqTap1 coincide with pulses






















Figure 8.3: A two-bit TD delay line. (a) Timing block. (b) Operations.
reqTap1 and reqTap2. The answer is “No” if one can implement a two-bit delay line as shown in
Fig. 8.3(a). The two-bit delay line has two inputs and two outputs. If a pulse enters through reqi1,
the delay line generates a pulse at reqo1 after a TD delay. If a pulse enters through reqi2, the delay
line generates a pulse at reqo2 after a TD delay. We configure the two-bit TD delay line in a loop as
shown in Fig. 8.3(a). Fig. 8.3(b) shows its operations after Pulse A enters the reqi1 terminal. After
a TD delay, Pulse B is generated at reqo1. Pulse B goes into the merging block and immediately
generates Pulse C at reqi2. After another TD delay, the delay line generates Pulse D at reqo2. Pulse
D also goes into the merging block and immediately generates Pulse E at reqi2. Every TD after that,
pulses are simultaneously generated at reqi2 and reqo2. Comparing Fig. 8.2(b) and 8.3(b), one can
see that merging the pulses at reqin, reqo1 and reqo2 in Fig. 8.3(b) generates a signal equivalent to
reqout in Fig. 8.2(b), and that merging the pulses at reqo1 and reqo2 in Fig. 8.3(b) generates a signal
equivalent to reqTap1 in Fig. 8.2(b). Finally, reqo2 in Fig. 8.3(b) is equivalent to reqTap2 in Fig. 8.2(b).
Later, we will show that the merging operation can be implemented very simply. It is clear now
that all the timing signals in Fig. 8.2(b) can be derived from the timing signals in Fig. 8.3(b), yet
the structure in Fig. 8.3 uses one fewer delay line.
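The equivalence claimed above can be checked with an event-level sketch of the loop in Fig. 8.3(a). As before, this is a simplified model under stated assumptions: time is counted in integer multiples of TD, the merging block is ideal, a single pulse enters at reqi1, and the function name simulate_two_bit_loop is hypothetical.

```python
def simulate_two_bit_loop(input_times, n_steps):
    """Pulse times (in units of T_D) for a two-bit delay line in a loop.

    Assumed model: reqo1 is reqi1 delayed by T_D, reqo2 is reqi2 delayed
    by T_D, and a zero-delay merge of reqo1 and reqo2 drives reqi2.
    """
    reqi1 = set(input_times)
    reqi2, reqo1, reqo2 = set(), set(), set()
    for t in range(n_steps):
        # Each channel of the two-bit delay line delays its own input.
        if (t - 1) in reqi1:
            reqo1.add(t)
        if (t - 1) in reqi2:
            reqo2.add(t)
        # Merging block closing the loop on Channel 2.
        if t in reqo1 or t in reqo2:
            reqi2.add(t)
    return reqi1, reqo1, reqo2


reqin, reqo1, reqo2 = simulate_two_bit_loop([0], 6)

# Derive the signals of Fig. 8.2(b) by merging (set union models an OR).
reqout = sorted(reqin | reqo1 | reqo2)
req_tap1 = sorted(reqo1 | reqo2)
req_tap2 = sorted(reqo2)

print(reqout)    # [0, 1, 2, 3, 4, 5]
print(req_tap1)  # [1, 2, 3, 4, 5]
print(req_tap2)  # [2, 3, 4, 5]
```

For this input, the derived signals show the same pulse times as the three-signal description of Fig. 8.2(b): reqout from t = 0, reqTap1 from t = TD, and reqTap2 from t = 2TD.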
8.2.1 Implementation of two-bit TD delay line
The functions of a two-bit TD delay line are described above. Of course, one can implement the
functions using two parallel one-bit TD delay lines. This is, unfortunately, not a good approach because
the implementation suffers from the same matching issue and saves no hardware compared to the
original design. The implementation we propose in this section promises to be free of the matching
issue and to save half of the hardware compared to the original timing block in Fig. 8.2(a). Similar
to a one-bit delay line, a two-bit delay line is composed of a cascade of two-bit tg delay cells so
that it can delay events with a granularity of tg (Fig. 8.4(a)). Each two-bit tg delay cell has two four-
phase handshaking communication channels at its input and its output. Output Channel 1 (reqo1,
acko1) of the (k−1)th stage is connected to Input Channel 1 (reqi1, acki1) of the kth stage. Output
Channel 2 (reqo2, acko2) of the (k−1)th stage is connected to Input Channel 2 (reqi2, acki2) of the
kth stage. A two-bit tg delay cell has two functions: If an event arrives at its Input Channel 1, the
delay cell generates an event at its Output Channel 1 after a tg delay. If an event arrives at its Input
Channel 2, the cell generates an event at its Output Channel 2 after a tg delay.
The schematic of a two-bit tg delay cell is shown in Fig. 8.4(b). Initially all the C-elements’
outputs are zero. Q and Qb of the three SR latches are zero and high, respectively. When an event
enters the delay cell from Input Channel 1, reqi1 is pulled up. acki1 is pulled down accordingly to














































































Figure 8.4: A two-bit delay line composed of several delay cells for fine time granularity. (a)
Timing block. (b) Schematic of one delay cell.
cell enters the delay phase. The high Q of SR1 represents that this ongoing delay operation is trig-
gered from Channel 1. The mechanism of generating a delay relies on a current source charging a
capacitor, which is exactly the same as in Fig. 4.5 (the split transistors in Fig. 4.5 are combined into
one in Fig. 8.4(b) for simplicity). After a tg delay, the output of the inverter is pulled high. Because
Q of SR1 is high, reqo1 is pulled high accordingly. An event is generated at Output Channel 1.
Once the next delay cell enters the delay phase, acko1 is pulled down to reset both SR0 and
SR1. A C-element is used to generate reqo1. It ensures that reqo1 falls to zero only when both SR1
and the delaying components (M1–M4 and C1) are completely reset. An event entering
the delay cell from Input Channel 2 triggers similar operations in the delay cell. The event sets
SR0 and SR2 in this case. The high Q of SR2 indicates that the ongoing delay phase was triggered
from Channel 2. At the end of the delay operation, the event is passed to the next delay cell through
Output Channel 2. The delay cell uses two independent input/output channels to propagate events
in the two channels separately. The key to this design is that the two channels share the same
delaying resource (M1–M4 and C1 in Fig. 8.4(b)), so the delays of the two channels always match.
When the delay cell is in a delay phase, it cannot accept another incoming event from either
of its two channels. This requires the input events of the delay cell on its two channels to be
separated by at least tg. This requirement can be fulfilled by using a grouping block at the very
beginning of the two-bit delay line, as shown in Fig. 8.5(a). The grouping block should group all
events that arrive at the inputs of the delay line within a tg timing window into one event. As with
a regular two-bit delay cell, the grouper also has two separate input and output channels. Their
connections with the first two-bit tg delay cell are also shown in Fig. 8.5(a). Normally, the two
channels in a two-bit delay line are symmetric. However, when it is used as a timing block for
an IIR filter as shown in Fig. 8.3(a), Channel 2 of the two-bit delay line is configured as a closed
loop. For an IIR filter, the loop delay of this closed loop determines the denominator of the filter’s
transfer function, which is critical to the system’s stability. Hence, it is important to keep the loop
delay independent of event arrivals. This requires the grouping block to operate as plotted in Fig.
8.5(b)–(d). When an event arrives at the grouper’s reqi1, it triggers a timing window of length
tg, as shown in Fig. 8.5(b). Since no other events arrive during the window, an event is generated
at the grouper’s reqo1 at the end of the window. A similar operation occurs if an event arrives at
reqi2 and no other events arrive during the window. An event is generated at reqo2 after a tg delay.
Fig. 8.5(c) shows the case of an event arriving at reqi1 in the middle of a window triggered by a
previous event at reqi2. The grouper groups the two events and generates an output event at reqo2
at the end of the window. Fig. 8.5(d) shows the case of an event arriving at reqi2 in the middle of a
window triggered by a previous event at reqi1. The grouper extends the grouping window by a new
tg. At the end of the extended window, an event is generated at reqo2. By extending the grouping
window, the grouper guarantees that the delay from the arrival of an event at reqi2 to the generation
of the corresponding event at reqo2 is always tg.
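The three cases of Fig. 8.5(b)–(d) can be summarized in a short behavioral model. This is a sketch under assumptions, not the circuit of Fig. 8.6: times are integers in arbitrary units, the function name group_events is hypothetical, and the cases that the text does not describe (a second event on the same channel inside an open window, or an event arriving exactly when a window closes) are handled by our own choices, marked in the comments.

```python
def group_events(events, tg):
    """Behavioral sketch of the two-channel grouper.

    events: list of (time, channel) pairs sorted by time, channel in {1, 2}.
    Returns the list of (time, channel) output events.
    """
    outputs = []
    window_end = None   # closing time of the open window, if any
    out_channel = None  # channel on which the open window will fire
    for t, ch in events:
        # Close a window that has expired (assumption: an event arriving
        # exactly at the closing time starts a new window).
        if window_end is not None and t >= window_end:
            outputs.append((window_end, out_channel))
            window_end = None
        if window_end is None:
            # Fig. 8.5(b): an isolated event opens a window of length tg.
            window_end, out_channel = t + tg, ch
        elif ch == 2:
            # Fig. 8.5(d): a Channel 2 event restarts the window so that its
            # own delay is exactly tg. Applying the same rule inside a
            # Channel 2 window is an assumption.
            window_end, out_channel = t + tg, 2
        # Fig. 8.5(c): a Channel 1 event inside an open window is absorbed.
        # Absorbing it inside a Channel 1 window too is an assumption.
    if window_end is not None:
        outputs.append((window_end, out_channel))
    return outputs


tg = 10
print(group_events([(0, 1)], tg))          # [(10, 1)]  Fig. 8.5(b)
print(group_events([(0, 2), (4, 1)], tg))  # [(10, 2)]  Fig. 8.5(c)
print(group_events([(0, 1), (4, 2)], tg))  # [(14, 2)]  Fig. 8.5(d)
```

In every case where a Channel 2 event is present, the output appears at reqo2 exactly tg after that event, which is the property needed to keep the loop delay fixed.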
Fig. 8.6 shows that the schematic of the grouping block is basically a simplified version of the
grouping block in the CT digital IIR filter in Fig. 5.6. The “feedforward channel” and “R1 channel”
in Fig. 5.6 are used as “Channel 1” and “Channel 2” in Fig. 8.6. Since the operations of the two
channels have been described in detail in Section 5.2.6, we do not repeat them here.










































Figure 8.5: Grouping block in a two-bit delay line. (a) Its input/output interfaces and their connec-



















































































Figure 8.7: A two-bit TD delay line is configured as the CT digital IIR filter’s timing block.
8.2.2 A two-bit TD delay line as the CT digital IIR filter’s timing block
Fig. 8.7 shows the detailed connections of Fig. 8.3(a), which configures a two-bit TD delay line as
the timing block of a CT digital IIR filter. The merging block combining the two inputs at reqi2 of
the grouper can be simply implemented with an OR gate. Two C-elements are used to generate the
corresponding acknowledge signals. As we described above, the signals that will be finally used to
control the data path of a CT digital IIR filter – reqout, reqTap1 and reqTap2 in Fig. 8.2 – can simply
be derived from the timing signals generated by the timing block in Fig. 8.7. They can be obtained
with two OR gates as shown in the rightmost delay cell of Fig. 8.7.
8.3 A complete signal processing chain with a VR sigma–delta modulator and a VR-DSP
Chapter 7 introduces the VR-DSP. A complete signal-processing chain, including both a VR-ADC




































Figure 8.8: Two signal-processing chains compared. (a) Using a VR clock. (b) Using a fixed-rate
clock.
quency response independent of the instantaneous sampling frequency, the preceding ADC must
be oversampled. One naturally thinks of using a sigma–delta ADC. One possible implementation
of a signal-processing chain composed of a VR sigma–delta ADC and a VR-DSP is shown in
Fig. 8.8(a). According to the “local bandwidth” of the input signal, x(t), the VR clock adaptively
changes its instantaneous rate. The VR-DSP, which operates at the same VR clock as the VR
sigma–delta ADC, can directly process the samples generated by the ADC, without going through
a decimation filter. At the end of the signal-processing chain, a reconstruction filter with a variable
cutoff frequency tracking the instantaneous sampling rate ensures a fixed signal-to-noise ratio as
we explained in Section 7.2.4.
As a comparison, we show a conventional signal-processing chain with a fixed-rate sigma–delta
ADC and DSP in Fig. 8.8(b). Both the sigma–delta ADC and the decimation filter work at the
highest clock rate in the system, the base clock rate. After the decimation filter, the sample rate
decreases by a factor of OSR. Hence, both the DSP and the reconstruction filter work at a clock
rate that is 1/OSR of the base clock rate. Assume the base clock rates in the two systems are the
same. Also assume that the VR clock only switches between two clock rates: the base clock rate
(when the input has high-frequency content) and 1/OSR of the base clock rate (when the input only
has low-frequency content). Comparing the hardware and the average operation rates in the two
systems in Fig. 8.8, we find that, relative to the average operation rate in the VR scheme, the
fixed-rate scheme has a higher operation rate in the ADC and a lower operation rate in the DSP and
the reconstruction filter. However, the extra decimation filter, which operates at the highest clock
rate in the fixed-rate scheme, is not needed in the VR scheme. One advantage of the VR scheme, as
we have shown in Chapter 7, is that the operation rate depends on the input activity. If the input
signal has only low-frequency content most of the time, the average operation rate of the VR clock
can be as low as 1/OSR of the base clock rate, which is close to the operation rate of the DSP and
reconstruction filter in the fixed-rate scheme. As a result, the VR scheme offers substantial savings
in power and hardware, owing to a much lower average operation rate in the ADC and the
elimination of the decimation filter.
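The comparison can be made concrete with a few lines of arithmetic. The sketch below uses the two-rate assumption stated above; the fraction of time spent at the base rate, the numerical values, and the function names are illustrative assumptions, not measured results.

```python
def vr_average_rate(f_base, osr, high_fraction):
    """Average VR clock rate when the clock runs at f_base for a fraction
    `high_fraction` of the time and at f_base/osr for the rest."""
    return f_base * (high_fraction + (1.0 - high_fraction) / osr)


def fixed_rate_scheme(f_base, osr):
    """Per-block clock rates of the fixed-rate chain of Fig. 8.8(b)."""
    return {
        "adc": f_base,
        "decimation_filter": f_base,
        "dsp": f_base / osr,
        "reconstruction_filter": f_base / osr,
    }


f_base, osr = 64.0, 64  # illustrative values only
for p in (0.0, 0.25, 1.0):
    # In the VR scheme the ADC, DSP and reconstruction filter share this rate.
    print(p, vr_average_rate(f_base, osr, p))
print(fixed_rate_scheme(f_base, osr))
```

When the input is slow most of the time (high_fraction close to zero), the average VR rate approaches f_base/OSR for every block, whereas the fixed-rate ADC and decimation filter always run at f_base.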
Bibliography
[1] A. Mostafa et al., “Adaptive sampling of speech signals,” IEEE Transactions on Communica-
tions, vol. 22, pp. 1189–1194, Sep. 1974.
[2] Y. Tsividis, “Continuous-time digital signal processing,” in Electronics Letters, vol. 39, no. 21,
pp. 1551–1552, 16 Oct. 2003.
[3] B. Schell and Y. Tsividis, “A continuous-time ADC/DSP/DAC system with no clock and with
activity-dependent power dissipation,” IEEE Journal of Solid-State Circuits, vol. 43, no. 11,
pp. 2472–2481, Nov. 2008.
[4] Leon W. Couch, Digital and Analog Communication Systems. Pearson; Jan. 19, 2012.
[5] M. Kurchuk and Y. Tsividis, “Signal-dependent variable-resolution quantization for
continuous-time digital signal processing,” IEEE International Symposium on Circuits and
Systems, Taipei, 2009, pp. 1109–1112.
[6] C. Weltin-Wu and Y. Tsividis, “An event-driven clockless level-crossing ADC with signal-




[7] P. Martnez-Nuevo, S. Patil and Y. Tsividis, “Derivative level-crossing sampling,” IEEE Trans-
actions on Circuits and Systems II: Express Briefs, vol. 62, no. 1, pp. 11–15, Jan. 2015.
[8] T. Kurp et al., “An adaptive sampling scheme for improved energy utilization in wireless sensor
networks,” 2010 IEEE Instrumentation and Measurement Technology Conference (I2MTC),
pp. 93–98, May 2010.
[9] H. Kim, C. Van Hoof, and R. Yazicioglu, “A mixed signal ECG processing platform with
an adaptive sampling ADC for portable monitoring applications,” 2011 Annual International
Conference of the IEEE Engineering in Medicine and Biology Society, EMBC, pp. 2196–2199,
Aug. 2011.
[10] S. M. Qaisar et al., “Computationally efficient adaptive rate sampling and filtering,” in EU-
SIPCO, vol. 7, pp. 2139–2143, 2007.
[11] W. Dieter et al., “Power reduction by varying sampling rate,” Proceedings of the 2005 Inter-
national Symposium on Low Power Electronics and Design, pp. 227–232, Aug 2005.
[12] J. Mark and T. Todd, “A nonuniform sampling approach to data compression,” IEEE Trans-
actions on Communications, vol. 29, no. 1, pp. 24–32, Jan. 1981.
[13] J. Foster and T. K. Wang, “Speech coding using time code modulation,” IEEE Proceedings
of Southeastcon ’91, Williamsburg, VA, 1991, vol. 2, pp. 861–863.
131
[14] N. Sayiner, H. V. Sorensen and T. R. Viswanathan, “A level-crossing sampling scheme for
A/D conversion,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, vol. 43, no. 4, pp. 335–339, Apr. 1996.
[15] C. Vezyrtzis et al., “A flexible, event-driven digital filter with frequency response independent
of input sample rate,” IEEE Journal of Solid-State Circuits, vol. 49, pp. 2292–2304, Oct. 2014.
[16] Bob Schell and Yannis Tsividis. “Analysis and simulation of continuous-time digital signal
processors.” Signal Processing vol. 89, no. 10, pp. 2013–2026, 2009.
[17] M. Kurchuk, C. Weltin-Wu, D. Morche and Y. Tsividis, “Event-driven GHz-range
continuous-time digital signal processor with activity-dependent power dissipation,” IEEE
Journal of Solid-State Circuits, vol. 47, no. 9, pp. 2164–2173, Sept. 2012.
[18] B. Schell, Continuous-time digital signal processors: Analysis and implementation. Ph.D.
thesis, Department of Electrical Engineering, Columbia University, 2008.
[19] Y. Chen, M. Kurchuk, N. Thao, and Y. Tsividis, “Spectral analysis of continuous-time ADC
and DSP,” in Event-Based Control and Signal Processing, M. Miskowicz (Ed.), CRC Press,
2015.
[20] Y. Tsividis, M. Kurchuk, S. Nowick, B. Schell, and C. Vezyrtzis, “Event-based data acqui-
sition and digital signal processing in continuous time,” in Event-Based Control and Signal
Processing, M. Miskowicz (Ed.), CRC Press, 2015.
[21] A. Gersho and R. M. Gray, Vector quantization and signal compression. Springer, 1992.
132
[22] S. Haykin, Communication Systems. Wiley, 2001.
[23] H. E. Rowe, Signals and Noise in Communication Systems. Van Nostrand, 1965.
[24] N. M. Blachman, “The intermodulation and distortion due to quantization of sinusoids,” IEEE
Transactions on Acoustics, Speech and Signal Processing, vol. 33, pp. 1417–1426, Dec 1985.
[25] H. Pan and A. Abidi, “Spectral spurs due to quantization in Nyquist ADCs,” IEEE Transac-
tions on Circuits and Systems I: Regular Papers, vol. 51, pp. 1422–1439, Aug 2004.
[26] T. Claasen and A. Jongepier, “Model for the power spectral density of quantization noise,”
IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 29, no. 4, pp. 914–917,
1981.
[27] M. Kurchuk, Signal encoding and digital signal processing in continuous time. PhD thesis,
Columbia University, 2011.
[28] P. Z. Peebles, Communication System Principles. Addison-Wesley, Advanced Book Program,
1976.
[29] V. Balasubramanian, A. Heragu and C. Enz, “Analysis of ultralow-power asynchronous
ADCs,” in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (IS-
CAS), pp. 3593–3596, May 2010.
[30] W. R. Bennett, “Spectra of quantized signals,” Bell System Technical Journal, vol. 27, no. 3,
pp. 446–472, 1948.
133
[31] N. S. Jayant and P. Noll, Digital coding of waveforms: Principles and applications to speech
and video, ACM, 1984.
[32] D. Hand and M. S. Chen, “A non-uniform sampling ADC architecture with embedded
alias-free asynchronous filter,” Proceedings of the Global Communications Conference 2012,
pp. 3707–3712.
[33] Y. Tsividis, “Digital signal processing in continuous time: A possibility for avoiding aliasing
and reducing quantization error,” IEEE Transactions on Acoustics, Speech, and Signal Pro-
cessing, vol. 2, pp. ii–589–92, May 2004.
[34] J. Foster and T.-K. Wang, “Speech coding using time code modulation,” in IEEE Proceedings
of Southeastcon ’91, pp. 861–863, 1991.
[35] J. Sparso and S. Furber, Principles of Asynchronous Circuit Design. Kluwer Academic, 2002.
[36] Sanjit Kumar Mitra, Digital signal processing: A computer-based approach. McGraw-
Hill/Irwin, 2001.
[37] Y. Neuvo, Dong Cheng-Yu and S. Mitra, “Interpolated finite impulse response filters,” IEEE
Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 3, pp. 563–570, Jun.
1984.
[38] B. S. Chang, G. Kim and W. Kim, “A low voltage low power CMOS delay element,” Twenty-
first European Solid-State Circuits Conference, Lille, France, 1995, pp. 222–225.
134
[39] B. Schell and Y. Tsividis, “A low power tunable delay element suitable for asynchronous
delays of burst information,” IEEE Journal of Solid-State Circuits, vol. 43, no. 5, pp. 1227–
1234, May 2008.
[40] C. Vezyrtzis, Continuous-Time and Companding Digital Signal Processors Using Adaptivity
and Asynchronous Techniques. PhD thesis, Columbia University, 2011.
[41] To be published.
[42] E. Burlingame and R. Spencer, “An analog CMOS high-speed continuous-time FIR filter,”
Proceedings of the 26th European Solid-State Circuits Conference, Stockholm, Sweden, 2000,
pp. 288–291.
[43] David E. Muller, “Asynchronous logic and application to information processing.” Proceed-
ings of the Symposium on the Application of Switching Theory in Space Technology, 1963,
289–297.
[44] Y. Tsividis and C. McAndrew, Operation and Modeling of the MOS Transistor. Oxford Uni-
versity Press, 2011.
[45] M. Kurchuk and Y. Tsividis, “Energy-efficient asynchronous delay element with wide con-
trollability,” Proceedings of 2010 IEEE International Symposium on Circuits and Systems,
Paris, 2010, pp. 3837–3840.
[46] K. van Berkel, Handshake Circuits: An Asynchronous Architecture for VLSI Programming.
Cambridge University Press, 1993
135
[47] C. L. Seitz, “System timing,” Introduction to VLSI systems, 1980, pp. 218–262.
[48] D. Kim, G. Chen, M. Fojtik, M. Seok, D. Blaauw and D. Sylvester, “A 1.85 fW/bit ultra low
leakage 10T SRAM with speed compensation scheme,” 2011 IEEE International Symposium
of Circuits and Systems (ISCAS), Rio de Janeiro, 2011, pp. 69–72.
[49] R. Ginosar, “Metastability and synchronizers: A tutorial,” IEEE Design and Test of Comput-
ers, vol. 28, no. 5, pp. 23–35, Sept.–Oct. 2011.
[50] F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier trans-
form,” Proceedings of the IEEE, vol. 66, no. 1, pp. 51–83, Jan. 1978.
[51] A. Agarwal et al., 11 A 320 mV-to-1.2 V on-die fine-grained reconfigurable fabric for
DSP/media accelerators in 32 nm CMOS,” ISSCC Digest, pp. 328–329, 2010.
[52] E. O’hAnnaidh, E. Rouat, S. Verhaeren, S. L. Tual and C. Garnier, “A 3.2GHz-sample-rate
800mHz bandwidth highly reconfigurable analog FIR filter in 45nm CMOS,” ISSCC Digest,
pp. 90-91, 2010.
[53] M. Tohidian, I. Madadi and R. B. Staszewski, “A 2mW 800MS/s 7th-order discrete-time IIR
filter with 400kHz-to-30MHz BW and 100dB stop-band rejection in 65nm CMOS,” ISSCC
Digest, pp. 174-175, 2013.
[54] L. R. Rabiner, J. F. Kaiser, O. Herrmann and M. T. Dolan, ”Some comparisons between fir
and iir digital filters,” in The Bell System Technical Journal, vol. 53, no. 2, pp. 305-331, Feb.
1974.
136
[55] Y. Chen and Y. Tsividis, “Design considerations for variable-rate digital signal processing,”
2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, QC, 2016,
pp. 2479–2482.
[56] A. V. Oppenheim, R. W. Schafer and J. R. Buck, Discrete-Time Signal Processing. Prentice-




Stability Analysis of CT Digital IIR Filter in
the Laplace Domain
With conventional systems whose transfer functions are rational in s, one can
determine stability from the pole locations: if at least one pole is located in the
right half plane of the Laplace domain, the system is unstable. In fact, this relation
between a system’s stability and its pole locations is also valid when the transfer function
contains exponential functions of s. This is because, regardless of the types of functions of s,
a pole in the right half plane always corresponds to an unbounded impulse response. Hence pole
locations will be our primary tool to determine the stability of a CT digital filter.
In Section 3.2, we have shown that an originally stable IIR filter becomes unstable once delay
line mismatches are present. In this appendix, we will mathematically prove this result. We reuse





In order to judge the stability of the above system, we need to know the locations of its
s-domain poles. Although the system has an infinite number of poles, they correspond to a finite
number of $e^{sT_D}$ values which make the denominator of Eqn A.1 zero. From the modulus of these
$e^{sT_D}$ values, we can deduce the locations of the s-domain poles. If a system is unstable, it must
have at least one s-domain pole located in the right half plane of the Laplace domain. Meanwhile,
writing $s = \alpha + j\beta$, $e^{sT_D}$ can be expanded as $e^{\alpha T_D} \cdot e^{j\beta T_D}$. Because an s-domain pole in the
right half plane has a positive real part and $T_D$ is a physical delay which must be positive,
$\alpha T_D > 0$ and hence $|e^{\alpha T_D}| > 1$. Thus, the corresponding $e^{sT_D}$ value which makes the denominator
of the system’s transfer function zero must have a modulus larger than 1.
The only two values of $e^{sT_D}$ making the denominator zero are $e^{sT_D} = -0.75 \pm j0.19$. All the
values of $s = \alpha + j\beta$ which satisfy this equation are the poles of Eqn A.1, and $e^{\alpha T_D}$ is the
modulus of the corresponding $e^{sT_D}$ value. Since $|-0.75 \pm j0.19| = 0.77 < 1$, $\alpha$ must be negative.
Hence, all the s-domain poles of Eqn A.1 have negative real parts
and are located in the left half plane of the Laplace domain. Such a system is stable.
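The numbers quoted above can be reproduced numerically. The sketch below assumes that the denominator of Eqn A.1 has the form $1 - b_1 e^{-sT_D} - b_2 e^{-2sT_D}$ with the coefficients of Fig. A.1; this form is our assumption, chosen because it yields the two values quoted in the text. The value of T_D is arbitrary, since only the sign of $\alpha$ matters.

```python
import numpy as np

b1, b2 = -1.5, -0.6  # coefficients quoted in Fig. A.1
T_D = 1.0            # any positive delay (assumed value)

# Assumed denominator: 1 - b1*e^{-s T_D} - b2*e^{-2 s T_D}.
# With z = e^{s T_D}, its zeros satisfy z^2 - b1*z - b2 = 0.
z = np.roots([1.0, -b1, -b2])
modulus = np.abs(z)

# Each z corresponds to infinitely many poles s = (ln|z| + j(arg z + 2*pi*k)) / T_D,
# all sharing the same real part alpha.
alpha = np.log(modulus) / T_D

print(z)        # about -0.75 +/- 0.19j
print(modulus)  # about 0.77 for both roots
print(alpha)    # negative, so every pole lies in the left half plane
```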
When delay lines are not identical, as we show in Fig. A.1(b) where the top delay line is TD

















Figure A.1: Second-order IIR filters with b1 = −1.5, b2 = −0.6. (a) Identical delay lines. (b)
Nonidentical delay lines.











It is straightforward to find all values of $e^{sT_D/10}$ which make the denominator of Eqn A.3 zero.
Among all such values of $e^{sT_D/10}$, the one with the maximum modulus is $e^{sT_D/10} = -1.01 + j0.32$,
whose modulus is $|e^{sT_D/10}| = 1.06$. Its corresponding s-domain poles must have positive real parts.
Since the IIR system has at least one s-domain pole located in the right half plane, the system is unstable.
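The substitution behind this result can be written out as a short worked example. The derivation below rests on assumptions of ours: that the top delay line is $T_D$ and the bottom one is $1.1T_D$, so that the second feedback tap sees a total delay of $2.1T_D$, and that the denominator has the form $1 - b_1 e^{-sT_D} - b_2 e^{-2.1sT_D}$ with the coefficients of Fig. A.1. Under these assumptions the value quoted above appears consistent with the resulting polynomial to within its two-decimal rounding.

```latex
\begin{align*}
D(s) &= 1 - b_1 e^{-sT_D} - b_2 e^{-2.1\,sT_D}
      = 1 + 1.5\,e^{-sT_D} + 0.6\,e^{-2.1\,sT_D}, \\
w &\equiv e^{sT_D/10}
  \;\Rightarrow\; D = 1 + 1.5\,w^{-10} + 0.6\,w^{-21}, \\
D = 0 &\iff w^{21} + 1.5\,w^{11} + 0.6 = 0, \\
|w| &= \sqrt{1.01^2 + 0.32^2} \approx 1.06
  \;\Rightarrow\; \alpha = \frac{10}{T_D}\,\ln|w| \approx \frac{0.58}{T_D} > 0 .
\end{align*}
```

The choice of $e^{sT_D/10}$ as the variable turns both delays into integer powers, so only the 21 roots of a polynomial need to be examined even though the system has infinitely many s-domain poles.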
To better understand the effect of delay mismatch on system stability, we sweep τ from $-0.5T_D$
to $0.5T_D$, with a step of $0.01T_D$. The stability of the filter in each case can be determined using the
above method. We find the maximum modulus of the values of $e^{sT_D/100}$ which make the denominator
of the system transfer function equal to zero. If it is larger than one, at least one s-domain pole of the
system is located in the right half plane and hence the system is unstable. We plot the maximum modulus
of the values of $e^{sT_D/100}$ versus delay mismatch in Fig.


















Figure A.2: Max modulus of $e^{sT_D/100}$ versus delay mismatch.
A.2. It is clear that the stability of the system does not change continuously with delay mismatch.
The system is stable only when delay lines are perfectly matched, i.e. τ = 0.
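A sweep of this kind can be sketched with a polynomial root finder. The code below is an illustration under assumptions, not the script used to produce Fig. A.2: it assumes a top delay of $T_D$, a bottom delay of $T_D + \tau$, and a denominator of the form $1 - b_1 e^{-sT_D} - b_2 e^{-s(2T_D + \tau)}$ with the coefficients of Fig. A.1. The function names are hypothetical.

```python
import numpy as np

B1, B2 = -1.5, -0.6  # feedback coefficients quoted in Fig. A.1


def max_modulus(k, steps_per_td=100):
    """Largest |w|, with w = e^{s*T_D/steps_per_td}, zeroing the assumed denominator.

    k is the mismatch tau in units of T_D/steps_per_td. Assumed model:
    top delay T_D, bottom delay T_D + tau, so the zeros satisfy
    w^(n_top + n_bottom) - B1*w^n_bottom - B2 = 0.
    """
    n_top = steps_per_td
    n_bottom = steps_per_td + k
    coeffs = np.zeros(n_top + n_bottom + 1)
    coeffs[0] = 1.0      # coefficient of w^(n_top + n_bottom)
    coeffs[n_top] = -B1  # coefficient of w^n_bottom
    coeffs[-1] = -B2     # constant term
    return float(np.max(np.abs(np.roots(coeffs))))


def sweep(k_values, steps_per_td=100):
    """Map each mismatch index k to the maximum modulus found."""
    return {k: max_modulus(k, steps_per_td) for k in k_values}


# Coarse sweep for a quick look; range(-50, 51) gives the 0.01*T_D step
# described in the text at a higher computational cost.
for k, m in sweep(range(-50, 51, 10)).items():
    print(f"tau = {k / 100:+.2f} T_D  max modulus = {m:.4f}")
```

Values of the maximum modulus very close to one should be interpreted with care, since the roots are computed numerically from a high-degree polynomial.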
