A power-scalable variable-length analogue DFT processor for multi-standard wireless transceivers by Tanhaei, Ghazal
  
 
A Power-Scalable Variable-Length Analogue DFT 
Processor for Multi-Standard Wireless Transceivers 
 
 
By 
Ghazal Tanhaei 
 
 
A Thesis submitted to  
The University of Birmingham  
for the degree of  
DOCTOR OF PHILOSOPHY 
 
 
 
 
 
 
 
 
 
 
School of Electronic, Electrical and System Engineering  
College of Engineering and Physical Sciences 
The University of Birmingham 
September 2016 
 
  
 
 
 
 
 
 
 
 
 
University of Birmingham Research Archive 
 
e-theses repository 
 
 
This unpublished thesis/dissertation is copyright of the author and/or third 
parties. The intellectual property rights of the author or third parties in respect 
of this work are as defined by The Copyright Designs and Patents Act 1988 or 
as modified by any successor legislation.   
 
Any use made of information contained in this thesis/dissertation must be in 
accordance with that legislation and must be properly acknowledged.  Further 
distribution or reproduction in any format is prohibited without the permission 
of the copyright holder.  
 
 
 
  
 
 
 
 
 
 
 
Dedicated to my parents, Nasrin and Khosrow  
for their endless love, encouragements and sacrifices  
  
 
 
ABSTRACT 
Since the invention of the mobile phone, a new generation of mobile communication standard 
has emerged every 10 years. Upgrading the technology of mobile networks in all areas takes 
a few years. Hence, mobile phones should support the previous communication standards as 
well as the latest standards. Realizing a multi-standard mobile phone by multiple transceivers 
in parallel is neither a size-efficient nor a cost-efficient solution. Hence, modern mobile 
phones demand reconfigurable transceivers. It is also essential for mobile phones to consume 
power efficiently. Hence, the multi-standard transceiver should scale its power consumption 
to the standard specifications. Many recent communication standards are based on 
Orthogonal Frequency-Division Multiplexing (OFDM). In OFDM-based transceivers, 
digital computation of the Discrete Fourier Transform (DFT) is a power-hungry process. 
Reduction in the hardware cost and power consumption is possible by implementing the DFT 
processor with analogue circuits. Accordingly, the goal of this work is to design a power-
scalable variable-length analogue DFT processor for multi-standard OFDM transceivers. 
Since the Fast Fourier Transform (FFT) algorithm reduces the computational burden of the 
DFT, it has been used for years to reduce the hardware cost and power consumption of digital DFT 
processors. However, the FFT algorithm was originally designed for discrete-time 
signal processing. This thesis presents the real-time recursive DFT architecture, which was 
designed based on the characteristics of the analogue signal processing domain. The optimal 
architecture for the analogue DFT is achieved by keeping the signal continuous as long as 
possible. 
In order to analyse the performance of the proposed architecture, system-level simulations on 
the real-time recursive DFT processor and the radix-2 FFT processor of length 8 were 
performed. Results of the system performance analysis indicate that the average dynamic 
range of the proposed processor is 4.7 dB higher than that of the FFT processor. In the Monte Carlo 
analysis, the DFT processors that succeed in meeting the minimum dynamic range 
requirement (34 dB) contribute to the yield. Accordingly, the proposed architecture has a yield 
of 99.3% while the yield of the FFT processor is 82.8%. 
  
 
 
The real-time recursive DFT architecture was realized using four-quadrant transconductance 
multipliers and parasitic-insensitive switched-capacitor integrators. The real-time 
recursive DFT processor was designed in 180 nm CMOS technology. Sensitivity of the 
real-time recursive DFT processor to device mismatch was analysed using Pelgrom's model. 
Results of the device mismatch analysis indicate that the 8-point recursive DFT processor has a 
yield of 97.5% for the BPSK modulated signal. For the QPSK modulated signal, however, 
the yield of the 8-point recursive DFT processor is 8.9%. Moreover, doubling the transform 
length reduces the average dynamic range by 3 dB. Accordingly, the 16-point recursive DFT 
processor has a yield of 43.4% for the BPSK modulated signal. The power consumption of the 
recursive DFT processor is about one-sixth of that of a previous analogue FFT 
processor.   
This thesis provided a proof-of-concept for the power-scalable variable-length analogue DFT 
processor. Previously, changing the transform length and scaling the power could only be 
performed by digital FFT processors. By using the real-time recursive DFT processor, the 
analogue decimation filter is eliminated. Thus, further reduction in the hardware cost and 
power consumption of the multi-standard transceiver is achieved.   
 
 
 
 
 
 
 
 
  
 
 
ACKNOWLEDGEMENTS 
First and foremost, I would like to thank my parents and my sister for their encouragement 
and financial support. My parents supported me financially during the recession 
purely because of the high value they place on education. None of this would have been possible 
without their love and support. 
I would like to express my sincere gratitude to my advisors, Dr Steven Quigley and Professor 
Peter Gardner, for their guidance and patience throughout my PhD study. While they 
provided me with a peaceful atmosphere at the University, they also gave me the freedom to work from 
home. 
Besides my advisors, I would like to thank Dr Kamyar Keikhosravy, postdoctoral fellow at 
the University of British Columbia, for sharing his expertise despite the distance. His modest 
personality makes him approachable. I am glad that this research has led to a great friendship.   
I would also like to thank Professor Costas Constantinou and Professor Khaled Hayatleh for 
their insightful comments and constructive criticisms.  
I am also indebted to all my teachers from elementary school to graduate school, especially 
my electronics professor at the Amirkabir University of Technology, Dr Saeed Khatami (Rest 
In Peace). In the early years of higher education, his encouragement gave me 
confidence.  
I wish also to express my gratitude to the University of Birmingham for awarding me the TI 
Group scholarship, which partially supported my research.  
 
 
 
  
 
 
CONTENTS 
CHAPTER 1 INTRODUCTION
1.1 EVOLUTION OF COMMUNICATION SYSTEMS
1.2 STATEMENT OF THE PROBLEM
1.3 DISSERTATION OBJECTIVES
1.4 SIGNIFICANCE OF THE RESEARCH
1.5 THESIS OUTLINE
CHAPTER 2 BACKGROUND STUDY AND LITERATURE REVIEW
2.1 FUNDAMENTALS OF OFDM
2.2 WIFI AND WIMAX PHYSICAL LAYER OVERVIEW
2.3 STATE-OF-THE-ART FFT PROCESSORS
2.4 COMPARISON OF ANALOGUE AND DIGITAL SIGNAL PROCESSING
2.5 ANALOGUE FOURIER TRANSFORM ARCHITECTURES
2.5.1 The Direct Form Finite Impulse Response
2.5.2 The Fast Fourier Transform
2.6 SUMMARY
CHAPTER 3 REAL-TIME RECURSIVE DFT ARCHITECTURE
3.1 REAL-TIME RECURSIVE DFT FOR DIGITAL SIGNAL
3.2 REAL-TIME RECURSIVE DFT FOR ANALOGUE SIGNAL
3.3 ADVANTAGES OF THE PROPOSED ARCHITECTURE
3.4 SUMMARY
CHAPTER 4 SYSTEM PERFORMANCE ANALYSIS
4.1 PERFORMANCE METRICS FOR DFT PROCESSOR
4.2 PERFORMANCE REQUIREMENTS
4.3 BEHAVIOURAL MODELLING
4.3.1 Behavioural Model of the Multiplier
4.3.2 Behavioural Model of the Integrator
4.3.3 Behavioural Modelling of the FFT Processor
4.3.4 Behavioural Modelling of the Recursive DFT Processor
4.4 DETERMINING THE DESIGN SPECIFICATIONS
4.4.1 Power Budget
4.4.2 Design Specifications of the Multiplier
4.4.3 Design Specifications of the Integrator
4.5 YIELD PREDICTION
4.6 PERFORMANCE ANALYSIS RESULTS
4.7 SUMMARY
CHAPTER 5 CIRCUIT DESIGN
5.1 PREVIOUS WORK ON THE ANALOGUE FFT PROCESSOR
5.2 ANALOGUE MULTIPLIER
5.2.1 Principle of Operation
5.2.2 Analysis of the CMOS Gilbert Cell
5.2.3 Circuit Realization
5.3 DISCRETE-TIME INTEGRATOR
5.3.1 Analysis of the Parasitic-Insensitive Integrator
5.3.2 Speed and Precision Considerations
5.3.3 Circuit Realization
5.4 REAL-TIME RECURSIVE DFT PROCESSOR
5.5 ACCURACY OF THE RESULTS
5.6 SUMMARY
CHAPTER 6 DEVICE MISMATCH ANALYSIS AND RESULTS
6.1 MOS TRANSISTOR MATCHING MODELS
6.2 MOS TRANSISTOR OPTIMUM MATCHING
6.3 IMPACT OF MISMATCH ON THE PERFORMANCE TRADEOFFS
6.4 IMPACT OF TECHNOLOGY SCALING ON THE MISMATCH
6.5 MISMATCH ANALYSIS RESULTS
6.6 ROOT CAUSE ANALYSIS
6.7 MITIGATION OF THE EFFECT OF DEVICE MISMATCH
6.8 SUMMARY
CHAPTER 7 CONCLUSION AND FUTURE WORK
7.1 CONTRIBUTIONS TO KNOWLEDGE
7.1.1 Methodology
7.1.2 Limitations and Considerations
7.2 FUTURE WORK
7.2.1 Design Enhancements
7.2.2 Further Analysis
LIST OF REFERENCES
APPENDIX A
APPENDIX B
 
  
 
 
TABLE OF TABLES 
TABLE 1-1: EVOLUTION OF COMMUNICATION SYSTEMS
TABLE 2-1: IEEE 802.11A/G PHY SPECIFICATIONS
TABLE 2-2: IEEE 802.16E PHY SPECIFICATIONS
TABLE 3-1: COMPUTATIONAL EFFICIENCY AND RESOURCE COSTS OF DIFFERENT DFT ARCHITECTURES
TABLE 4-1: RECEIVER PERFORMANCE REQUIREMENTS FOR BER = 10^-6
TABLE 4-2: SUMMARY OF THE OPTIMAL VALUE FOR THE BEHAVIOURAL MODEL PARAMETERS
TABLE 4-3: SUMMARY OF THE MONTE CARLO ANALYSIS FOR THE RECURSIVE DFT AND THE RADIX-2 FFT PROCESSORS
TABLE 5-1: INITIAL ASPECT RATIOS OF THE COMPLEX MULTIPLIER
TABLE 5-2: FINAL ASPECT RATIOS OF THE COMPLEX MULTIPLIER
TABLE 5-3: INITIAL ASPECT RATIOS OF THE OP-AMP
TABLE 5-4: FINAL ASPECT RATIOS OF THE PARASITIC-INSENSITIVE INTEGRATOR
TABLE 6-1: SUMMARY OF THE MONTE CARLO ANALYSIS FOR THE RECURSIVE DFT PROCESSORS OF LENGTH 8
TABLE 6-2: SUMMARY OF THE YIELD PREDICTION FOR THE RECURSIVE DFT PROCESSORS OF LENGTH 8 AND 16
TABLE 6-3: PERFORMANCE COMPARISON OF THE ANALOGUE FOURIER TRANSFORM PROCESSORS
  
 
 
TABLE OF FIGURES 
FIGURE 1-1: MARTIN COOPER HOLDS THE DYNATAC 8000X PHONE AND HIS CURRENT MOBILE PHONE DURING THE PRINCE OF ASTURIAS AWARDS CEREMONY IN 2009 [32]
FIGURE 1-2: ANALOGUE AND DIGITAL SIGNAL PROCESSING SECTIONS IN (A) THE CLASSICAL OFDM RECEIVER (B) THE SOFTWARE DEFINED RADIO RECEIVER (C) THE OFDM RECEIVER WITH AN ANALOGUE FFT
FIGURE 2-1: THE SPECTRUM OF THE FDM SIGNAL CONSISTING OF NONOVERLAPPING SUBCHANNELS [43]
FIGURE 2-2: THE SPECTRUM OF AN OFDM SIGNAL CONSISTING OF THREE OVERLAPPING SUBCARRIERS [42]
FIGURE 2-3: SUMMATION OF THE OFDM SUBCARRIERS IN THE TIME DOMAIN [1]
FIGURE 2-4: EFFECT OF THE ISI ON THE OFDM SYMBOL IN (A) THE ABSENCE OF THE GUARD INTERVAL (B) THE PRESENCE OF THE GUARD INTERVAL [1]
FIGURE 2-5: THE OFDM SYMBOL IN THE FREQUENCY DOMAIN [45]
FIGURE 2-6: ALLOCATION OF SUBCARRIERS TO USERS IN THE OFDM AND OFDMA TECHNOLOGIES [1]
FIGURE 2-7: WINDOWED OFDM SYMBOL IN THE TIME DOMAIN [1]
FIGURE 2-8: SPECTRUM OF THE OFDM SIGNAL BEFORE AND AFTER WINDOWING [45]
FIGURE 2-9: SYMBOL MAPPING BASED ON THE QPSK MODULATION [1]
FIGURE 2-10: BLOCK DIAGRAMS OF THE CLASSICAL OFDM TRANSMITTER AND RECEIVER [1]
FIGURE 2-11: DIRECT FORM REALIZATION OF AN FIR SYSTEM [44]
FIGURE 2-12: SIGNAL FLOW GRAPH OF A RADIX-2 DIT FFT OF LENGTH 8 [44]
FIGURE 2-13: SIGNAL FLOW GRAPH OF THE 2-POINT DFT [44]
FIGURE 3-1: SIGNAL FLOW GRAPH OF THE GOERTZEL DFT [44]
FIGURE 3-2: SIGNAL FLOW GRAPH OF THE GOERTZEL DFT WITH REAL MULTIPLIERS
FIGURE 3-3: BLOCK DIAGRAM OF A RECURSIVE DIFFERENCE EQUATION REPRESENTING THE DISCRETE-TIME INTEGRATOR
FIGURE 3-4: ARCHITECTURE OF THE PROPOSED REAL-TIME RECURSIVE DFT
FIGURE 3-5: BASEBAND SIGNAL PROCESSING SECTION IN (A) THE CLASSICAL OFDM RECEIVER (B) THE OFDM RECEIVER WITH AN ANALOGUE FFT OR FIR DFT OR GOERTZEL DFT (C) THE OFDM RECEIVER WITH THE PROPOSED DFT
FIGURE 4-1: TYPICAL SNDR VERSUS INPUT MAGNITUDE CURVE [41]
FIGURE 4-2: PAPR CCDFS OF TWO OFDM SIGNALS WITH WIFI AND WIMAX STANDARDS [1]
FIGURE 4-3: THE BLOCK DIAGRAM OF THE BASEBAND SIGNAL PROCESSING PART OF (A) THE CLASSICAL OFDM RECEIVER (B) THE PROPOSED OFDM RECEIVER
FIGURE 4-4: ANALOGUE DFT DYNAMIC RANGE DERIVATION
FIGURE 4-5: BLOCK DIAGRAM OF THE ANALOGUE MULTIPLIER
FIGURE 4-6: CURVES OF THE MULTIPLIER BEHAVIOURAL MODEL
FIGURE 4-7: SWITCHED-CAPACITOR INTEGRATOR [78]
FIGURE 4-8: BEHAVIOURAL MODEL OF THE SWITCHED-CAPACITOR INTEGRATOR IN SIMULINK
FIGURE 4-9: SIGNAL FLOW GRAPH OF A RADIX-2 DIT FFT OF LENGTH 8 [41]
FIGURE 4-10: 2-POINT DFT WITH W_8^1 OR W_8^3 TWIDDLE FACTOR
FIGURE 4-11: 2-POINT DFT WITH W_8^0 TWIDDLE FACTOR
FIGURE 4-12: 2-POINT DFT WITH W_8^2 TWIDDLE FACTOR
FIGURE 4-13: BEHAVIOURAL MODEL OF THE ANALOGUE FFT PROCESSOR IN SIMULINK
FIGURE 4-14: 1-POINT DFT WITH PIECEWISE CONTINUOUS COEFFICIENTS
FIGURE 4-15: BEHAVIOURAL MODEL OF THE REAL-TIME RECURSIVE DFT PROCESSOR IN SIMULINK
FIGURE 4-16: THE INPUT-OUTPUT CHARACTERISTICS OF IDEAL MULTIPLIERS
FIGURE 4-17: SNDR CURVES FOR DIFFERENT VALUES OF 𝐺𝑚𝑜
FIGURE 4-18: SNDR CURVES FOR DIFFERENT LINEAR RANGES
FIGURE 4-19: SNDR CURVES FOR VARIOUS TRANSCONDUCTANCE ERRORS
FIGURE 4-20: SNDR CURVES FOR VARIOUS DC OFFSET MISMATCHES
FIGURE 4-21: SNDR CURVES FOR VARIOUS OP-AMP GAINS
FIGURE 4-22: SNDR CURVES FOR VARIOUS DC OFFSET MISMATCHES
FIGURE 4-23: YIELD PREDICTION BASED ON THE MONTE CARLO ANALYSIS [82]
FIGURE 4-24: MONTE CARLO ANALYSIS RESULTS OF THE REAL-TIME RECURSIVE DFT PROCESSOR
FIGURE 4-25: MONTE CARLO ANALYSIS RESULTS OF THE RADIX-2 FFT PROCESSOR
FIGURE 4-26: THE DYNAMIC RANGE HISTOGRAM OF THE REAL-TIME RECURSIVE DFT PROCESSOR
FIGURE 4-27: THE DYNAMIC RANGE HISTOGRAM OF THE RADIX-2 FFT PROCESSOR
FIGURE 5-1: (A) SWITCHED-CAPACITOR AMPLIFIER (B) TIMING DIAGRAM OF CIRCUIT (A)
FIGURE 5-2: THE BASIC CURRENT MIRROR [80]
FIGURE 5-3: THE PASSIVE SWITCHED-CAPACITOR MULTIPLIER
FIGURE 5-4: THE SWITCHED-TRANSCONDUCTOR MULTIPLIER
FIGURE 5-5: THE FLOATING-GATE MULTIPLIER
FIGURE 5-6: TWO-QUADRANT ANALOGUE MULTIPLIER [96]
FIGURE 5-7: BLOCK DIAGRAM OF THE GILBERT CELL
FIGURE 5-8: GM TRANSCONDUCTOR [80]
FIGURE 5-9: GILBERT CELL
FIGURE 5-10: INPUT-OUTPUT CHARACTERISTIC OF A DIFFERENTIAL PAIR [80]
FIGURE 5-11: DEGENERATED GILBERT CELL WITH DIODE-CONNECTED LOAD
FIGURE 5-12: DEGENERATED GILBERT CELL WITH CMFB NETWORK
FIGURE 5-13: TOPOLOGY OF THE COMPLEX MULTIPLIER
FIGURE 5-14: TRANSFER CHARACTERISTIC OF THE GILBERT CELL MULTIPLIER SIMULATED IN SPICE
FIGURE 5-15: (A) CONTINUOUS-TIME INTEGRATOR (B) DISCRETE-TIME INTEGRATOR (C) TIMING DIAGRAM OF CIRCUIT (B) [80]
FIGURE 5-16: (A) PARASITIC-INSENSITIVE INTEGRATOR (B) CIRCUIT OF (A) IN SAMPLING MODE, (C) CIRCUIT OF (A) IN INTEGRATION MODE [80]
FIGURE 5-17: TIMING DIAGRAM OF THE PARASITIC-INSENSITIVE INTEGRATOR
FIGURE 5-18: EQUIVALENT CIRCUIT OF THE PARASITIC-INSENSITIVE INTEGRATOR IN INTEGRATION MODE
FIGURE 5-19: DIFFERENTIAL AMPLIFIER WITH SINGLE-ENDED OUTPUT [80]
FIGURE 5-20: SLEWING IN THE OP-AMP [80]
FIGURE 5-21: PARASITIC-INSENSITIVE INTEGRATOR WITH RESET SWITCHES
FIGURE 5-22: OUTPUT OF A DIFFERENTIAL PARASITIC-INSENSITIVE INTEGRATOR SIMULATED IN SPICE
FIGURE 5-23: THE SNDR CURVES OF REAL-TIME RECURSIVE DFT PROCESSORS WITH IDEAL DEVICES
FIGURE 5-24: SNDR CURVES OF REAL-TIME RECURSIVE DFT PROCESSORS IN THE PRESENCE OF DEVICE MISMATCH
FIGURE 5-25: SNDR CURVES OF REAL-TIME RECURSIVE DFT PROCESSORS WITH DIFFERENT TRANSFORM LENGTHS
FIGURE 5-26: STEPS IN THE INTEGRATED CIRCUIT DESIGN FLOW [115]
FIGURE 6-1: EQUAL DRAWN AREA DEVICES (A) SHORT CHANNEL (B) NARROW CHANNEL
FIGURE 6-2: MODELING VTH VARIATIONS USING A DC VOLTAGE SOURCE IN SERIES WITH THE MOS GATE TERMINAL
FIGURE 6-3: MISMATCH ANALYSIS RESULTS OF THE REAL-TIME RECURSIVE DFT PROCESSOR OF LENGTH 8
FIGURE 6-4: DYNAMIC RANGE HISTOGRAM OF THE 8-POINT DFT PROCESSOR FOR BPSK MODULATED SIGNAL
FIGURE 6-5: DYNAMIC RANGE HISTOGRAM OF THE 16-POINT DFT PROCESSOR FOR BPSK MODULATED SIGNAL
FIGURE 6-6: DYNAMIC RANGE HISTOGRAM OF THE 8-POINT DFT PROCESSOR FOR QPSK MODULATED SIGNAL
FIGURE 6-7: THE SNDR CURVES OF 8-POINT RECURSIVE DFT PROCESSORS WITH IDEAL DEVICES
FIGURE 6-8: THE SNDR CURVES OF 8-POINT RECURSIVE DFT PROCESSORS
FIGURE 6-9: OFFSET CANCELLATION BY AN AUXILIARY TRANSCONDUCTANCE IN A NEGATIVE FEEDBACK LOOP [80]
FIGURE 6-10: PERFORMANCE COMPARISON OF A 4-POINT ANALOGUE DFT IMPLEMENTED ON A FPAA [93]
  
 
 
LIST OF ABBREVIATIONS 
1G First Generation 
2G Second Generation 
3G Third Generation 
4G Fourth Generation 
ACI Adjacent Channel Interference   
ADC Analogue to Digital Converter 
AGC Automatic Gain Control 
AWGN Additive White Gaussian Noise 
BER Bit Error Ratio 
BJT Bipolar Junction Transistor 
BPSK Binary Phase-Shift Keying 
BSIM Berkeley Short-Channel IGFET Model 
CCDF Complementary Cumulative Distribution Function 
CLT Central Limit Theorem 
CM Common Mode 
CMFB Common Mode Feedback 
CMOS Complementary Metal-Oxide Semiconductor 
CP Cyclic Prefix 
  
 
 
DAC Digital to Analogue Converter 
dB decibel 
DC Direct Current 
DFT Discrete Fourier Transform 
DIF Decimation In Frequency 
DIT Decimation In Time 
DSP Digital Signal Processor 
EVM Error Vector Magnitude 
FDM Frequency Division Multiplexing 
FEC Forward Error Correction 
FFT Fast Fourier Transform 
FIR Finite Impulse Response 
FPAA Field Programmable Analogue Array 
ICI Intercarrier Interference 
IDFT Inverse Discrete Fourier Transform 
IEEE Institute of Electrical and Electronics Engineers 
IF Intermediate Frequency 
IFFT Inverse Fast Fourier Transform 
iid independent and identically distributed 
IP2 Second Intercept Point 
  
 
 
ISI Intersymbol Interference  
KCL Kirchhoff’s Current Law 
KVL Kirchhoff’s Voltage Law 
LDPC Low-Density Parity-Check  
LLN Law of Large Numbers 
LMS Least Mean Square 
LPF Low Pass Filter 
LSB Least Significant Bit 
LTE Long Term Evolution 
MMSE Minimum Mean Square Error 
NMOS N-channel Metal-Oxide Semiconductor 
OFDM Orthogonal Frequency-Division Multiplexing 
OFDMA Orthogonal Frequency Division Multiple Access 
Op-amp Operational amplifier  
PAPR Peak to Average Power Ratio 
PCM Pulse Code Modulation 
PDF Probability Density Function 
PDK Process Design Kit 
PHY Physical layer 
PMOS P-channel Metal-Oxide Semiconductor 
  
 
 
QAM Quadrature Amplitude Modulation  
QPSK Quadrature Phase-Shift Keying 
RF Radio Frequency 
RMS Root Mean Square   
SC Switched Capacitor 
SDR Software Defined Radio 
SNDR Signal to Noise and Distortion Ratio 
SNR Signal to Noise Ratio 
SOFDMA Scalable Orthogonal Frequency Division Multiple Access 
SPICE Simulation Program with Integrated Circuit Emphasis 
SR Slew Rate 
TPC Turbo Product Code 
UWB Ultra-Wideband 
VLSI Very Large Scale Integration 
VoIP Voice over Internet Protocol 
WiFi Wireless Fidelity 
WiMAX Worldwide Interoperability for Microwave Access 
WLAN Wireless Local Area Network 
WMAN Wireless Metropolitan Area Network 
 
 
 
Chapter 1  
INTRODUCTION 
In this chapter, a historical perspective on the development of communication systems 
is provided. The gaps in the previous research are discussed in the statement of the 
problem. Objectives and significance of the study are explained to clarify how this work 
will contribute to knowledge. Finally, an outline of the thesis structure is provided. 
1.1 Evolution of Communication Systems 
The proposal to use electricity in communication dates back to the late 18th century. 
In 1795, Francisco Salvá Campillo proposed an electrical telegraph as an alternative to 
optical ones [2]. In 1809, an electrochemical telegraph was designed by Samuel Thomas 
von Sömmerring [3]. The first electrical telegraph was built by Francis Ronalds in 1816 
[4]. The cost of using one wire for each letter of the alphabet in early telegraph 
designs was prohibitive. Therefore, in 1835, Pavel L'vovitch Shilling reduced the 
number of wires by developing the first binary code for the telegraph [5]. In 1838, 
Samuel Morse and Alfred Vail invented a single-wire telegraph and the Morse code. 
The Morse/Vail telegraph became the forerunner of digital communication [6].  
 
 
The idea of a speaking telegraph was initially proposed by Innocenzo Manzetti in 1844 
[7]. Later, in 1854, Charles Bourseul wrote a memorandum on the electrical 
transmission of speech [8]. In 1871, Antonio Meucci filed a patent caveat for his 
telephone invention (telettrofono). Meucci filed a patent caveat because he could not 
afford the $250 fee necessary to file a patent application [9]. In 1876, Elisha Gray filed 
a patent caveat for a telephone on the very same day that Alexander Graham Bell filed a 
patent application for a telephone. A month later, Bell’s telephone patent was issued and 
the telephone became the forerunner of analogue communication [10-12].  
Digital communication remained attractive owing to the contributions of Guglielmo 
Marconi and Karl Ferdinand Braun to the invention of wireless telegraphy in 1896 [13, 
14]. In 1906, Reginald Fessenden invented the heterodyne transceiver, which used 
Amplitude Modulation (AM) to transmit an audio signal via a radio carrier wave [15]. 
In 1912, the significant role of Marconi's wireless telegraphy in rescuing the survivors 
of the Titanic proved its vital importance for marine communication [16]. In 1918, Edwin H. 
Armstrong invented the superheterodyne receiver, which converted the frequency of 
the received signal to a fixed Intermediate Frequency (IF). Compared with the heterodyne 
receiver, the superheterodyne receiver provided better selectivity and sensitivity. Later, 
in 1933, Armstrong demonstrated Frequency Modulation (FM), which provided 
better sound quality and fidelity than AM [17]. 
In 1937, Alec Harley Reeves invented Pulse Code Modulation (PCM) to enhance 
the noise immunity of audio transmission over long distances [18]. In fact, Reeves 
invented the first all-electronic Analogue to Digital Converter (ADC) and Digital to 
Analogue Converter (DAC) [19]. Another landmark of 1937 was Claude Shannon’s 
master’s thesis. Shannon proved that Boolean algebra could optimise the design of 
electromechanical relays in telephone routing switches. Shannon’s work on the 
electrical implementation of Boolean functions became the foundation of digital circuit 
design [20]. In 1948, Shannon laid the theoretical foundations of digital 
communications in his paper “A Mathematical Theory of Communication” [21].  
 
In 1947, Walter H. Brattain, John Bardeen, and William Shockley invented the 
transistor at Bell Laboratories [22]. In 1958, Jack Kilby realized the first germanium 
Integrated Circuit (IC) [23]. A few months later, Robert Noyce produced the first silicon 
IC [24]. These landmark innovations changed the nature of communication systems 
in the second half of the 20th century [6].   
In 1965, James Cooley and John Tukey developed the Fast Fourier Transform (FFT) 
algorithm for efficient computation of the Discrete Fourier Transform (DFT) [25]. In 
1966, Robert W. Chang invented Orthogonal Frequency Division Multiplexing 
(OFDM) for simultaneous transmission of data on multiple channels [26, 27]. In 1971, 
Weinstein and Ebert suggested using the FFT to realise the OFDM modulator and 
demodulator [28].  
In 1973, the first handheld mobile cell phone was invented by Martin Cooper (Figure 
1.1) and his teammates at Motorola [29]. Ten years after Cooper’s invention, the 
first generation of mobile communication (1G) systems was launched. 1G was based on 
analogue communication [30]. The first commercially available mobile phone 
(DynaTAC 8000x) resembled a brick in terms of size and weight. Besides, its battery 
lasted only 30 minutes after 10 hours of charging [31]. 
The second generation of mobile communication (2G) systems emerged in 1991. 2G 
was based on digital communication. While 1G systems had no security, 2G systems 
provided security by encrypting the digital signals. Moreover, 2G digital systems made 
error detection and error correction possible by encoding and decoding. Since error 
correction minimises the effect of interference, 2G systems achieved better 
communication quality than 1G systems. Furthermore, compared with 1G systems, 2G 
systems provided higher spectrum efficiency by compressing the digital data. 
Additionally, 2G systems applied multiple access techniques, which allow multiple users 
to share the frequency band. Thereby, 2G systems achieved higher capacity than 1G 
systems. Compared with 1G analogue systems, 2G digital systems had longer battery life 
and cheaper equipment. These advantages led to the prevalence of the digital 
communication standards [30].   
 
Figure 1-1: Martin Cooper holds the DynaTAC 8000x phone and his current mobile phone during the 
Prince of Asturias Awards ceremony in 2009 [32]. 
 
The proliferation of mobile phone users led to the growing demand for mobile internet 
access. In response to this demand, the third generation of mobile communication (3G) 
systems emerged in 2001. 3G systems use packet switching for data transmission and 
circuit switching for voice calls [17, 30].  
3G systems could not satisfy the growing demand for streaming media. Hence, the fourth 
generation of mobile communication (4G) systems emerged in 2011. 4G systems 
provide higher data rates than the existing 3G systems. Moreover, 4G networks use 
packet switching with Internet Protocol (IP) for data and voice transmission. The circuit 
switching in 3G networks is replaced by the Voice over Internet Protocol (VoIP) in 4G 
networks. Worldwide Interoperability for Microwave Access (WiMAX) and Long Term 
Evolution (LTE) are the two competing technologies for 4G systems [17, 30]. 
 
 
Table 1-1 shows the landmark innovations in the history of analogue and digital 
communication systems. The earliest form of electrical communication system 
(the telegraph) was digital. However, digital signals could not convey the continuous waves 
of speech. The move from digital to analogue signalling made speech communication 
possible. For more than a century (1876-1991), analogue communication systems were 
used to transmit audio signals. Laying the theoretical and practical foundations of 
modern digital communications took more than 50 years (1937-1991). Compared with 
analogue systems, modern digital communication systems provide higher security, 
better communication quality, higher spectrum efficiency, and higher capacity.  
 
Table 1-1: Evolution of Communication Systems 
Year Innovation  
1838 Telegraph 
1876 Telephone 
1896 Wireless telegraphy  
1906 Heterodyne transceiver, AM broadcasting 
1918 Superheterodyne receiver 
1933 FM broadcasting 
1937 PCM 
1937 Electrical implementation of Boolean functions 
1947 Transistor 
1948 Mathematical Theory of Communication 
1958 Integrated Circuit 
1965 FFT algorithm 
1966 OFDM 
1983 1G 
1991 2G 
2001 3G 
2011 4G 
Legend (row colour coding in the original table): Digital; Analogue; Foundation of modern 
Digital systems; Foundation of modern Analogue and Digital systems 
 
1.2 Statement of the Problem 
The previous section revealed that a new generation of mobile communication standards 
has emerged approximately every 10 years. Upgrading the technology of mobile 
networks in all areas takes a few years. Hence, mobile phones should support the previous 
communication standards as well as the latest standards. Moreover, since WiFi [33] 
provides a higher data rate than WiMAX [34], in areas where both WiFi and WiMAX are 
available (e.g. university campus, office, home, hotel) it is preferable to use WiFi [17].   
The initial approach to realising a multi-standard mobile phone was to use multiple 
transceivers (Figure 1.2(a)) in parallel. However, as the number of communication 
standards increases, the size and cost of the mobile handset increase [35, 36]. To resolve 
this issue, Joseph Mitola proposed the concept of Software Defined Radio (SDR), 
according to which a single transceiver can support multiple communication standards 
if it is reconfigurable by software [37]. Mitola suggested that the SDR can be achieved 
by replacing the analogue signal processing stages of the transceiver (i.e. analogue 
front-end) with a Digital Signal Processor (DSP) (Figure 1.2(b)) [37]. Moving the ADC 
and the DSP closer to the antenna means that the signal should be sampled and 
processed at the Radio Frequency (RF). Frequency bands that are allocated to the 
mobile communication standards and WiFi lie between 800 MHz and 5.5 GHz. To 
digitize any signal from 800 MHz to 5.5 GHz, a 12-bit, 11 GS/s ADC is required, since 
the Nyquist criterion demands a sample rate of at least twice the highest signal 
frequency. Such a demanding ADC is unrealizable with the current technology [38]. 
Also, since progress in ADC dynamic range and conversion speed is slower than Moore’s law, 
the required ADC will remain infeasible in the foreseeable future [39]. Even if a 12-bit, 
11 GS/s ADC were feasible, its power dissipation would be hundreds of watts [38]. 
Moreover, in the SDR receiver, the digital front-end performs the downconversion. The 
digital mixer requires four real multiplications per complex signal sample. At 
a sample rate of 11 GS/s, the DSP must perform 44 billion multiplications per second. 
Given the power dissipation of such a digital mixer, implementing the 
downconversion on the DSP is not sensible [40]. Hence, the SDR that was envisaged by 
Mitola has remained elusive [38]. 
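The sample-rate and multiplication figures quoted above follow from simple arithmetic. The short sketch below reproduces them, assuming Nyquist-rate sampling at twice the highest band edge; the function and constant names are illustrative only.

```python
# Back-of-envelope check of the SDR front-end figures quoted in the text.
F_MAX_HZ = 5.5e9             # highest band edge considered (5.5 GHz)
REAL_MULTS_PER_SAMPLE = 4    # one complex multiplication = four real multiplications


def nyquist_rate(f_max_hz):
    """Minimum sample rate for a signal band-limited to f_max_hz."""
    return 2.0 * f_max_hz


def mixer_mults_per_second(sample_rate, mults_per_sample=REAL_MULTS_PER_SAMPLE):
    """Real multiplications per second performed by a digital mixer."""
    return sample_rate * mults_per_sample


sample_rate = nyquist_rate(F_MAX_HZ)
mults = mixer_mults_per_second(sample_rate)
print(f"{sample_rate / 1e9:.0f} GS/s, {mults / 1e9:.0f} billion multiplications/s")
```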
 
 
Figure 1-2: Analogue and Digital signal processing sections in (a) the classical OFDM receiver (b) the 
Software Defined Radio receiver (c) the OFDM receiver with an analogue FFT  
 
The demanding ADC requirement is also a serious impediment to Ultra-Wideband (UWB) 
OFDM wireless transceivers [41]. In an effort to relax the ADC requirements in UWB 
OFDM transceivers, an analogue FFT processor was proposed (Figure 1.2(c)) [41]. 
Transferring the FFT processor from the digital back-end to the analogue front-end 
reduces the bit depth requirement of the ADC and thereby its power consumption. 
Moreover, the analogue FFT processor consumes significantly less power 
than the digital FFT [41]. However, the analogue FFT processor is not reconfigurable 
because it is hardwired. Hence, the analogue FFT processor is not suitable for multi-
standard transceivers.  
 
 
1.3 Dissertation Objectives 
After reviewing the requirements of modern mobile handheld devices and the 
impediments to the realization of the SDR, it is clear that an alternative architecture for 
multi-standard transceivers must be explored. For OFDM-based transceivers, the 
analogue FFT processor is an attractive alternative to the power-hungry digital FFT 
processor. Multiple OFDM-based transceivers can be integrated by a variable-length 
DFT processor. To use power efficiently, the power consumption of the 
variable-length DFT processor should scale with the length of the transform. In 
this thesis, a power-scalable variable-length analogue DFT processor that meets the 
specifications of the WiFi and WiMAX standards is proposed. 
The previous works on the analogue DFT processor merely focused on the circuit 
design methods and used the conventional architectures that were originally designed 
for the digital DFT processor or the discrete-time filters. Hence, a novel architecture 
that is designed based on the characteristics of the analogue signal processing domain is 
required. The main concern is the arithmetic precision of the analogue DFT processor. 
Therefore, performance of the proposed system should be analysed at various stages of 
the design process by statistical modelling of the mismatch. 
1.4 Significance of the Research 
Digital signal processing or analogue signal processing; that has been the question 
throughout the history of communication systems. Finding the answer to this question 
led to the invention of the telephone, the advent of 2G systems, and changing Mitola’s 
paradigm of SDR [38]. A power-scalable variable-length analogue DFT processor can 
be another breakthrough in transceivers. Sharing the DFT processor between multiple 
transceivers and implementing it with analogue circuits can significantly reduce the 
hardware cost.  
 
Moreover, a power-scalable analogue DFT processor can be the most power-efficient 
DFT processor. Hence, this research may lead to a new generation of mobile phones 
that are smaller, cheaper, and have longer battery life.  
1.5 Thesis Outline 
In this chapter, the evolution of communication systems was reviewed. Also, 
limitations of previous research on the SDR and the analogue FFT processor were 
discussed. Additionally, objectives and significance of the research were explained.  
Chapter 2 provides the background knowledge on the OFDM technology and the 
OFDM-based standards. State-of-the-art FFT processors are reviewed. Also, a 
comparison between the analogue and digital signal processing is provided to elaborate 
the trade-offs in each approach. 
In Chapter 3, the proposed architecture for the power-scalable variable-length analogue 
DFT processor is explained. Advantages and novelty of the proposed architecture are 
revealed by making comparisons between the proposed architecture and previous 
Fourier transform architectures. 
In Chapter 4, performance requirements of the analogue DFT processor are derived. The 
behavioural models of the processor building blocks are explained. System simulations 
based on the behavioural models are performed to determine the design specifications of 
circuits. Yield prediction based on the Monte Carlo method is discussed. Moreover, 
the performance of the proposed architecture is compared with that of the FFT 
architecture. 
Chapter 5 reviews various design approaches for the building blocks of the analogue 
DFT processor. Circuits that can provide the required flexibility for the power-scalable 
variable-length DFT processor are selected. Selected circuits are designed in 180 nm 
CMOS technology. Speed-power-accuracy trade-offs in circuits with ideal devices are 
discussed.   
 
Chapter 6 reviews the mismatch models available in the open literature. This chapter 
also explains the design trade-offs that impose limitations on the performance of 
analogue signal processors. The effect of technology scaling on mismatch is also 
discussed.  The impact of device mismatch on the performance of the circuit is analysed. 
Results of this analysis are compared with previous work. Finally, techniques that can 
mitigate the effect of device mismatch are mentioned.  
Chapter 7 presents the concluding remarks and the original contributions of this study. 
This chapter also provides recommendations for future research. 
 
 
 
 
 
 
Chapter 2   
BACKGROUND STUDY AND 
LITERATURE REVIEW 
In this chapter, the OFDM technology and the OFDM-based standards are reviewed. 
Also, the achievements of recent studies on FFT processors are summarised. A 
comparison between the analogue and digital circuits is provided. Furthermore, the 
existing architectures for the analogue Fourier transform processor are explained.  
2.1 Fundamentals of OFDM  
Orthogonal Frequency-Division Multiplexing (OFDM) and its variants are the 
predominant technology in the fourth generation of mobile communication systems 
(4G). OFDM is an advanced form of the Frequency Division Multiplexing (FDM). 
FDM is a technique that facilitates the simultaneous transmission of multiple signals on 
a single medium by dividing the channel bandwidth into multiple subchannels (Figure 
2.1). FDM is an effective technique to combat Intersymbol Interference (ISI) and 
multipath fading in wireless communications. However, since FDM prevents 
interference between subchannels by means of guard bands, it does not use the channel 
bandwidth efficiently [42].  
 
 
Figure 2-1: the spectrum of the FDM signal consisting of nonoverlapping subchannels [43] 
In OFDM, a broad frequency spectrum is divided into multiple orthogonal narrowband 
subchannels by the Discrete Fourier Transform (DFT). OFDM modulation and 
demodulation are performed by the Inverse Discrete Fourier Transform (IDFT) and the 
DFT, respectively. Both the DFT and the IDFT multiply discrete samples of the signal by 
complex exponentials [1, 44].  
 
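\[
\text{IDFT:}\quad x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{\,j2\pi kn/N}, \qquad n = 0, 1, \ldots, N-1 \tag{2.1}
\]

\[
\text{DFT:}\quad X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1 \tag{2.2}
\]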
In the above equations, $x(n)$ and $X(k)$ represent discrete samples of the modulated and 
demodulated signals, respectively. Hence, elements of the sequence 
$\{e^{\,j2\pi kn/N}\}_{k=0}^{N-1}$ are the subcarriers of $x(n)$. The orthogonality of the 
subcarriers to each other is proven by multiplying both sides of equation (2.1) by 
$e^{-j2\pi mn/N}$ and summing from $n = 0$ to $n = N-1$ [44]. 
 
\[
\sum_{n=0}^{N-1} x(n)\, e^{-j2\pi mn/N} = \sum_{n=0}^{N-1} \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{\,j2\pi (k-m)n/N} \tag{2.3}
\]
 
 
Interchanging the order of summation on the right hand side of the equation (2.3) gives 
\[
\sum_{n=0}^{N-1} x(n)\, e^{-j2\pi mn/N} = \sum_{k=0}^{N-1} X(k) \left[ \frac{1}{N} \sum_{n=0}^{N-1} e^{\,j2\pi (k-m)n/N} \right] \tag{2.4}
\]
The term inside the bracket is [44] 
\[
\frac{1}{N} \sum_{n=0}^{N-1} e^{\,j2\pi (k-m)n/N} =
\begin{cases}
1, & k = m \\
0, & \text{otherwise}
\end{cases} \tag{2.5}
\]
Hence, subcarriers are orthogonal to each other. Combining equations (2.4) and (2.5) 
gives 
\[
\sum_{n=0}^{N-1} x(n)\, e^{-j2\pi mn/N} = X(m) \tag{2.6}
\]
which is the formula for the DFT. Accordingly, applying the DFT to the modulated samples 
demodulates them.  
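The derivation above can be checked numerically. The following minimal sketch implements equations (2.1) and (2.2) directly in pure Python, evaluates the bracketed term of equation (2.4), and confirms that the DFT recovers the symbols placed on the subcarriers. The symbol values and the helper names are arbitrary choices made for illustration.

```python
import cmath


def idft(X):
    """Equation (2.1): OFDM modulation of the subcarrier symbols X(k)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def dft(x):
    """Equation (2.2): OFDM demodulation of the time samples x(n)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def subcarrier_inner_product(k, m, N):
    """Bracketed term of equation (2.4); equation (2.5) predicts 1 if k == m, else 0."""
    return sum(cmath.exp(2j * cmath.pi * (k - m) * n / N) for n in range(N)) / N


N = 8
# Illustrative QPSK symbols on the N subcarriers (values chosen arbitrarily).
X = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
X_hat = dft(idft(X))

max_error = max(abs(a - b) for a, b in zip(X, X_hat))
same = subcarrier_inner_product(3, 3, N)
different = subcarrier_inner_product(3, 5, N)
print(f"round-trip error: {max_error:.2e}")
print(f"|<k=3, m=3>| = {abs(same):.3f}, |<k=3, m=5>| = {abs(different):.1e}")
```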
Elements of the sequence $\{e^{\,j2\pi kn/N}\}_{n=0}^{N-1}$ are samples of the time-limited 
$e^{\,j2\pi f_k t}$, which is the $k$th subcarrier ($f_k = k/T$ and $-T/2 \le t \le T/2$). Hence, 
the Fourier transform of the $k$th subcarrier is [1] 
\[
Y(f) = \int_{-T/2}^{T/2} e^{\,j2\pi f_k t}\, e^{-j2\pi f t}\, dt = \int_{-T/2}^{T/2} e^{-j2\pi (f - f_k) t}\, dt = \frac{\sin\big(\pi (f - f_k) T\big)}{\pi (f - f_k)} \tag{2.7}
\]
Thus, $Y(f) = T\,\mathrm{sinc}\big((f - f_k)T\big)$, where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. 
Figure 2.2 shows three subcarriers of the OFDM signal. 
Since the subcarriers are orthogonal, the zero crossings of each subcarrier fall on the peaks of 
the other subcarriers. Therefore, not only is a guard band between adjacent subcarriers 
unnecessary, but also the subcarriers can overlap. Thereby, OFDM uses the channel 
bandwidth efficiently [45]. 
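The claim that each subcarrier spectrum vanishes at the centre frequencies of the other subcarriers can be illustrated numerically. The sketch below assumes that the symbol duration T is kept explicit, so that the spectrum of the kth subcarrier is T sinc((f - f_k)T) with f_k = k/T. The value of T is the 3.2 μs useful symbol duration of IEEE 802.11a/g, used here purely as an example.

```python
import math


def subcarrier_spectrum(f, k, T):
    """Spectrum of the k-th time-limited subcarrier: T * sinc((f - f_k) * T), f_k = k / T."""
    x = (f - k / T) * T
    if x == 0.0:
        return T                       # limit of sin(pi*x)/(pi*x) as x -> 0
    return T * math.sin(math.pi * x) / (math.pi * x)


T = 3.2e-6                             # example symbol duration in seconds
k = 2                                  # subcarrier under test
peak = subcarrier_spectrum(k / T, k, T)
# Evaluate the k-th spectrum at the centre frequencies of the other subcarriers.
leakage = [abs(subcarrier_spectrum(m / T, k, T)) / T for m in range(6) if m != k]
print(f"peak = {peak:.2e}, worst normalised leakage = {max(leakage):.1e}")
```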
 
 
Figure 2-2: the spectrum of an OFDM signal consisting of three overlapping subcarriers [42] 
Figure 2.3 depicts the imaginary part of four subcarriers in the time domain. For a large 
number of modulated subcarriers (𝑁 ≫ 1)  the OFDM symbol appears as Gaussian 
noise in the time domain [1]. 
 
Figure 2-3: summation of the OFDM subcarriers in the time domain [1] 
 
The performance of the wireless communication systems depends on the channel 
characteristics. Multipath propagation results in phase shifting and fading. Thus, 
channel estimation is necessary to extract the original signal from the received signal. In 
order to estimate the channel, deterministic subcarriers called pilots are added to the 
OFDM symbol [1].  
The channel delay spread in multipath propagation creates Intersymbol Interference 
(ISI) between successive OFDM symbols (Figure 2.4 (a)). Also, the time-dispersive 
channel creates Intercarrier Interference (ICI) which destroys the orthogonality between 
subcarriers. In order to eliminate the effect of ISI, guard intervals are added to the 
OFDM symbol (Figure 2.4 (b)). Subcarriers that are transmitted during the guard 
interval are null. The guard interval should exceed the maximum excess delay of the 
multipath propagation channel [46]. Since a guard interval is ineffective in cancelling 
ICI, the Cyclic Prefix (CP) is used instead. The CP is a copy of the last part of the OFDM 
symbol, which is prefixed to the OFDM symbol. Thus, the CP preserves the 
orthogonality between subcarriers by making the OFDM symbol periodic [47]. Figure 
2.5 illustrates the OFDM symbol in the frequency domain. 
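The role of the cyclic prefix can be illustrated with a small simulation. In the sketch below, an OFDM symbol is generated by the IDFT, prefixed with a copy of its last samples, passed through a hypothetical two-path channel, and demodulated by the DFT after the prefix is discarded. Provided the prefix is at least as long as the channel memory, each received subcarrier equals the transmitted symbol scaled by the channel frequency response at that subcarrier, i.e. the subcarriers remain orthogonal. The channel taps, symbol values, and function names are assumptions made for illustration.

```python
import cmath


def dft(x, sign=-1):
    """DFT for sign=-1; unnormalised inverse transform for sign=+1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    return [v / len(X) for v in dft(X, sign=+1)]


def channel(samples, taps):
    """Linear convolution with the channel impulse response, truncated to the input length."""
    return [sum(taps[i] * samples[n - i] for i in range(len(taps)) if n - i >= 0)
            for n in range(len(samples))]


N, CP_LEN = 8, 2
taps = [1.0, 0.5]                        # hypothetical two-path channel
X = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]

x = idft(X)
tx = x[-CP_LEN:] + x                     # prefix a copy of the last CP_LEN samples
rx = channel(tx, taps)[CP_LEN:]          # the receiver discards the cyclic prefix
Y = dft(rx)

H = dft(taps + [0.0] * (N - len(taps)))  # channel frequency response per subcarrier
worst = max(abs(Y[k] - H[k] * X[k]) for k in range(N))
print(f"max |Y(k) - H(k)X(k)| = {worst:.2e}")
```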
 
 
Figure 2-4: effect of the ISI on the OFDM symbol in (a) the absence of the guard interval (b) the presence 
of the guard interval [1] 
 
 
Figure 2-5: the OFDM symbol in the frequency domain [45] 
In OFDM technology, all the subcarriers of the OFDM symbol are allocated to one user. 
On the other hand, the Orthogonal Frequency Division Multiple Access (OFDMA) 
technology assigns the subcarriers of the OFDM symbol to different users (Figure 2.6). 
Thereby, the channel bandwidth is divided into subchannels and shared between several 
users. The data-rate of each user can be controlled by varying the number of subcarriers 
in the allocated subchannel [42]. 
 
Figure 2-6: allocation of subcarriers to users in the OFDM and OFDMA technologies [1] 
As mentioned earlier, the spectrum of the time-limited OFDM symbol is the sum of 
frequency shifted sinc functions. Thus, OFDM symbols produce large out-of-band 
power which leads to the Adjacent Channel Interference (ACI). Hence, a guard band is 
used to reduce the effect of ACI. Moreover, the out-of-band power is reduced by 
windowing the OFDM symbol [43]. Figure 2.7 and Figure 2.8 show the effect of 
windowing in the time domain and the frequency domain, respectively. 
 
 
 
Figure 2-7: Windowed OFDM symbol in the time domain [1]  
 
 
Figure 2-8: Spectrum of the OFDM signal before and after windowing [45] 
 
 
The frequency-selective channel may severely attenuate some of the subcarriers. 
Attenuation of the data subcarriers leads to bit errors. Hence, Forward Error Correction 
(FEC) coding and interleaving are essential in order to spread the coded bits over the 
bandwidth [45]. FEC codes that are used by most of the OFDM-based standards include 
Concatenated code, Convolutional code, Block code, Turbo code, Low-Density Parity-
Check (LDPC) code, and Reed-Solomon code [43].  
After channel coding, the OFDM transmitter maps the bit stream onto the 
constellation points. Thereby, each symbol is represented by a magnitude and a phase. 
Symbol mapping is performed based on the Quadrature Amplitude Modulation (QAM), 
the Binary Phase-Shift Keying (BPSK), or the Quadrature Phase-Shift Keying (QPSK) 
(Figure 2.9) [1]. 
 
Figure 2-9: Symbol mapping based on the QPSK modulation [1] 
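As an illustration of symbol mapping, the sketch below maps pairs of bits onto QPSK constellation points and reports the magnitude and phase of each symbol. The Gray-coded bit assignment and the unit-power normalisation are assumptions made for this example and may differ from the mapping shown in Figure 2.9.

```python
import math

# Assumed Gray-coded QPSK constellation with unit average power; the bit-to-point
# assignment is illustrative and may differ from the mapping in Figure 2.9.
QPSK = {
    (0, 0): complex(+1, +1) / math.sqrt(2),
    (0, 1): complex(-1, +1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex(+1, -1) / math.sqrt(2),
}


def map_bits(bits):
    """Group the bit stream into pairs and map each pair to a constellation point."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


symbols = map_bits([0, 0, 0, 1, 1, 1, 1, 0])
for s in symbols:
    phase_deg = math.degrees(math.atan2(s.imag, s.real))
    print(f"magnitude = {abs(s):.3f}, phase = {phase_deg:.0f} deg")
```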
 
Figure 2.10 shows the block diagrams of the classical OFDM transmitter and receiver. 
The Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT) 
processors are used to compute DFT and IDFT efficiently [1, 43]. In Figure 2.10, DAC 
and ADC denote the Digital to Analogue Converter and the Analogue to Digital 
Converter, respectively.   
 
  
 
 
 
  
Figure 2-10: block diagrams of the classical OFDM transmitter and receiver [1] 
 
2.2 WiFi and WiMAX Physical Layer Overview 
WiFi (IEEE 802.11a/g) and WiMAX (IEEE 802.16e) are the OFDM-based standards 
that are supported by most 4G mobile handheld devices. Hence, these standards are 
considered for the purpose of this study. WiFi (Wireless Fidelity) standards are set for 
Wireless Local Area Networks (WLANs). The difference between the 802.11a and 802.11g 
standards is that the former operates in the 5 GHz band while the latter operates in the 
2.4 GHz band [48]. Table 2-1 summarizes the Physical layer (PHY) specifications of the 
802.11a and 802.11g standards [33, 49]. WiFi optimizes the data rate and maintains the 
required Bit Error Ratio (BER) by adapting the modulation and coding rate to the radio link 
quality [1]. Accordingly, the maximum data rate of 802.11a/g is 54 Mbit/s, which is 
obtained by using 64-QAM (i.e. 6 bits on each of the 48 data subcarriers), an OFDM 
symbol duration of 4 μs, and a coding rate of 3/4: 
$((6 \times 48)/4\,\mu\text{s}) \times 3/4 = 54$ Mbit/s. 
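The data-rate calculation above can be written as a small function. The sketch below uses exact rational arithmetic and the 802.11a/g parameters of Table 2-1 (48 data subcarriers, 4 μs OFDM symbol duration); the function name is illustrative. The same function also gives the rate for BPSK with a coding rate of 1/2.

```python
from fractions import Fraction


def ofdm_data_rate(bits_per_subcarrier, data_subcarriers, coding_rate, symbol_duration_s):
    """Information bits carried per OFDM symbol divided by the OFDM symbol duration."""
    coded_bits_per_symbol = bits_per_subcarrier * data_subcarriers
    return coded_bits_per_symbol * coding_rate / symbol_duration_s


SYMBOL_DURATION = Fraction(4, 1_000_000)   # 4 microseconds (T_FFT + T_CP)

max_rate = ofdm_data_rate(6, 48, Fraction(3, 4), SYMBOL_DURATION)   # 64-QAM, rate 3/4
min_rate = ofdm_data_rate(1, 48, Fraction(1, 2), SYMBOL_DURATION)   # BPSK, rate 1/2
print(f"64-QAM, rate 3/4: {float(max_rate) / 1e6:.0f} Mbit/s")
print(f"BPSK,   rate 1/2: {float(min_rate) / 1e6:.0f} Mbit/s")
```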
 
Table 2-1: IEEE 802.11a/g PHY specifications 
 
Parameter                              Value
Channel bandwidth (MHz)                20
IFFT/FFT size                          64
IFFT/FFT clock (MHz)                   20
Subcarrier spacing (kHz)               312.5 (20 MHz / 64)
Number of data subcarriers             48
Number of pilot subcarriers            4
Number of guard band subcarriers       11 (6 on the left and 5 on the right)
Number of DC subcarriers               1
Total number of subcarriers            64
Modulation                             BPSK, QPSK, 16-QAM, 64-QAM
TFFT: Useful symbol duration (μs)      3.2
TCP: Cyclic prefix duration (μs)       0.8 (TFFT/4)
OFDM symbol duration (μs)              4 (TFFT + TCP)
Channel coding                         Convolutional coding rates: 1/2, 2/3, 3/4
 
The WiMAX (Worldwide Interoperability for Microwave Access) standard is set for 
Wireless Metropolitan Area Networks (WMANs). WiMAX can operate in licensed and 
unlicensed bands between 2 and 11 GHz. The 802.16e standard uses Scalable 
OFDMA (SOFDMA) to support different channel bandwidths. SOFDMA keeps the 
subcarrier spacing constant by scaling the FFT size with the channel bandwidth [50]. The 
mobile devices that are supported by this standard can travel at tens of kilometres per 
hour while communicating. Table 2-2 summarizes the PHY specifications of the 
802.16e standard [34, 48]. According to these specifications, the maximum data rate of 
802.16e is 75 Mbit/s. 
 
Table 2-2: IEEE 802.16e PHY specifications 
 
Channel bandwidth (MHz)                     1.25     5        10       20
IFFT/FFT size                               128      512      1024     2048
IFFT/FFT clock (MHz)                        1.4      5.6      11.2     22.4
Number of subchannels                       2        8        16       32
Subcarrier spacing (kHz)                    10.94    10.94    10.94    10.94
Number of data subcarriers                  72       360      720      1440
Number of pilot subcarriers                 12       60       120      240
Number of guard band and DC subcarriers     44       92       184      368
Total number of subcarriers                 128      512      1024     2048
Modulation                                  BPSK, QPSK, 16-QAM, 64-QAM
TFFT: Useful symbol duration (μs)           91.4     91.4     91.4     91.4
TCP: Cyclic prefix duration (μs)            TFFT/8   TFFT/8   TFFT/8   TFFT/8
OFDM symbol duration (μs)                   102.8    102.8    102.8    102.8
Channel coding                              Convolutional, Optional Convolutional, Turbo, Block Turbo, LDPC
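The scaling rule of SOFDMA can be checked against the figures in Table 2-2: dividing the FFT clock by the FFT size should give the same subcarrier spacing for every channel bandwidth. The following sketch performs this check; the dictionary layout and function name are illustrative.

```python
# Channel bandwidth (MHz) -> (FFT size, FFT clock in MHz), taken from Table 2-2.
SOFDMA_MODES = {1.25: (128, 1.4), 5: (512, 5.6), 10: (1024, 11.2), 20: (2048, 22.4)}


def subcarrier_spacing_khz(fft_size, fft_clock_mhz):
    """Subcarrier spacing is the FFT clock divided by the FFT size."""
    return fft_clock_mhz * 1e3 / fft_size


spacings = []
for bandwidth, (fft_size, clock) in SOFDMA_MODES.items():
    spacing = subcarrier_spacing_khz(fft_size, clock)
    spacings.append(spacing)
    useful_symbol_us = 1e3 / spacing       # T_FFT is the reciprocal of the spacing
    print(f"{bandwidth:>5} MHz: N = {fft_size:<4} spacing = {spacing:.2f} kHz, "
          f"T_FFT = {useful_symbol_us:.1f} us")
```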
 
2.3 State-of-the-Art FFT Processors 
The rapid proliferation of wireless communication standards has led to the emergence of 
multi-standard radios. Since classical transceiver architectures are not suitable for a one-
product solution, new architectures should be proposed to fulfil this demand. In view of 
that, digital designers developed reconfigurable FFT processors to integrate multiple 
OFDM-based transceivers [51-53]. Transform length and throughput of the 
reconfigurable FFT processor must vary for each standard. Hence, energy-efficient 
reconfigurable FFT processors that scale their power consumption with the transform 
length and throughput were proposed [54, 55].  
While at least 6-bit resolution is required to represent the Gaussian OFDM signal, 2 bits 
are sufficient to represent the QPSK symbols after the FFT demodulation. In an effort to 
ease the conversion burden on the ADC, the FFT was applied to the discrete-time samples 
prior to the ADC [41]. This approach reduces the bit depth requirement of the ADC, and 
consequently lowers the ADC power consumption [56]. More importantly, the analogue 
FFT processor consumes significantly less power than the digital FFT [57-59].  
2.4 Comparison of Analogue and Digital signal 
processing 
As mentioned in the previous section, recent studies show that the analogue FFT 
processor consumes significantly less power than the digital FFT processor. This 
section provides an overview of analogue and digital circuits to explain the 
reasons for the computational efficiency of analogue circuits. In each case, the number of 
transistors required to implement the basic operations of the Fourier transform (i.e. 
addition and multiplication) is given. Moreover, the compromise that is made by 
migrating from the digital signal processing domain to the analogue signal processing 
domain is discussed. 
 
In digital computation, variables have discrete values (i.e. 0 or 1); thus, each variable 
represents only one bit of information. Mathematical operations are performed by the 
Boolean algebraic functions (i.e. AND, OR, NOT, NAND, NOR, XOR, XNOR) [60]. 
Although digital computation is insensitive to device mismatch, quantization noise and 
round-off error degrade the accuracy of computation. Since the quantization noise and 
the round-off error only affect the Least Significant Bits (LSB), the degradation of 
accuracy is insignificant [61]. Addition of two 8-bit variables in the digital domain 
requires 240 transistors (i.e. 8 full adders). Also, multiplication of two 8-bit variables 
requires nearly 3000 transistors [62, 63]. 
In analogue computation, variables (i.e. current or voltage) have continuous values. 
Thus, each variable represents many bits of information. Mathematical operations are 
performed based on the physical characteristics of circuit elements (i.e. transistors, 
capacitors, resistors, floating gate devices) and Kirchhoff’s current and voltage laws 
(KCL and KVL). Therefore, analogue computation is sensitive to device mismatch. In a 
cascade of analogue circuits, the computational errors due to mismatches accumulate. 
According to the KCL, a current-mode analogue adder that computes the sum of several 
variables can be implemented simply by connecting wires to the same node. Besides, 
multiplication of two variables by two-quadrant and four-quadrant analogue multipliers 
requires 3 and 7 transistors, respectively [62, 63]. 
This comparison leads to the conclusion that computing the DFT in the analogue 
domain reduces hardware cost and power consumption. However, these advantages are 
achieved at the expense of reduced precision. The following section explains the 
existing architectures for the analogue Fourier transform processor.    
 
 
2.5 Analogue Fourier Transform Architectures 
2.5.1 The Direct Form Finite Impulse Response  
The DFT of a sequence of length N is [64] 
$$ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{2.8} $$
where $W_N^{nk} = e^{-j(2\pi kn/N)} = \cos(2\pi kn/N) - j\sin(2\pi kn/N)$. Hence, $X(k)$ can be
considered as the discrete convolution of $x(n)$ with the impulse response
$$ h(n) = \begin{cases} W_N^{nk}, & n = 0, 1, \ldots, N-1 \\ 0, & \text{otherwise} \end{cases} \tag{2.9} $$
Therefore, the direct form Finite Impulse Response (FIR) architecture (Figure 2.11) can 
be used to implement the DFT. In this structure, the tapped delay line is made by 𝑧−1 
blocks. At each tap, signal is weighted by the impulse response value. DFT processors 
that were implemented by using this architecture are available in [65, 66]. 
 
 
Figure 2-11: direct form realization of an FIR system [44] 
 
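Since $x(n)$ is a complex number, expanding the complex multiplication $x(n) W_N^{nk}$ in
equation (2.8) gives
$$ X_{Re}(k) = \sum_{n=0}^{N-1} \left[ x_{Re}(n) \cos\left(\frac{2\pi kn}{N}\right) + x_{Im}(n) \sin\left(\frac{2\pi kn}{N}\right) \right], \qquad k = 0, 1, \ldots, N-1 \tag{2.10a} $$
$$ X_{Im}(k) = \sum_{n=0}^{N-1} \left[ x_{Im}(n) \cos\left(\frac{2\pi kn}{N}\right) - x_{Re}(n) \sin\left(\frac{2\pi kn}{N}\right) \right], \qquad k = 0, 1, \ldots, N-1 \tag{2.10b} $$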
Therefore, each complex multiplication $x(n) W_N^{nk}$ requires four real multiplications.
Thus, the direct computation of $X(k)$ requires $4N$ multiplications. Since $X(k)$ must be
computed for $N$ different values of $k$, the FIR architecture requires $4N^2$ multipliers [44].
Accordingly, for large values of $N$, the area and power consumption of the FIR
architecture are prohibitively large. Moreover, since mismatches in the multiplier
circuits lead to erroneous calculations, the computational error in the FIR architecture
grows quadratically with $N$.
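The multiplier count can be checked numerically. The following is a minimal Python sketch, not part of the original design flow; the function names direct_dft_real and reference_dft and the test vector are assumptions made for this illustration. It evaluates equations (2.10a) and (2.10b) with real arithmetic only and counts the real multiplications.

```python
import cmath
import math

def direct_dft_real(x):
    """Direct-form DFT built from real multiplications only, per (2.10a) and (2.10b).

    Returns the spectrum and the number of real multiplications performed.
    """
    N = len(x)
    X = []
    real_mults = 0
    for k in range(N):
        acc_re, acc_im = 0.0, 0.0
        for n in range(N):
            c = math.cos(2 * math.pi * k * n / N)
            s = math.sin(2 * math.pi * k * n / N)
            acc_re += x[n].real * c + x[n].imag * s   # two real multiplications
            acc_im += x[n].imag * c - x[n].real * s   # two more real multiplications
            real_mults += 4
        X.append(complex(acc_re, acc_im))
    return X, real_mults

def reference_dft(x):
    """Complex-arithmetic DFT taken directly from (2.8), used only as a cross-check."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Arbitrary 8-point complex test vector (illustrative values only).
x = [complex(1, 2), complex(-0.5, 0.25), complex(0, -1), complex(3, 0),
     complex(0.5, 0.5), complex(-2, 1), complex(1, -1), complex(0, 0)]
X, mults = direct_dft_real(x)
```

For N = 8 the counter reaches 4N^2 = 256, which corresponds to one multiplication per hardwired multiplier in the FIR architecture.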
By using the current-mode multipliers, additions can be implemented simply by 
connecting the outputs of two multipliers to the same node (KCL). Thus, additions do 
not consume area or power. More importantly, additions do not contribute to the 
computational error. However, since the outputs of 2𝑁 − 1 multipliers are connected 
together, the connection capacitance increases by increasing 𝑁. Hence, as 𝑁 increases, 
the speed of processing decreases. 
 
 
 
2.5.2 The Fast Fourier Transform  
The FFT algorithms improve the computational efficiency of the DFT by exploiting the
properties of $W_N^{nk}$ [67]
$$ W_N^{r+N/2} = -W_N^{r} \qquad \text{(symmetry)} \tag{2.11a} $$
$$ W_N^{k(N-n)} = W_N^{n(N-k)} = W_N^{-kn} \qquad \text{(symmetry)} \tag{2.11b} $$
$$ W_N^{k(n+N)} = W_N^{n(k+N)} = W_N^{kn} \qquad \text{(periodicity)} \tag{2.11c} $$
Moreover, for certain values of the product $nk$, $W_N^{nk}$ is simplified (i.e. $W_N^{0} = 1$ and
$W_N^{N/4} = -j$). The most commonly used FFT algorithm is the Cooley-Tukey algorithm
which recursively breaks down the DFT into smaller DFTs [25]. Decimation-In-Time 
(DIT), Decimation-In-Frequency (DIF), Mixed-Radix, and Split-Radix are some of the 
variants of the Cooley-Tukey algorithm. The signal flow graph of an 8-point DIT FFT is 
shown in Figure 2.12. The Radix-2 FFT of length 8 is obtained by decomposing the 8-
point DFT into 2-point DFTs. Figure 2.13 depicts the signal flow graph of the 2-point 
DFT [44, 67, 68].  
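The symmetry and periodicity properties (2.11a) to (2.11c) can be verified numerically. The short Python sketch below is an illustration only; the helper names twiddle and close and the choice N = 8 are assumptions of this example.

```python
import cmath
import math

def twiddle(N, r):
    """Return W_N^r = exp(-j*2*pi*r/N)."""
    return cmath.exp(-2j * math.pi * r / N)

def close(u, v, tol=1e-12):
    """True when two complex numbers agree to within tol."""
    return abs(u - v) < tol

N = 8
# (2.11a): W_N^(r + N/2) = -W_N^r
symmetry_a = all(close(twiddle(N, r + N // 2), -twiddle(N, r)) for r in range(N))
# (2.11b): W_N^(k(N - n)) = W_N^(-kn)
symmetry_b = all(close(twiddle(N, k * (N - n)), twiddle(N, -k * n))
                 for k in range(N) for n in range(N))
# (2.11c): W_N^(k(n + N)) = W_N^(kn)
periodicity = all(close(twiddle(N, k * (n + N)), twiddle(N, k * n))
                  for k in range(N) for n in range(N))
```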
 
Figure 2-12: signal flow graph of a Radix-2 DIT FFT of length 8 [44] 
 
 
Figure 2-13: signal flow graph of the 2-point DFT [44] 
A Radix-2 (DIT or DIF) FFT computes the DFT with (𝑁/2) log2 𝑁 − (𝑁 − 1) 
complex multiplications. Thus, the number of analogue multipliers that are required to 
implement a Radix-2 FFT is [41]  
$$ M = 4N + 16 \sum_{k=2}^{\log_2 N} \frac{N}{2^k} + 12 \sum_{k=3}^{\log_2 N} \frac{N}{4} \tag{2.12} $$
Bandwidth of the FFT architecture with 𝑆 stages is approximated by [69] 
 
$$ BW_{FFT} = BW_{DFT} \sqrt[2L]{2^{1/S} - 1} \tag{2.13} $$
where 𝐵𝑊𝐷𝐹𝑇 is the bandwidth of the DFT circuit that is used as the building block of 
the FFT architecture, and  𝐿 is the order of the equivalent Low Pass Filter (LPF). The 
number of stages should be reduced to increase the bandwidth. The number of stages is 
obtained from [67] 
$$ S = \log_R N \tag{2.14} $$
where 𝑅 denotes the radix size. Accordingly, 𝑆  is reduced by using higher radix. A 
higher radix also reduces the number of multipliers. Thereby, the computational error, 
together with the area and the power consumption are reduced. On the other hand, since 
𝑋(𝑘)s are not computed independently, computational errors propagate in the FFT 
lattice and affect all the results. The state-of-the-art analogue Fourier transform 
processors are based on the FFT architecture [70-73].  
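The trade-off between radix and bandwidth can be illustrated with a few lines of Python. This is a sketch under stated assumptions: equation (2.13) is read with a 2L-th root, the filter order L = 1 and the length N = 64 are arbitrary example values, and the function names are hypothetical.

```python
def fft_stages(N, R):
    """Number of stages S = log_R(N), equation (2.14); N must be a power of R."""
    stages, size = 0, 1
    while size < N:
        size *= R
        stages += 1
    if size != N:
        raise ValueError("N must be an integer power of the radix R")
    return stages

def bandwidth_ratio(S, L=1):
    """BW_FFT / BW_DFT from equation (2.13) for S stages and filter order L."""
    return (2 ** (1 / S) - 1) ** (1 / (2 * L))

# Bandwidth ratio of a 64-point FFT for radix 2, 4 and 8 (L = 1 assumed).
ratios = {R: bandwidth_ratio(fft_stages(64, R)) for R in (2, 4, 8)}
```

Under these assumptions the ratio increases with the radix, in line with the statement that fewer stages give a wider bandwidth.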
 
2.6 Summary 
This chapter has presented background knowledge on the OFDM technology and the
OFDM-based standards. A literature survey was also provided to identify the gaps in
previous research. The computational efficiency, the resource costs, and the
computational accuracy of the existing analogue Fourier transform architectures were
compared. This comparison leads to the conclusion that the FFT algorithms
(i.e. DIT, DIF, etc.) are optimal for sampled signals.
Migrating from the digital signal processing domain to the analogue signal processing 
domain should not be performed by simply implementing the same architecture with 
analogue circuits. Accordingly, a novel architecture that is designed based on the 
characteristics of the analogue signal processing domain is presented in the next 
chapter. 
 
  
 
Chapter 3   
REAL-TIME RECURSIVE DFT 
ARCHITECTURE 
The existing architectures for the analogue Fourier transform processor were explained 
in the previous chapter. In this chapter, the proposed architecture for the power-scalable 
variable-length analogue DFT processor is explained. The proposed architecture is 
compared with a similar DFT architecture that was designed for digital signal 
processing. Moreover, the computational efficiency, the resource costs, and the 
computational accuracy of the proposed architecture and the previous architectures are 
compared together. 
3.1 Real-Time Recursive DFT for Digital Signal 
The Goertzel algorithm [74] is a recursive DFT algorithm which was proposed for 
digital signal processing. Consider the DFT of a sequence of length N [64] 
$$ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{3.1} $$
where $W_N^{nk} = e^{-j(2\pi kn/N)}$.
 
The recursive algorithm proposed by Goertzel is achieved by using the periodicity of
$W_N^{nk}$, namely [44]
$$ W_N^{-kN} = e^{j(2\pi/N)Nk} = e^{j2\pi k} = 1 \tag{3.2} $$
Hence, multiplying the right side of equation (3.1) by $W_N^{-kN}$ does not affect the result.
Accordingly [44],
$$ X(k) = W_N^{-kN} \sum_{r=0}^{N-1} x(r)\, W_N^{kr} = \sum_{r=0}^{N-1} x(r)\, W_N^{-k(N-r)} \tag{3.3} $$
Considering $X(k)$ as the response of a discrete-time system when $n = N$, equation
(3.3) can be written in the time domain. Accordingly [44],
$$ y(n) = \sum_{r=-\infty}^{\infty} x(r)\, W_N^{-k(n-r)}\, u(n-r) \tag{3.4} $$
where $x(r) = 0$ for $r < 0$ and $r \geq N$. Equation (3.4) can be interpreted as a discrete
convolution of $x(n)$ and $W_N^{-kn} u(n)$. Therefore, $y(n)$ is the response of a system with
impulse response $W_N^{-kn} u(n)$ to $x(n)$. Hence, the transfer function of the Goertzel DFT
is [44]
$$ H(z) = \frac{1}{1 - W_N^{-k} z^{-1}} \tag{3.5} $$
The signal flow graph of the Goertzel DFT is shown in Figure 3.1.  
 
Figure 3-1: signal flow graph of the Goertzel DFT [44] 
 
Since $x(n)$ and $W_N^{-k}$ are both complex, the multiplier and the adder in Figure 3.1
represent 4 real multiplications and 4 real additions. Thus, $4N$ multiplications and $4N$
additions are required to compute $X(k)$ for a particular value of $k$ (Figure 3.2).
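The first-order recursion implied by equation (3.5) can be sketched in Python as follows. This is an illustrative example rather than the implementation used in the cited works; the function names goertzel_bin and dft_bin and the test vector are assumptions. The recursion runs over the N input samples and takes one further step with x(N) = 0, so that the output equals y(N).

```python
import cmath
import math

def goertzel_bin(x, k):
    """First-order Goertzel recursion for one DFT bin, with X(k) = y(N)."""
    N = len(x)
    w = cmath.exp(2j * math.pi * k / N)   # W_N^{-k}
    y = 0j                                 # initial rest: y(-1) = 0
    for sample in x:
        y = sample + w * y                 # y(n) = x(n) + W_N^{-k} y(n-1)
    return w * y                           # step n = N, where x(N) = 0

def dft_bin(x, k):
    """Reference value of X(k) computed directly from equation (3.1)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

# Arbitrary 8-point test vector (illustrative values only).
x = [1 + 0.5j, -2j, 0.25, -1 - 1j, 3, 0.5j, -0.75, 2 - 2j]
```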
 
 
Figure 3-2: signal flow graph of the Goertzel DFT with real multipliers 
 
 
3.2 Real-Time Recursive DFT for Analogue Signal 
The previous section explained the real-time recursive DFT architecture which was 
designed for digital signal processing. In this section, the proposed real-time recursive 
DFT architecture which is designed for analogue signal processing is explained. In 
equation (3.1), consider $a(n) = x(n) W_N^{nk}$. Expanding $a(n)$ gives
$$ a_{Re}(n) = x_{Re}(n) \cos\left(\frac{2\pi kn}{N}\right) + x_{Im}(n) \sin\left(\frac{2\pi kn}{N}\right) \tag{3.6a} $$
$$ a_{Im}(n) = x_{Im}(n) \cos\left(\frac{2\pi kn}{N}\right) - x_{Re}(n) \sin\left(\frac{2\pi kn}{N}\right) \tag{3.6b} $$
Accordingly, $X(k)$ is computed by multiplying samples of $x(t)$ by samples of
$e^{-j(2\pi ft)} = \cos(2\pi ft) - j\sin(2\pi ft)$, where $f = k/N$. Replacing the discrete samples
with piecewise continuous signals gives
$$ a_{Re}(t) = x_{Re}(t) \cos\left(\frac{2\pi kt}{N}\right) + x_{Im}(t) \sin\left(\frac{2\pi kt}{N}\right) \tag{3.7a} $$
$$ a_{Im}(t) = x_{Im}(t) \cos\left(\frac{2\pi kt}{N}\right) - x_{Re}(t) \sin\left(\frac{2\pi kt}{N}\right) \tag{3.7b} $$
$$ \text{for} \quad \frac{nT}{N} \leq t < \frac{(n+1)T}{N}, \qquad n = 0, 1, \ldots, N-1 $$
where $T$ is the duration of $N$ samples. Thereby, $x(t)$ is piecewise weighted by the DFT
coefficients. Hence, multiplications are performed without sampling.
 
 
 
In equation (3.1), $x(n)$ is in the time-domain and $X(k)$ is in the frequency-domain.
Since DFT architecture is a discrete-time system, $X(k)$ is the response of the system
when $n = N - 1$. Considering $X(k) = y(N-1)$, equation (3.1) can be written in the
time domain.
$$ y(N-1) = \sum_{n=0}^{N-1} a(n) \tag{3.8} $$
where $a(n) = x(n) W_N^{nk}$. The above equation describes a discrete-time integrator. To
obtain the difference equation of the integrator, equation (3.8) can be rewritten as
$$ y(N-1) = a(N-1) + \sum_{n=0}^{N-2} a(n) \tag{3.9} $$
Also,
$$ y(N-2) = \sum_{n=0}^{N-2} a(n) \tag{3.10} $$
Combining equations (3.9) and (3.10) gives
$$ y(N-1) = a(N-1) + y(N-2) \tag{3.11} $$
The z-transform of the above difference equation is
$$ z^{-1} Y(z) = z^{-1} A(z) + z^{-2} Y(z) \tag{3.12} $$
Accordingly, the transfer function of the discrete-time integrator is given by
$$ H(z) = \frac{Y(z)}{A(z)} = \frac{1}{1 - z^{-1}} \tag{3.13} $$
 
The block diagram representation of the integrator based on equation (3.13) is shown in 
Figure 3.3. The proposed real-time recursive DFT architecture is depicted in Figure 3.4. 
The piecewise Sine and Cosine waves can be generated by the Digital to Analogue 
Converter (DAC). 
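A discrete-time sketch of one bin of this architecture is given below in Python. It is a simplified illustration, not the circuit itself: the input is represented by one complex value per piece, the coefficient is assumed to be held at its DFT value during each piece, and the names recursive_dft_bin and dft_bin are hypothetical. Each input value passes through four real multiplications whose outputs feed two accumulators that follow y(n) = a(n) + y(n-1).

```python
import cmath
import math

def recursive_dft_bin(x, k):
    """One DFT bin: four real multiplications per piece and two accumulators."""
    N = len(x)
    y_re, y_im = 0.0, 0.0
    for n in range(N):
        c = math.cos(2 * math.pi * k * n / N)   # cosine coefficient held during piece n
        s = math.sin(2 * math.pi * k * n / N)   # sine coefficient held during piece n
        a_re = x[n].real * c + x[n].imag * s    # real part of a(n)
        a_im = x[n].imag * c - x[n].real * s    # imaginary part of a(n)
        y_re += a_re                            # integrator: y(n) = a(n) + y(n-1)
        y_im += a_im
    return complex(y_re, y_im)

def dft_bin(x, k):
    """Reference value of X(k) from the DFT definition."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

# Arbitrary 8-point test vector (illustrative values only).
x = [complex(0.5, -1), complex(2, 0), complex(-1, 1), complex(0, 0.5),
     complex(1, 1), complex(-0.5, -0.5), complex(0, -2), complex(1.5, 0)]
```

Because each bin uses its own multipliers and accumulators, N such units in parallel give all N outputs without any intermediate result being shared.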
 
 
Figure 3-3: block diagram of a recursive difference equation representing the discrete-time integrator 
 
 
Figure 3-4: architecture of the proposed real-time recursive DFT  
 
3.3 Advantages of the Proposed Architecture  
The analogue Fourier transform architectures that are available in the literature (FIR 
DFT and FFT) were explained in the previous chapter. This section provides a 
comparison between the previous analogue Fourier transform architectures and the 
proposed analogue DFT architecture. Also, the advantage of the proposed DFT 
architecture over the previous real-time recursive DFT architecture (Goertzel DFT) is 
discussed. Table 3-1 shows the computational efficiency and the resource costs of the 
aforementioned DFT architectures. 
 
Table 3-1: computational efficiency and resource costs of different DFT architectures 
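Architecture | Number of Multipliers | Number of Multiplications
FIR DFT | $4N^2$ | $4N^2$
Radix-2 FFT | $4N + 16 \sum_{k=2}^{\log_2 N} \frac{N}{2^k} + 12 \sum_{k=3}^{\log_2 N} \frac{N}{4}$ | $4N + 16 \sum_{k=2}^{\log_2 N} \frac{N}{2^k} + 12 \sum_{k=3}^{\log_2 N} \frac{N}{4}$
Goertzel DFT | $4N$ | $4N^2$
Proposed DFT | $4N$ | $4N^2$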
 
In the FIR DFT and FFT architectures, the number of multiplications and the number of
multipliers are equal, while in the real-time recursive DFT architectures each multiplier
performs $N$ multiplications. The Goertzel DFT and the proposed DFT require $4N^2$
multiplications to compute $X(k)$ for different values of $k$. These multiplications are
performed by $4N$ multipliers.
Serial-to-parallel conversion in FIR DFT and FFT architectures relaxes the bandwidth 
requirement of multipliers. Hence, in the FIR DFT and FFT architectures, frequency of 
multipliers is 𝑓𝑀 = 𝑓𝑖𝑛/𝑁  , where 𝑓𝑖𝑛  is the frequency of input signal. On the other 
hand, the frequency of multipliers in the proposed architecture is 𝑓𝑖𝑛 .  
 
The total power consumption of multipliers is 𝑃𝑇 = 𝑀𝑃𝑀, where 𝑀 is the number of 
multipliers, and 𝑃𝑀 is the power consumption of each multiplier. Also, 𝑃𝑀 ∝ 𝑓𝑀. Hence, 
the FIR DFT and the real-time recursive DFT both have 𝑃𝑇 ∝ 4𝑁𝑓𝑖𝑛 . Thus, reduction 
of the number of multipliers does not reduce the power dissipation.  
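The proportionality can be written out in one line per architecture. The worked step below uses only the multiplier counts of Table 3-1 and the multiplier frequencies stated above; the superscript labels are introduced here for illustration.

```latex
P_T^{\mathrm{FIR}} \propto M f_M = 4N^2 \cdot \frac{f_{in}}{N} = 4N f_{in},
\qquad
P_T^{\mathrm{recursive}} \propto M f_M = 4N \cdot f_{in} = 4N f_{in}
```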
Since analogue multipliers are hardwired, they are biased whether they are in use or not. 
Accordingly, in the FIR DFT and FFT architectures, the power consumption is not 
scalable with the transform length. However, since the proposed architecture performs 
multiplications serially, its power consumption is scalable with the transform length. 
Unlike the previous architectures, the proposed architecture does not require additional 
multipliers to compute the DFT of a longer sequence. Hence, the proposed architecture 
is especially suitable for variable-length DFT processors.  
While the computational errors propagate in the FFT lattice (Figure 2.12) and affect all 
results, the proposed architecture (Figure 3.4) avoids the propagation of computational 
errors by computing DFTs independently.  
In the classical OFDM receiver, the signal is sampled before digitization. Based on the
Nyquist theorem, the sampling frequency must be at least twice the signal frequency.
Thus, the signal must be decimated before processing by the digital FFT (Figure 3.5(a)) [1].
The FIR DFT, the analogue FFT, and the Goertzel DFT require a sampled signal. Thus,
all these architectures need an analogue decimation filter ahead of them (Figure 3.5(b)).
The simplest realization of an analogue decimation filter is a $D$-tap FIR filter which
loads $D$ successive samples into $D$ capacitors and then sums their charges [75]. In the
proposed DFT architecture, multiplications are performed before sampling. Hence, by
using the proposed real-time recursive DFT processor, the analogue decimation filter is
eliminated (Figure 3.5(c)).
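The operation of the D-tap decimation filter described above can be mimicked in a few lines of Python. This is a behavioural sketch only: it assumes unit tap weights and ideal charge summation, and the function name boxcar_decimate is hypothetical.

```python
def boxcar_decimate(samples, D):
    """Sum each group of D successive samples and emit one output per group.

    Models D capacitors that are loaded in turn and whose charges are then summed.
    A trailing group with fewer than D samples is discarded.
    """
    usable = len(samples) - len(samples) % D
    return [sum(samples[i:i + D]) for i in range(0, usable, D)]

decimated = boxcar_decimate([1, 2, 3, 4, 5, 6, 7], 3)
```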
 
 
Figure 3-5: baseband signal processing section in (a) the classical OFDM receiver (b) the OFDM receiver 
with an analogue FFT or FIR DFT or Goertzel DFT (c) the OFDM receiver with the proposed DFT 
3.4 Summary 
In this chapter, the design techniques that are applied to make the proposed architecture 
reconfigurable and suitable for the multi-standard OFDM transceivers were discussed. 
The optimal architecture for the analogue DFT is achieved by keeping the signal 
continuous as long as possible. To this end, the DFT coefficients are formed into 
piecewise continuous signals. Thereby, the transform length can be changed by 
changing the coefficient signals. Instead of dedicating multipliers to individual samples 
of the signal, multipliers perform 𝑁  multiplications serially. Also, the power 
consumption of the proposed architecture is scalable with the transform length. 
Moreover, the proposed DFT architecture does not require an analogue decimation 
filter. Performance of the proposed DFT architecture is analysed in the next chapter. 
  
 
Chapter 4   
SYSTEM PERFORMANCE ANALYSIS  
In this chapter, the performance metrics and the behavioural models for the Fourier 
Transform processor are defined. The performance requirements of the Analogue DFT 
processor are derived. The behavioural model is used to make the system simulations 
for the real-time recursive DFT processor and the analogue FFT processor. Finally, the 
performance of the simulated systems is analysed by applying the Monte Carlo method.  
4.1 Performance Metrics for DFT Processor 
In digital communication systems, the Error Vector Magnitude (EVM) is a measure that 
is used to quantify the performance. By definition, EVM is the Root Mean Square 
(RMS) of the difference between the ideal symbols and the demodulated symbols [1]. 
 
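$$ EVM = \sqrt{ \frac{ \frac{1}{N} \sum_{k=0}^{N-1} \left[ (I_{out}(k) - I_{ideal}(k))^2 + (Q_{out}(k) - Q_{ideal}(k))^2 \right] }{ \frac{1}{N} \sum_{k=0}^{N-1} \left[ I_{ideal}(k)^2 + Q_{ideal}(k)^2 \right] } } \tag{4.1} $$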
where 𝐼(𝑘) and 𝑄(𝑘) are the In-phase and Quadrature components of the kth symbol. 
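Hence, EVM is the square root of the ratio of the noise and distortion power to the signal
power, and this ratio is the inverse of the Signal to Noise and Distortion Ratio (SNDR).
$$ EVM = \sqrt{ \frac{\text{Noise} + \text{Distortion Power}}{\text{Signal Power}} } = \frac{1}{\sqrt{SNDR}} \tag{4.2} $$
Thereby
$$ SNDR = \frac{1}{EVM^2} \tag{4.3} $$
which in decibels is
$$ SNDR_{dB} = 10 \log_{10} \left( \frac{1}{EVM^2} \right) = 20 \log_{10} \left( \frac{1}{EVM} \right) \tag{4.4} $$
Thus
$$ SNDR_{dB} = 20 \log_{10} \sqrt{ \frac{ \frac{1}{N} \sum_{k=0}^{N-1} \left[ I_{ideal}(k)^2 + Q_{ideal}(k)^2 \right] }{ \frac{1}{N} \sum_{k=0}^{N-1} \left[ (I_{out}(k) - I_{ideal}(k))^2 + (Q_{out}(k) - Q_{ideal}(k))^2 \right] } } \tag{4.5} $$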
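Equations (4.1) and (4.4) translate directly into code. The Python sketch below is an illustration with hypothetical function names (evm, sndr_db); symbols are represented as complex numbers I + jQ, and the QPSK constellation with a constant error of 0.1 on the in-phase component is an arbitrary example.

```python
import math

def evm(ideal, out):
    """Error Vector Magnitude per equation (4.1); symbols are complex I + jQ values."""
    error_power = sum(abs(o - i) ** 2 for i, o in zip(ideal, out))
    signal_power = sum(abs(i) ** 2 for i in ideal)
    return math.sqrt(error_power / signal_power)   # the 1/N factors cancel

def sndr_db(ideal, out):
    """SNDR in decibels per equation (4.4)."""
    return 20 * math.log10(1 / evm(ideal, out))

# QPSK symbols with a constant error of 0.1 added to the in-phase component.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
received = [s + 0.1 for s in ideal]
example_sndr = sndr_db(ideal, received)
```

In this example the error power is 0.01 per symbol and the signal power is 2 per symbol, so the SNDR is 10 log10(200), roughly 23 dB.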
The performance of the DFT processor must be evaluated at weak and strong signal 
levels [41]. Therefore, the aim of the simulations is to measure the SNDR as a function 
of the input signal magnitude. A typical SNDR versus input magnitude curve is shown 
in Figure 4.1. At weak signal levels, noise and distortion corrupt the SNDR. As the 
magnitude increases, impact of the noise and distortion on the SNDR decreases. At full 
scale signal, clipping reduces the SNDR rapidly. Hence, the input magnitude that gives 
the peak SNDR is the optimal operating point of the circuit. However, the signal is not 
equalized before entering the DFT processor; thus, it is a mixture of strong and weak 
sub-channels. Hence, the dynamic range of the circuit is the main performance metric. 
By definition, dynamic range is the ratio of the maximum input level that the circuit can 
tolerate to the minimum input level that it can detect. In logarithmic scale, dynamic 
range is the difference between the maximum and minimum acceptable input levels, 
which is the width of the SNDR curve at the minimum required SNDR [41]. 
 
 
Figure 4-1: Typical SNDR versus input magnitude curve [41] 
4.2 Performance Requirements 
Minimum receiver SNDR requirements that guarantee a Bit Error Ratio (BER) of $10^{-6}$ in
an Additive White Gaussian Noise (AWGN) channel are given in Table 4-1 [33, 34].
Since 64-QAM provides the highest data rate for both WiFi and WiMAX, it is the most 
sensitive modulation scheme to distortion and noise. Accordingly, 64-QAM has the 
highest SNDR requirement. The dynamic range of the analogue DFT is determined by 
considering the minimum required SNDR and the maximum signal level that receiver 
should tolerate. The OFDM symbol is composed of a large number of modulated 
subcarriers (𝑁 ≫ 1). Hence, according to the Central Limit Theorem (CLT) the OFDM 
symbol appears as a Gaussian noise in the time domain.  
 
 
 
Table 4-1: Receiver performance requirements for BER = $10^{-6}$
Modulation Coding rate Receiver SNDR (dB) 
BPSK 1/2 3 
QPSK 1/2 5 
QPSK 3/4 8 
16-QAM 1/2 11 
16-QAM 3/4  14 
64-QAM 1/2 16 
64-QAM 2/3 18 
64-QAM 3/4 20 
 
Therefore, the Peak to Average Power Ratio (PAPR) of the signal, which is the ratio
between the maximum instantaneous power and the mean power, can be very high. If
clipping limits the PAPR, the SNDR will be degraded. Due to the statistical nature of
the PAPR for OFDM signals, the probability of exceeding a given PAPR is estimated by a
Complementary Cumulative Distribution Function (CCDF). Figure 4.2 shows the PAPR
CCDF of two OFDM signals with WiFi and WiMAX standards. Both signals are
modulated with 64-QAM. Although WiFi and WiMAX have different numbers of
subcarriers (i.e. 64 and 2048, respectively), their CCDFs are nearly the same. Accordingly,
OFDM symbols have consistent PAPR distribution [1].
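An empirical PAPR CCDF can be estimated by simulation. The Python sketch below is a simplified illustration and does not reproduce Figure 4.2: it assumes that all 64 subcarriers carry random 64-QAM data, uses no oversampling, guard bands or pilots, and all function names and the number of trials are assumptions of this example.

```python
import cmath
import math
import random

def ofdm_symbol(N, rng):
    """Time-domain OFDM symbol: inverse DFT of N random 64-QAM subcarriers."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    X = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(N)]
    table = [cmath.exp(2j * math.pi * m / N) for m in range(N)]   # exp(+j*2*pi*m/N)
    return [sum(X[k] * table[(k * n) % N] for k in range(N)) / N for n in range(N)]

def papr_db(x):
    """Peak to average power ratio of one symbol, in decibels."""
    power = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))

def ccdf(values, threshold_db):
    """Fraction of symbols whose PAPR exceeds the threshold."""
    return sum(v > threshold_db for v in values) / len(values)

rng = random.Random(0)                      # fixed seed for repeatability
paprs = [papr_db(ofdm_symbol(64, rng)) for _ in range(100)]
```

Evaluating ccdf(paprs, t) over a range of thresholds t gives one curve of the kind shown in a PAPR CCDF plot; many more trials would be needed for a smooth estimate at low probabilities.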
 
 
Figure 4-2: PAPR CCDFs of two OFDM signals with WiFi and WiMAX standards [1] 
 
The block diagram of the baseband signal processing part of the classical OFDM 
receiver and the proposed receiver architecture are shown in Figure 4.3. The channel 
selection filter cannot eliminate the Adjacent Channel Interference (ACI) completely.
Thus, when the ACI is stronger than the desired signal, the ACI makes the largest
contribution to the received signal amplitude. The Automatic Gain Control (AGC) sets
the peak signal level to the full-scale level of the next stage. Hence, in the classical
architecture, the desired signal might be below the quantization level of the ADC if no
safety margin is considered for the dynamic range of the ADC [76]. When the DFT
processor is placed ahead of the ADC, the signal is processed without quantization.
However, the noise and distortion of the analogue DFT corrupt the desired signal. Hence,
a safety margin for the dynamic range of the analogue DFT is required.
Since the analogue front-end stages before the ADC (in the classical OFDM receiver)
and the analogue DFT (in the proposed receiver) are the same, the dynamic range
requirements of the ADC and the analogue DFT are the same. In other words, reducing
the dynamic range requirement of the ADC by moving the DFT processor from the
digital back-end to the analogue front-end comes at the cost of increasing the dynamic
range requirement of the DFT processor.
 
 
Figure 4-3: The block diagram of the baseband signal processing part of (a) the classical OFDM receiver 
(b) the proposed OFDM receiver 
 
In the estimation of the dynamic range, the AGC inaccuracy, the residual DC offset, and
the thermal noise of the analogue front-end must also be taken into account [1, 76]. A
graphical decomposition of the analogue DFT dynamic range is depicted in Figure 4.4.
As the graph indicates, the analogue DFT processor requires a dynamic range between
34 dB and 51 dB for the different modulation schemes of WiFi and WiMAX.
 
 
Figure 4-4: Analogue DFT dynamic range derivation 
4.3 Behavioural Modelling  
Behavioural system simulation is a top-down approach that is used to evaluate and 
optimize the performance of the proposed architecture. The behavioural model is based 
on the functions of the building blocks of the system. Based on equation (3.1),
multipliers and integrators are the main building blocks of the Fourier transform. This 
section describes the behavioural models of the multiplier and the integrator. This 
section also explains how the aforementioned models are used to simulate the real-time 
recursive DFT processor and the FFT processor. 
 
4.3.1 Behavioural Model of the Multiplier 
One approach to implement an analogue multiplier is to scale the current of the signal 
using a variable gain transconductor. Figure 4.5 depicts the block diagram of an 
analogue multiplier that scales the input signal (voltage 𝑉1) by the variable gain (voltage 
𝑉2), and converts the output current to voltage by a transresistor.  
 
Figure 4-5: Block diagram of the analogue multiplier 
The behavioural model of the multiplier is defined by parameters that are derived from 
two functions, $I_{out} = f(V_{in})$ and its derivative $G_m = f'(V_{in})$ (Figure 4.6). The model
parameters extracted from the 𝐼𝑜𝑢𝑡 versus 𝑉𝑖𝑛 curve are 𝐼𝑚𝑎𝑥 (the DC bias current), and 
𝑉𝑖𝑛,𝑜𝑠 (the input offset voltage). From the 𝐺𝑚 versus 𝑉𝑖𝑛 curve, the model parameters are 
𝐺𝑚𝑜  (the small signal transconductance), 𝐺𝑚,𝑜𝑠  (the deviation in the 𝐺𝑚𝑜  at 𝑉𝑖𝑛 = 0 ),   
𝑎  (the extent of the quasi-linear region), 𝑏  (swing of the input voltage), 𝐴𝑟  (the 
magnitude of the ripple in quasi-linear region), 𝛾 (the slope of the quasi-linear region), 
and N (the number of ripples in quasi-linear region).  
 
 
Figure 4-6: Curves of the multiplier behavioural model.  
(a) Input-Output characteristic of transconductance (b) the derivative of (a) [77] 
 
Using the aforementioned parameters, the 𝐼𝑜𝑢𝑡 = 𝑓(𝑉𝑖𝑛) is defined as [77]   
 
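$$ I_{out} = \begin{cases}
-\dfrac{A_1 b}{2} - \dfrac{A_2 a}{2}, & V_{in} \leq -b \\[2mm]
\dfrac{A_1 (a - b)}{2\pi} \sin\left( \pi \dfrac{V_{in} + a}{a - b} \right) + \dfrac{A_1 V_{in} - a A_2}{2}, & -b < V_{in} \leq -a \\[2mm]
G_{mo} \left[ (-1)^N \dfrac{a A_r}{2N\pi} \sin\left( \dfrac{\pi N V_{in}}{a} \right) + \left( 1 + G_{m,os} + \dfrac{A_r}{2} \right) V_{in} + \dfrac{\gamma}{2a} V_{in}^2 \right], & -a < V_{in} \leq a \\[2mm]
\dfrac{A_3 (b - a)}{2\pi} \sin\left( \pi \dfrac{V_{in} - a}{b - a} \right) + \dfrac{A_3 V_{in} + a A_2}{2}, & a < V_{in} \leq b \\[2mm]
\dfrac{A_3 b}{2} + \dfrac{A_2 a}{2}, & V_{in} > b
\end{cases} \tag{4.6a} $$
where
$$ \begin{aligned}
A_1 &= G_{mo} (1 + A_r + G_{m,os} - \gamma) \\
A_2 &= G_{mo} (1 + G_{m,os}) \\
A_3 &= G_{mo} (1 + A_r + G_{m,os} + \gamma) \\
b &= \frac{2 I_{max} - A_2 a}{A_3}
\end{aligned} \tag{4.6b} $$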
As equation (4.6a) indicates, multiplication occurs in the quasi-linear region
($-a \leq V_{in} \leq a$), where the signal is scaled by $G_{mo}$. Thus, ideally the transconductance
curve must be a straight line in the $[-a, a]$ interval. In reality, the nonlinear behaviour
of the multiplier makes the input-output characteristic deviate from a straight line.
Hence, the interval $[-a, a]$ is quasi-linear [77].
In Simulink, the transconductance multiplier is modelled by a MATLAB Function 
block which provides the function of 𝐼𝑜𝑢𝑡 (equation 4.6). The MATLAB code of this 
function is provided in Appendix A.    
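For readers without access to the MATLAB code, the piecewise characteristic of equation (4.6) can be sketched in Python as follows. This is not the code of Appendix A: the function name multiplier_iout, the argument names and the default values are assumptions of this example, the ripple term is read as carrying a factor (-1)^N, the input offset voltage is not included, and a < b is assumed.

```python
import math

def multiplier_iout(v_in, g_mo, i_max, a, a_r=0.0, g_m_os=0.0, gamma=0.0, n_ripples=1):
    """Output current of the transconductor for input voltage v_in, per equation (4.6)."""
    a1 = g_mo * (1 + a_r + g_m_os - gamma)
    a2 = g_mo * (1 + g_m_os)
    a3 = g_mo * (1 + a_r + g_m_os + gamma)
    b = (2 * i_max - a2 * a) / a3                  # input swing, equation (4.6b)
    if v_in <= -b:                                  # negative saturation
        return -a1 * b / 2 - a2 * a / 2
    if v_in <= -a:                                  # negative transition region
        return (a1 * (a - b) / (2 * math.pi) * math.sin(math.pi * (v_in + a) / (a - b))
                + (a1 * v_in - a * a2) / 2)
    if v_in <= a:                                   # quasi-linear region
        ripple = ((-1) ** n_ripples * a * a_r / (2 * n_ripples * math.pi)
                  * math.sin(math.pi * n_ripples * v_in / a))
        return g_mo * (ripple + (1 + g_m_os + a_r / 2) * v_in
                       + gamma / (2 * a) * v_in ** 2)
    if v_in <= b:                                   # positive transition region
        return (a3 * (b - a) / (2 * math.pi) * math.sin(math.pi * (v_in - a) / (b - a))
                + (a3 * v_in + a * a2) / 2)
    return a3 * b / 2 + a2 * a / 2                  # positive saturation, equal to i_max

# Example call with illustrative values: 200 uA/V, 80 uA and a = 0.2 V.
i_small = multiplier_iout(0.1, g_mo=200e-6, i_max=80e-6, a=0.2)
```

With the ripple, offset and slope terms set to zero, the output in the quasi-linear region is simply the input scaled by the transconductance, and the positive saturation level equals the bias current.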
4.3.2 Behavioural Model of the Integrator 
The signal at the output of the multiplier is piecewise continuous. For an N-point DFT, 
the amplitude of N pieces must be summed together. The discrete-time integrator takes 
samples of each piece and provides their sum. The z-domain transfer function of the 
discrete-time integrator is [78]  
$$ H(z) = g\, \frac{z^{-1}}{1 - \alpha z^{-1}} \tag{4.7} $$
where 𝑔 and 𝛼 are the gain and the leakage of the integrator, respectively. This transfer 
function can be realized by the Switched-Capacitor (SC) integrator (Figure 4.7) [78]. 𝐶𝑆 
is the sampling capacitor and 𝐶𝐼 is the integrating capacitor. The timing diagram of the 
switches is provided in chapter 5.  
 
 
Figure 4-7: Switched-Capacitor integrator [78]  
 
The transfer function of the SC integrator is modelled in Simulink (Figure 4.8). The
integrator provides the result of the N-point DFT after $N f_S / f_{in}$ iterations ($f_{in}$ is the
frequency of the input signal, and $f_S$ is the sampling frequency of the delay block). Thus,
the integrator should be reset to zero after $N f_S / f_{in}$ iterations. To adjust the reset time,
the delay block is placed in the feedback loop. The integrator leakage ($\alpha$) is modelled
by a Gain block.
 
Figure 4-8: behavioural model of the switched-capacitor integrator in Simulink  
 
In the presence of mismatch, the Operational amplifier (Op-amp) suffers from DC offset
at its output. The output DC offset can be referred to the input as the offset voltage that
must be applied at the input to make the output voltage zero. This input-referred offset
voltage is modelled by $V_{os}$. For an ideal integrator, $\alpha = 1$ and $V_{os} = 0$. Sensitivity of
the recursive DFT processor to $\alpha$ and $V_{os}$ is analysed in section 4.4.3.
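The difference equation behind equation (4.7) can be sketched in Python as follows. This is a behavioural illustration, not the Simulink model: the function name sc_integrator is hypothetical, and adding the offset voltage to every input sample is an assumption about how the offset enters the model.

```python
def sc_integrator(samples, g=1.0, alpha=1.0, v_os=0.0):
    """Leaky discrete-time integrator with H(z) = g z^-1 / (1 - alpha z^-1).

    Returns the output sequence; element n depends on inputs up to n-1 (one-sample
    delay), and the final element is the value after the last input is integrated.
    """
    y = 0.0                                  # integrator reset to zero
    outputs = []
    for s in samples:
        outputs.append(y)                    # output before the current input is added
        y = alpha * y + g * (s + v_os)       # y[n+1] = alpha*y[n] + g*(a[n] + V_os)
    outputs.append(y)
    return outputs

ideal = sc_integrator([1, 2, 3, 4])                       # alpha = 1, V_os = 0
leaky = sc_integrator([1, 1, 1], alpha=0.5)               # leakage reduces the sum
offset = sc_integrator([0, 0, 0], v_os=0.1)               # offset accumulates
```

With no leakage and no offset the final output is the plain sum of the inputs; leakage makes earlier samples count for less, and an offset accumulates once per iteration.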
 
4.3.3 Behavioural Modelling of the FFT Processor 
The analogue FFT architecture was explained in chapter 2. Here, the behavioural model 
of the multiplier is used to model a radix-2 FFT processor of length 8. Considering the 
2-point DFT (Figure 2.13) as the unit cell of the FFT, the signal flow graph in Figure 
2.12 can be rearranged as illustrated in Figure 4.9. Since $x(n)$ and $W_N^{nk}$ are complex,
each of the signal flow lines in this diagram represents two signal flow lines in the 
Simulink model. 
 
 
Figure 4-9: signal flow graph of a Radix-2 DIT FFT of length 8 [41] 
 
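Considering $a = a_{re} + j a_{im}$ and $b = b_{re} + j b_{im}$ as the inputs of the 2-point DFT,
results of the 2-point DFT are
$$ A = A_{re} + j A_{im} = a + W_N^{nk} b \tag{4.8a} $$
$$ B = B_{re} + j B_{im} = a - W_N^{nk} b \tag{4.8b} $$
where $W_N^{nk} = \cos(2\pi kn/N) - j\sin(2\pi kn/N)$.
Accordingly,
$$ A_{re} = a_{re} + b_{re} \cos\left(\frac{2\pi kn}{N}\right) + b_{im} \sin\left(\frac{2\pi kn}{N}\right) \tag{4.9a} $$
$$ A_{im} = a_{im} - b_{re} \sin\left(\frac{2\pi kn}{N}\right) + b_{im} \cos\left(\frac{2\pi kn}{N}\right) \tag{4.9b} $$
$$ B_{re} = a_{re} - b_{re} \cos\left(\frac{2\pi kn}{N}\right) - b_{im} \sin\left(\frac{2\pi kn}{N}\right) \tag{4.9c} $$
$$ B_{im} = a_{im} + b_{re} \sin\left(\frac{2\pi kn}{N}\right) - b_{im} \cos\left(\frac{2\pi kn}{N}\right) \tag{4.9d} $$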
 
Thus, coefficient values are 1, $\cos(2\pi kn/N)$, and $\sin(2\pi kn/N)$. The transconductance
values that represent these coefficient values are
$$ G_{m1} = G_{mo} \tag{4.10a} $$
$$ G_{mC} = \cos\left(\frac{2\pi kn}{N}\right) G_{mo} \tag{4.10b} $$
$$ G_{mS} = \sin\left(\frac{2\pi kn}{N}\right) G_{mo} \tag{4.10c} $$
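The real-valued form of the 2-point DFT can be cross-checked against complex arithmetic. The Python sketch below is an illustration with ideal (unit-gain, mismatch-free) coefficients; the function name butterfly_real, the argument kn for the product of k and n, and the test values are assumptions of this example.

```python
import cmath
import math

def butterfly_real(a_re, a_im, b_re, b_im, kn, N):
    """2-point DFT with twiddle factor W_N^kn, written with real operations only."""
    c = math.cos(2 * math.pi * kn / N)      # coefficient realised by G_mC
    s = math.sin(2 * math.pi * kn / N)      # coefficient realised by G_mS
    A_re = a_re + b_re * c + b_im * s
    A_im = a_im - b_re * s + b_im * c
    B_re = a_re - b_re * c - b_im * s
    B_im = a_im + b_re * s - b_im * c
    return complex(A_re, A_im), complex(B_re, B_im)

# Cross-check against a + W*b and a - W*b for W_8^1 (illustrative inputs).
a, b = complex(0.3, -0.7), complex(1.1, 0.4)
W = cmath.exp(-2j * math.pi * 1 / 8)
A, B = butterfly_real(a.real, a.imag, b.real, b.imag, 1, 8)
```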
 
Figure 4.10 shows the Simulink model of the 2-point DFT with $W_8^1$ or $W_8^3$ twiddle
factor. Transresistors are modelled by Gain blocks.
 
 
Figure 4-10: 2-point DFT with $W_8^1$ or $W_8^3$ twiddle factor
 
For $W_8^0 = 1$, outputs of the 2-point DFT are

$$ A = (a_{re} + b_{re}) + j(a_{im} + b_{im}) \tag{4.11a} $$
$$ B = (a_{re} - b_{re}) + j(a_{im} - b_{im}) \tag{4.11b} $$
Additions are performed by connecting the outputs of the transconductors to the same
node (KCL). Hence, even though all coefficient values are one, voltage samples must be
converted to currents. The Simulink model of the 2-point DFT with $W_8^0$ twiddle factor is
depicted in Figure 4.11.
 
 
Figure 4-11: 2-point DFT with $W_8^0$ twiddle factor
 
For $W_8^2 = -j$, outputs of the 2-point DFT are

$$ A = (a_{re} + b_{im}) + j(a_{im} - b_{re}) \tag{4.12a} $$
$$ B = (a_{re} - b_{im}) + j(a_{im} + b_{re}) \tag{4.12b} $$
The Simulink model of the 2-point DFT with $W_8^2$ twiddle factor is shown in Figure 4.12.
 
Figure 4-12: 2-point DFT with $W_8^2$ twiddle factor
 
The above 2-point DFTs are used to build the FFT lattice in Figure 4.9. The block 
diagram of the analogue FFT processor is shown in Figure 4.13. The FFT processor 
converts the input signal to parallel samples by a serial-to-parallel converter. The FFT 
lattice provides the Fourier transform of the signal. Finally, the parallel outputs of the 
FFT lattice are converted to a serial data stream by the parallel-to-serial converter.    
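The way the 2-point DFTs combine into the lattice can be sketched with a recursive Radix-2 DIT FFT in Python. This is an ideal numerical illustration rather than the Simulink model: the whole input list stands in for the parallel samples, multiplier non-idealities are ignored, and the names fft_dit and reference_dft are hypothetical.

```python
import cmath
import math

def fft_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_dit(x[0::2])                  # DFT of even-indexed samples
    odd = fft_dit(x[1::2])                   # DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]   # twiddle factor W_N^k
        out[k] = even[k] + t                 # upper output of the 2-point DFT
        out[k + N // 2] = even[k] - t        # lower output of the 2-point DFT
    return out

def reference_dft(x):
    """Direct DFT used as a cross-check."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Arbitrary 8-point test vector (illustrative values only).
x = [complex(1, 0), complex(0, 1), complex(-1, 0.5), complex(2, -2),
     complex(0.5, 0.5), complex(0, 0), complex(-3, 1), complex(1, -1)]
spectrum = fft_dit(x)
```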
  
 
Figure 4-13: behavioural model of the analogue FFT processor in Simulink 
4.3.4 Behavioural Modelling of the Recursive DFT Processor 
Chapter 3 explained the proposed real-time recursive DFT architecture. In this section, a 
real-time recursive DFT processor of length 8 is modelled by the behavioural models of
the multiplier and the integrator. Figure 4.14 shows a 1-point recursive DFT. The Cos 
and Sin blocks generate the piecewise continuous coefficients. The coefficient signals 
are applied to the transconductance multipliers.  
 
 
Figure 4-14: 1-point DFT with piecewise continuous coefficients 
 
 
 
 
 
 
 
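Thus,
$$ a_{Re}(t) = x_{Re}(t) \cos\left(\frac{2\pi kt}{N}\right) + x_{Im}(t) \sin\left(\frac{2\pi kt}{N}\right) \tag{4.13a} $$
$$ a_{Im}(t) = x_{Im}(t) \cos\left(\frac{2\pi kt}{N}\right) - x_{Re}(t) \sin\left(\frac{2\pi kt}{N}\right) \tag{4.13b} $$
$$ \text{for} \quad \frac{nT}{N} \leq t < \frac{(n+1)T}{N}, \qquad n = 0, 1, \ldots, N-1 $$
are provided at the outputs of the Gain blocks (i.e. Transresistors). As mentioned in
section 4.3.2, the integrator provides the result of the DFT after $N f_S / f_{in}$ iterations. Hence,
$$ X_{Re}(k) = \frac{f_S}{f_{in}} \sum_{n=0}^{N-1} a_{Re}(n) \tag{4.14a} $$
$$ X_{Im}(k) = \frac{f_S}{f_{in}} \sum_{n=0}^{N-1} a_{Im}(n) \tag{4.14b} $$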
The block diagram of the real-time recursive DFT processor is illustrated in Figure 4.15. 
Eight 1-point recursive DFTs are used in parallel to create an 8-point recursive DFT 
processor.  
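The scaling by the ratio of the sampling frequency to the input frequency can be reproduced with a small Python model. This is a sketch under simplifying assumptions: the input is taken to be constant during each piece, the integrator is ideal, the ratio is an integer called oversample, and the function names are hypothetical.

```python
import cmath
import math

def recursive_dft(x_pieces, oversample):
    """N parallel 1-point recursive DFTs; each piece is integrated oversample times."""
    N = len(x_pieces)
    X = []
    for k in range(N):
        y = 0j                                             # integrator reset
        for n in range(N):
            coeff = cmath.exp(-2j * math.pi * k * n / N)   # held during piece n
            for _ in range(oversample):                    # f_S / f_in samples per piece
                y += x_pieces[n] * coeff
        X.append(y)
    return X

def reference_dft(x):
    """Direct DFT used as a cross-check."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Arbitrary 8-point input, four integrator samples per piece (illustrative values).
x = [complex(1, 1), complex(-1, 1), complex(1, -1), complex(-1, -1),
     complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)]
X = recursive_dft(x, oversample=4)
```

Each output equals the corresponding DFT value multiplied by the oversampling ratio, which matches the prefactor in equations (4.14a) and (4.14b).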
 
Figure 4-15: behavioural model of the real-time recursive DFT processor in Simulink 
 
4.4 Determining the Design Specifications  
In this section, design specifications of an 8-point recursive DFT processor are 
determined. For this purpose, an OFDM signal with QPSK modulation is applied to the 
input of the DFT processor and sensitivity of the DFT processor to each of the 
behavioural model parameters is analysed. The 𝜎(𝑉𝑖𝑛,𝑜𝑠) , 𝜎(𝐺𝑚,𝑜𝑠) and 𝜎(𝐴𝑟) model 
the mismatch between multipliers; thus, 𝑉𝑖𝑛,𝑜𝑠 , 𝐺𝑚,𝑜𝑠 and 𝐴𝑟 values are unique to each 
multiplier. The 𝜎(𝑉𝑜𝑠) models the mismatch between integrators; hence, 𝑉𝑜𝑠 is unique to 
each integrator. Other parameters are global. 
4.4.1 Power Budget 
The objective is to design an analogue DFT processor that consumes less power than the 
digital FFT processor. A power-scalable variable-length digital FFT processor that was 
fabricated in 250nm CMOS consumes 310mW power to perform 8-point FFT at 
200MHz [79].  Normalizing the power consumption to the 180nm technology and 
20MHz frequency gives 
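$$ Power = 310\ \text{mW} \left( \frac{20\ \text{MHz}}{200\ \text{MHz}} \right) \left( \frac{180\ \text{nm}}{250\ \text{nm}} \right) \left( \frac{1.8\ \text{V}}{2.5\ \text{V}} \right)^2 = 11.6\ \text{mW} \tag{4.15} $$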
The real-time recursive DFT requires 4N multipliers and 2N differential integrators to 
compute N-point DFT. Hence, the power consumption of the real-time recursive DFT 
processor is  
𝑃𝑜𝑤𝑒𝑟𝑅𝑒𝑐𝑢𝑟𝑠𝑖𝑣𝑒 𝐷𝐹𝑇  ≅ 4𝑁(𝑃𝑜𝑤𝑒𝑟𝑀𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑒𝑟 + 𝑃𝑜𝑤𝑒𝑟𝑆𝑖𝑛𝑔𝑙𝑒−𝑒𝑛𝑑𝑒𝑑 𝑖𝑛𝑡𝑒𝑔𝑟𝑎𝑡𝑜𝑟)  (4.16) 
Accordingly, 
𝑃𝑜𝑤𝑒𝑟𝑅𝑒𝑐𝑢𝑟𝑠𝑖𝑣𝑒 𝐷𝐹𝑇  ≅ 4𝑁𝑉𝐷𝐷(𝐼𝑀𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑒𝑟 + 𝐼𝑆𝑖𝑛𝑔𝑙𝑒−𝑒𝑛𝑑𝑒𝑑 𝑖𝑛𝑡𝑒𝑔𝑟𝑎𝑡𝑜𝑟)  (4.17) 
 
where 𝐼𝑀𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑒𝑟  and 𝐼𝑆𝑖𝑛𝑔𝑙𝑒−𝑒𝑛𝑑𝑒𝑑 𝑖𝑛𝑡𝑒𝑔𝑟𝑎𝑡𝑜𝑟  are the current supplies of the multiplier 
and the single-ended integrator, respectively. 𝑉𝐷𝐷 is the voltage supply of the multiplier 
and the integrator. In order to achieve a power consumption less than 11.6 𝑚𝑊 for the 
DFT processor, 𝐼𝑀𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑒𝑟 = 80 𝜇𝐴 and 𝐼𝑆𝑖𝑛𝑔𝑙𝑒−𝑒𝑛𝑑𝑒𝑑 𝑖𝑛𝑡𝑒𝑔𝑟𝑎𝑡𝑜𝑟 = 50 𝜇𝐴 are selected.  
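The power budget can be checked with a short calculation. The Python sketch below follows equations (4.15) and (4.17); the function names are hypothetical, and a supply voltage of 1.8 V is assumed from the normalization used in equation (4.15).

```python
def normalised_digital_power_mw(p_mw=310.0, f_ratio=20 / 200,
                                l_ratio=180 / 250, v_ratio=1.8 / 2.5):
    """Digital FFT power scaled in frequency, feature size and supply, equation (4.15)."""
    return p_mw * f_ratio * l_ratio * v_ratio ** 2

def recursive_dft_power_mw(N, vdd, i_mult_ua, i_int_ua):
    """Approximate power of the recursive DFT, equation (4.17), in milliwatts."""
    return 4 * N * vdd * (i_mult_ua + i_int_ua) * 1e-3    # uA * V = uW, then to mW

budget_mw = normalised_digital_power_mw()
analogue_mw = recursive_dft_power_mw(N=8, vdd=1.8, i_mult_ua=80, i_int_ua=50)
```

For N = 8 the estimate is 4 x 8 x 1.8 V x 130 uA, which is about 7.5 mW and therefore below the 11.6 mW budget.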
4.4.2 Design Specifications of the Multiplier 
Considering the input-output characteristic of the transconductance (Figure 4.6(a)), the
linear range of an ideal multiplier is $[-b, b]$. Hence, for an ideal multiplier,
$$ b = \frac{I_{max}}{G_{mo}} $$ (4.18)
In the previous section, $I_{max} = 80\,\mu$A was selected. The input-output characteristics of
ideal multipliers with different values of $G_{mo}$ are shown in Figure 4.16. As the figure
illustrates, increasing $G_{mo}$ reduces the linear range of the multiplier. The gain of the
multiplier is $A_v = G_{mo} R_D$, where $R_D$ is the resistance of the transresistor. $A_v = 1$ V/V
is selected; hence, $R_D = 1/G_{mo}$. SNDR curves for different values of $G_{mo}$ were
obtained by running the behavioural system simulation (Figure 4.17). The results of this
simulation indicate that a smaller $G_{mo}$ results in better tolerance of high signal levels.
Hence, $G_{mo} = 200\,\mu$A/V is selected.
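As a quick illustration of equation (4.18) and of $R_D = 1/G_{mo}$, the sketch below tabulates the ideal linear range and the load resistance for a few transconductance values. Only the 200 µA/V entry comes from the text; the other sweep values and the helper names are assumptions chosen for illustration.

```python
I_MAX = 80e-6  # maximum output current from section 4.4.1, in amperes


def ideal_linear_range(i_max, g_mo):
    """Equation (4.18): b = I_max / G_mo, in volts."""
    return i_max / g_mo


def load_resistance(a_v, g_mo):
    """R_D that gives a multiplier gain A_v = G_mo * R_D, in ohms."""
    return a_v / g_mo


# 200 uA/V is the selected value; 400 and 800 uA/V are hypothetical sweep points.
for g_mo in (200e-6, 400e-6, 800e-6):
    b = ideal_linear_range(I_MAX, g_mo)
    r_d = load_resistance(1.0, g_mo)
    print(f"G_mo = {g_mo * 1e6:.0f} uA/V -> b = {b:.2f} V, R_D = {r_d / 1e3:.2f} kOhm")
```

For the selected transconductance this reproduces the values listed later in Table 4-2 (b = 0.4 V and a 5 kΩ load).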
 
 
Figure 4-16: The input-output characteristics of ideal multipliers 
 
Figure 4-17: SNDR curves for different values of 𝐺𝑚𝑜 
 
In practice, the linear range of the multiplier is smaller than its input swing ($a < b$).
Considering $G_{mo} = 200\,\mu$A/V, SNDR curves for different linear ranges were obtained
by running the behavioural system simulation (Figure 4.18). The results of this analysis
indicate that a DFT processor with a smaller linear region is less tolerant of high signal
levels and therefore has a smaller dynamic range.
A non-ideal linear range of $a = b/2 = 0.2$ V is selected for the system performance
analysis.
 
 
Figure 4-18: SNDR curves for different linear ranges 
 
 
 
 
 
Device mismatches result in transconductance error [77]. Multipliers with various
transconductance errors are modelled by assuming that $G_{m,os}$ has a normal distribution
with zero mean and standard deviation $\sigma(G_{m,os})$. Considering $G_{mo} = 200\,\mu$A/V and
$a = 0.2$ V, the effect of the transconductance error on the performance of the DFT processor is
analysed. Typical values of $\sigma(G_{m,os})$ are obtained from a previous study on the
analogue FFT processor [77]. Figure 4.19 illustrates the results of this analysis. These
results indicate that a DFT processor with larger transconductance errors has a smaller
peak SNDR. However, the dynamic range of the DFT processor is not affected
by the transconductance error.
 
 
Figure 4-19: SNDR curves for various transconductance errors 
 
 
 
 
In deep submicron CMOS technologies transistor mismatches lead to significant DC 
offset [80]. Multipliers with various DC offsets are modelled by assuming that 𝑉𝑖𝑛,𝑜𝑠 has 
a normal distribution with zero mean and standard deviation  𝜎(𝑉𝑖𝑛,𝑜𝑠). Considering  
𝐺𝑚𝑜 = 200𝜇𝐴/𝑉  and 𝑎 = 0.2𝑉 , the impact of the DC offset mismatch on the 
performance of the DFT processor is analysed. Typical values of 𝜎(𝑉𝑖𝑛,𝑜𝑠) are obtained 
from a previous study on the analogue FFT processor [77]. Results of this analysis are 
illustrated in Figure 4.20. These results indicate that the DFT processor with larger DC 
offset mismatch is more susceptible to noise and distortion at low signal levels. 
Accordingly, the DFT processor with larger DC offset mismatch has smaller dynamic 
range and peak SNDR.  
 
 
Figure 4-20: SNDR curves for various DC offset mismatches 
 
4.4.3 Design Specifications of the Integrator 
Based on the Nyquist theorem, the sampling frequency ($f_S$) of the SC integrator must be
at least twice the signal frequency ($f_{in}$). Considering $f_S/f_{in} = 4$, the result of the
N-point DFT is
$$ V_{out} = 4g \sum_{n=1}^{N} V_{in}(n) $$ (4.19)
where $V_{in}(n)$ is the input voltage of the integrator at the $n$th time interval, and $g$ is the gain
of the integrator. The outputs of two multipliers are added together and the result is applied
to the input of the integrator. Hence, $V_{in}(n) = V_{O1}(n) + V_{O2}(n)$, where $V_{Oi}(n)$ is the
output of the $i$th multiplier at the $n$th time interval. To avoid a reduction of the SNDR due to
op-amp saturation,
$$ V_{out} \le V_{DD} - V_{in,CM} $$ (4.20)
where $V_{DD}$ is the supply voltage of the op-amp, and $V_{in,CM}$ is the input common-mode
(CM) level of the op-amp. In Figure 4.7, $V_{in,CM}$ is shown by the ground symbol. The
input of the integrator ($V_{in}$) is connected to the output of the multiplier. Hence, $V_{in,CM}$
must be equal to the output CM level of the multiplier. Using a 1.8 V supply, the
output CM level of the multiplier is 1.2 V (section 5.2.3). Ideally, $g = C_S/C_I$ [78].
Substituting these values in equations (4.19) and (4.20) gives
$$ 4 \frac{C_S}{C_I} \sum_{n=1}^{N} \left[ V_{O1}(n) + V_{O2}(n) \right] \le 0.6 $$ (4.21)
 
 
The linear range of the multiplier is $[-a, a]$. Since the maximum gain of the multiplier
is one, the maximum output of the multiplier is $|V_{O,max}| = a$. Assuming the worst case
$$ V_{O1}(n) = V_{O2}(n) = |V_{O,max}| \quad \text{for } n = 1, \dots, N $$ (4.22)
an 8-point DFT requires
$$ 64 \frac{C_S}{C_I} a \le 0.6 $$ (4.23)
The smallest capacitor that can hold the sampled voltage is $C_S = 50$ fF. In the previous
section, $a$ was estimated to be 0.2 V. Hence, $C_I = 1$ pF is selected.
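The capacitor sizing can be checked with a short calculation. This is a minimal sketch of inequality (4.21) under the worst-case assumption (4.22); the helper names are assumptions. Evaluated with the quoted values, the bound gives a minimum $C_I$ of roughly 1.07 pF, and the selected 1 pF yields a worst-case output of about 0.64 V, marginally above the 0.6 V limit. The source does not comment on this margin, so any interpretation of it (for instance, that the worst case in (4.22) is pessimistic) is an inference.

```python
def worst_case_output(n_points, c_s, c_i, a, oversampling=4):
    """Left-hand side of (4.21) when both multiplier outputs sit at +a for all N samples."""
    return oversampling * (c_s / c_i) * n_points * 2 * a


def min_integration_cap(n_points, c_s, a, v_swing, oversampling=4):
    """Smallest C_I that keeps the worst-case output within v_swing."""
    return oversampling * c_s * n_points * 2 * a / v_swing


C_S = 50e-15          # sampling capacitor from the text, in farads
A_LIN = 0.2           # non-ideal linear range a, in volts
V_SWING = 1.8 - 1.2   # V_DD minus the input CM level, in volts

c_i_min = min_integration_cap(8, C_S, A_LIN, V_SWING)
v_out_1pf = worst_case_output(8, C_S, 1e-12, A_LIN)

print(f"minimum C_I from (4.23): {c_i_min * 1e12:.3f} pF")
print(f"worst-case output with C_I = 1 pF: {v_out_1pf:.2f} V")
```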
Ideally, the op-amp has infinite open-loop gain ($A_v$), so that the integrator leakage
factor is $\alpha = 1$. In practice, however, $A_v < \infty$. Thus, only a fraction of the previous output of
the integrator is added to the new input sample, and consequently $\alpha < 1$. The precise
value of $\alpha$ is given by [78]
$$ \alpha = 1 - \frac{C_S/C_I}{A_v} $$ (4.24)
Considering $C_S = 50$ fF and $C_I = 1$ pF, the impact of the integrator leakage on the
performance of the DFT processor is analysed by varying $A_v$ in the behavioural system
simulation (Figure 4.21). The results of this analysis indicate that an op-amp with a larger $A_v$
provides a higher SNDR. In section 4.4.1, a power of 90 µW (50 µA from a 1.8 V supply) was assigned to the integrator.
Also, as mentioned earlier, the output swing of the op-amp should be at least 0.6 V. Hence,
it is unlikely that $A_v > 100$ V/V can be achieved.
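To give a feel for equation (4.24), the sketch below evaluates the leakage factor for a few op-amp gains and applies it to a simple accumulation. The update rule `out = alpha * out + gain * x` is a simplified reading of the description above, not the thesis's behavioural model, and the input level and gain sweep are hypothetical.

```python
def leakage_factor(c_s, c_i, a_v):
    """Equation (4.24): alpha = 1 - (C_S / C_I) / A_v."""
    return 1.0 - (c_s / c_i) / a_v


def leaky_accumulate(samples, gain, alpha):
    """Accumulate samples while retaining only a fraction alpha of the previous output."""
    out = 0.0
    for x in samples:
        out = alpha * out + gain * x
    return out


C_S, C_I = 50e-15, 1e-12
inputs = [0.1] * 8  # hypothetical constant 0.1 V input over an 8-point window
ideal = leaky_accumulate(inputs, C_S / C_I, 1.0)

for a_v in (10.0, 100.0, 1000.0):  # hypothetical sweep around the 100 V/V limit
    alpha = leakage_factor(C_S, C_I, a_v)
    leaky = leaky_accumulate(inputs, C_S / C_I, alpha)
    print(f"A_v = {a_v:6.0f} V/V -> alpha = {alpha:.5f}, "
          f"relative error = {(ideal - leaky) / ideal:.2e}")
```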
 
 
Figure 4-21: SNDR curves for various op-amp gains 
 
Transistor mismatches lead to DC offset [80]. Integrators with various DC offsets are 
modelled by assuming that 𝑉𝑜𝑠 has a normal distribution with zero mean and standard 
deviation  𝜎(𝑉𝑜𝑠). Considering  𝐺𝑚𝑜 = 200𝜇𝐴/𝑉 and 𝑎 = 0.2𝑉, the impact of the DC 
offset mismatch on the performance of the DFT processor is analysed. Typical values of 
𝜎(𝑉𝑜𝑠) are obtained from a previous study [77]. Results of this analysis are shown in 
Figure 4.22. Based on these results, the DFT processor with larger DC offset mismatch 
has smaller dynamic range and peak SNDR. Accordingly, the DC offsets of the 
integrators have the same effect as the DC offsets of the multipliers.   
 
 
Figure 4-22: SNDR curves for various DC offset mismatches 
4.5 Yield Prediction 
Process variability is pivotal in submicron CMOS technologies. Variations in the 
physical properties of the transistors impact the performance of the designed system. 
Therefore, in advance of the expensive fabrication process, it is essential to predict the 
yield at various design stages using reliable statistical analysis [81]. The parametric 
yield is associated with the system performance metric 𝑋(𝑚), which is a function of the 
mismatch parameter 𝑚. Systems that succeed in meeting the requirement(s) for 𝑋(𝑚) 
contribute to the yield. The Monte Carlo method can be used to estimate the average 
result and yield. Figure 4.23 illustrates the distribution of 𝑋 that is estimated by the 
Monte Carlo method [82, 83].   
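The idea of parametric yield can be sketched in a few lines. The example below draws synthetic values of a performance metric and counts the fraction that meets a minimum requirement. The Gaussian distribution, its parameters and the 33 dB requirement are purely hypothetical placeholders; they stand in for the behavioural simulation and the specification, neither of which is reproduced here.

```python
import random


def parametric_yield(samples, spec_min):
    """Fraction of Monte Carlo samples whose metric X meets X >= spec_min."""
    passed = sum(1 for x in samples if x >= spec_min)
    return passed / len(samples)


# Synthetic stand-in for the simulated metric X(m); values are hypothetical.
rng = random.Random(0)
metric = [rng.gauss(40.0, 3.0) for _ in range(2000)]

mean_x = sum(metric) / len(metric)
yield_estimate = parametric_yield(metric, spec_min=33.0)
print(f"mean = {mean_x:.2f} dB, yield = {100 * yield_estimate:.1f} %")
```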
 
 
 
Figure 4-23: Yield prediction based on the Monte Carlo analysis [82] 
 
The Monte Carlo analysis should stop when adding new samples no longer changes the
sample mean by more than a certain threshold. In other words [84, 85],
$$ |\bar{X}_n - \bar{X}_w| \le \varepsilon $$ (4.25)
where $\bar{X}_n$ is the mean of all generated samples (sample mean), $\bar{X}_w$ is the mean of the last $w$
samples (window mean), and $\varepsilon$ is the tolerance for convergence:
$$ \bar{X}_n = \frac{X_1 + \dots + X_n}{n} $$ (4.26)
$$ \bar{X}_w = \frac{X_{n-w+1} + \dots + X_n}{w} $$ (4.27)
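The sample variance is given by
$$ S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 $$ (4.28)
The variance of the window, taken over the same last $w$ samples as in (4.27), is
$$ S_w^2 = \frac{1}{w-1} \sum_{i=n-w+1}^{n} (X_i - \bar{X}_w)^2 $$ (4.29)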
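Considering $\mu$ as the expected value of $X$, the probability of the random variable
$\sqrt{n}\,(\bar{X}_n - \mu)/S_n$ falling within the range $[-A_1, A_1]$ is given by [84, 85]
$$ P\left\{ -A_1 \le \frac{\sqrt{n}}{S_n} (\bar{X}_n - \mu) \le A_1 \right\} $$ (4.30)
which is equivalent to
$$ P\left\{ \bar{X}_n - \frac{S_n A_1}{\sqrt{n}} \le \mu \le \bar{X}_n + \frac{S_n A_1}{\sqrt{n}} \right\} $$ (4.31)
The value of $A_1$ must be extracted from the statistical tables of the t-distribution such
that [84, 85]
$$ P\left\{ \mu \in \left[ \bar{X}_n - \frac{S_n A_1}{\sqrt{n}},\ \bar{X}_n + \frac{S_n A_1}{\sqrt{n}} \right] \right\} \to 1 - \delta $$ (4.32)
where $\delta$ is the error probability. Thus, the random interval
$$ \left[ \bar{X}_n - \frac{S_n A_1}{\sqrt{n}},\ \bar{X}_n + \frac{S_n A_1}{\sqrt{n}} \right] $$ (4.33)
is a $100(1 - \delta)\,\%$ confidence interval for $\mu$ [84, 85].
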
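The same confidence interval can be obtained from the window statistics:
$$ \left[ \bar{X}_w - \frac{S_w A_2}{\sqrt{w}},\ \bar{X}_w + \frac{S_w A_2}{\sqrt{w}} \right] $$ (4.34)
Thus,
$$ \bar{X}_n + \frac{S_n A_1}{\sqrt{n}} = \bar{X}_w + \frac{S_w A_2}{\sqrt{w}} $$ (4.35)
Accordingly,
$$ |\bar{X}_n - \bar{X}_w| = \left| \frac{S_n A_1}{\sqrt{n}} - \frac{S_w A_2}{\sqrt{w}} \right| $$ (4.36)
For $w = 10$ and $\delta = 0.05$, $A_2 = 2.228$. For yield prediction, a large number of samples
is required. Hence, $A_1 \approx 1.96$. Thereby, with $A_2/\sqrt{w} = 2.228/\sqrt{10} \approx 0.7$, the tolerance is
$$ \varepsilon = \left| \frac{1.96\, S_n}{\sqrt{n}} - 0.7\, S_w \right| $$ (4.37)
Since $A_1 = 1.96$ is used in the above equation, at least 500 samples are required
($n_{min} = 500$) before checking the convergence (equation 4.25).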
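The stopping rule of equations (4.25) to (4.37) can be sketched as a simple loop. This is an illustrative Python outline, not the MATLAB code referenced in Appendix A; the `draw` callback, the `max_samples` cap and the synthetic Gaussian metric are assumptions introduced for the example.

```python
import math
import random

A1 = 1.96     # large-sample coefficient used in (4.37)
A2 = 2.228    # t-distribution coefficient quoted for w = 10, delta = 0.05
W = 10        # window length
N_MIN = 500   # minimum number of samples before convergence is checked


def run_monte_carlo(draw, max_samples=20000):
    """Draw samples until |mean_n - mean_w| <= epsilon, per (4.25) and (4.37)."""
    samples = []
    total = 0.0
    total_sq = 0.0
    while len(samples) < max_samples:
        x = draw()
        samples.append(x)
        total += x
        total_sq += x * x
        n = len(samples)
        if n < N_MIN:
            continue
        mean_n = total / n
        var_n = max((total_sq - n * mean_n ** 2) / (n - 1), 0.0)   # (4.28)
        window = samples[-W:]
        mean_w = sum(window) / W                                   # (4.27)
        var_w = sum((v - mean_w) ** 2 for v in window) / (W - 1)   # (4.29)
        eps = abs(A1 * math.sqrt(var_n / n) - A2 * math.sqrt(var_w / W))
        if abs(mean_n - mean_w) <= eps:
            break
    return samples


rng = random.Random(1)
result = run_monte_carlo(lambda: rng.gauss(40.0, 3.0))  # hypothetical metric
print(f"stopped after {len(result)} samples")
```

Running sums are used for the overall mean and variance so that each iteration costs constant time apart from the short window.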
 
 
 
 
 
4.6 Performance Analysis Results 
Behavioural models of a real-time recursive DFT processor and a radix-2 FFT processor
of length 8 were described in section 4.3. The behavioural model of the radix-2 FFT
processor is based on a previous study on the analogue FFT processor [77]. Initially, the
model parameters of the FFT processor were set to the values provided in [77] to verify
the accuracy of the Simulink model.
In this section, Monte Carlo analysis is used to evaluate the performance of the
aforementioned processors. For this purpose, an OFDM signal with BPSK modulation is
generated by Simulink. The MATLAB code of this analysis is available in Appendix A.
The model parameters are set to the values in Table 4-2. Typical values of $A_r$, $\sigma(V_{in,os})$,
$\sigma(G_{m,os})$, $\sigma(A_r)$, $\sigma(V_{os})$, $\gamma$, and $N$ are obtained from [77].
 
Table 4-2: Summary of the optimal value for the behavioural model parameters 
Parameter       Value
a               0.2 V
b               0.4 V
I_max           80 µA
G_mo            200 µA/V
R_D             5 kΩ
A_r             10 µA/V
σ(V_in,os)      0.5 mV
σ(G_m,os)       2 µA/V
σ(A_r)          10 µA/V
A_v             100 V/V
σ(V_os)         0.1 mV
γ               0
N               1
 
 
Table 4-3 gives the results of the Monte Carlo analysis. These results indicate that the
average dynamic range of the proposed architecture is 4.7 dB higher than that of the FFT
processor.
Table 4-3: Summary of the Monte Carlo analysis for the recursive DFT and the radix-2 FFT processors
(dB)                  Dynamic range                   Peak SNDR
                      Recursive DFT   Radix-2 FFT     Recursive DFT   Radix-2 FFT
Mean                  41.3            36.6            40.8            41.8
Standard deviation    3.4             3.1             1.6             1.7

The results of the Monte Carlo analysis are shown in Figure 4.24 and Figure 4.25. The
histograms of the dynamic range for the DFT and FFT processors are shown in Figure 4.26
and Figure 4.27. Based on these histograms, the real-time recursive DFT processor has a
yield of 99.3% while the yield of the FFT processor is 82.8%.


Figure 4-24: Monte Carlo analysis results of the real-time recursive DFT processor
 
 
Figure 4-25: Monte Carlo analysis results of the radix-2 FFT processor  
 
Figure 4-26: The dynamic range histogram of the real-time recursive DFT processor 
 
 
Figure 4-27: The dynamic range histogram of the radix-2 FFT processor 
4.7 Summary 
In this chapter, the dynamic range requirements for an analogue DFT processor that
supports the WiFi and WiMAX standards were derived. Moreover, the behavioural models
of the real-time recursive DFT processor and the FFT processor were explained. Also, the
design specifications of an 8-point recursive DFT processor were determined.
The results of the Monte Carlo analysis on system simulations indicate that the average
dynamic range of the real-time recursive DFT processor is 4.7 dB higher than that of the radix-2
FFT processor. Moreover, the proposed architecture has a yield of 99.3% while the
yield of the FFT processor is 82.8%. The enhanced performance of the real-time
recursive DFT processor over the FFT processor motivated proceeding to
the transistor-level circuit designs, which are presented in the next chapter.
 
Chapter 5   
CIRCUIT DESIGN 
In this chapter, various design approaches for the building blocks of the real-time
recursive DFT processor are reviewed to find suitable circuits. A rigorous analysis
of each of the selected circuits is performed to optimize the design. The design
considerations that are applied to provide optimum matching will be discussed in the
next chapter. The circuits are designed using 180 nm TSMC technology. The Berkeley
Short-Channel IGFET Model (BSIM3v3) from the University of California, Berkeley is
used for device modelling. Circuit simulations are performed with the Eldo SPICE
simulator from Mentor Graphics. The SPICE process parameters are provided by
MOSIS [86].
5.1 Previous Work on the Analogue FFT Processor 
In early attempts to implement the analogue Fourier transform, discrete circuits were 
used [65, 66]. In these designs, the Switched-Capacitor amplifier was used as the 
coefficient multiplier (Figure 5.1(a)). The clock signals that control the circuit are 
shown in Figure 5.1(b). In sampling mode, S1 and S2 are on and S3 is off (Figure 5.1(c)). 
Hence, the voltage across C1 tracks the input voltage. In the transition from the 
sampling mode to the amplification mode (Figure 5.1(d)), the channel charge injection 
leads to voltage error. This error is alleviated if S2 turns off slightly before S1 turns off 
and S3 turns on [80]. 
 
 
Figure 5-1: (a) Switched-Capacitor amplifier (b) timing diagram of circuit (a)  
(c) circuit (a) in sampling mode (d) circuit (a) in amplification mode [80] 
 
Accordingly, the output voltage is given by
$$ V_{out} = \frac{C_1}{C_2} V_{in} $$ (5.1)
Thus, the voltage sample is multiplied by the capacitance ratio.
In recent years, different design approaches have been taken to implement the FFT
algorithm as an analogue integrated circuit. In one study, multiplication is performed by
the current mirror (Figure 5.2), which generates a scaled copy of the reference current
($I_{ref}$) at its output [87]:
$$ I_{out} = \frac{(W/L)_2}{(W/L)_1} I_{ref} $$ (5.2)
where $(W/L)_x$ is the width-to-length ratio of device $M_x$.
 
 
Figure 5-2: The basic current mirror [80] 
In another study, the passive Switched-Capacitor (Figure 5.3) is used as the multiplier in 
order to minimize the power consumption [88]. In this approach, signal is multiplied by 
𝑚 = 𝐶1/(𝐶1 + 𝐶2) when charges are transferred from capacitor 𝐶1 to capacitor 𝐶2 [89]. 
Since all of the aforementioned approaches merely use the physical properties to 
perform the multiplication, their scaling factors are unchangeable. Thus, the 
aforementioned multipliers are not suitable for the variable-length DFT processor. 
 
 
Figure 5-3: The passive Switched-Capacitor multiplier 
In another attempt, a 64-point FFT processor was realized with the Switched-
Transconductor multiplier (Figure 5.4) [90, 91]. In this approach, differential pairs with 
various W/L ratios are connected together.  
 
 
 
 
Figure 5-4: The Switched-Transconductor multiplier  
As will be shown in section 5.2.2, the differential current of the $n$th pair is
$$ \Delta I_{Dn} \propto \left(\frac{W}{L}\right)_n \Delta V_{in} $$ (5.3)
For any pair that is connected to the common voltage, $\Delta V_{in} = 0$; hence, $\Delta I_{Dn} = 0$. The
current that leaves each of the output nodes is equal to the sum of the currents entering that
node. Thereby, the differential output current is
$$ \Delta I_{out} \propto \left[ \left(\frac{W}{L}\right)_1 + \dots + \left(\frac{W}{L}\right)_n \right] \Delta V_{in} $$ (5.4)
Accordingly, the scalar factor is adjusted by controlling the differential pairs that are 
connected to the input. The number of the scalar factors increases with the Fourier 
transform length. Thus, each multiplier requires more differential pairs as the Fourier 
transform length increases. Accordingly, this approach is not area efficient.   
A reconfigurable DFT processor that is implemented on a Field Programmable 
Analogue Array (FPAA) was proposed in [92]. The reconfigurable multipliers of this 
processor are realized by the floating-gate transistors (Figure 5.5).  
 
 
Figure 5-5: The floating-gate multiplier 
In this approach, both of the PMOS transistors operate in the subthreshold region.
Hence, the input and output currents are given by [92]
$$ I_{in} = I_o \frac{W}{L} \exp\left( \frac{V_S - \kappa V_{G1}}{V_T} \right) \left[ 1 - \exp\left( \frac{V_{DS}}{V_T} \right) \right] $$ (5.5)
$$ I_{out} = I_o \frac{W}{L} \exp\left( \frac{V_S - \kappa V_{G2}}{V_T} \right) \left[ 1 - \exp\left( \frac{V_{DS}}{V_T} \right) \right] $$ (5.6)
where $I_o$ is a process-dependent constant, $V_T$ is the thermal voltage, and $\kappa$ denotes the
gate coupling coefficient, which is
$$ \kappa = \frac{C_g}{C_T} \left( \frac{C_{ox}}{C_{ox} + C_{dep}} \right) $$ (5.7)
In the above equation, $C_{ox}$ is the gate oxide capacitance, $C_{dep}$ is the depletion region
capacitance, and $C_T$ is the total capacitance at the gate (i.e. $C_g$ and the internal
capacitances of the transistor).

The voltages across the gate capacitors are
$$ V_{G1} - V_F = \frac{Q_1}{C_g} $$ (5.8)
$$ V_{G2} - V_F = \frac{Q_2}{C_g} $$ (5.9)
Substituting equations (5.8) and (5.9) in (5.5) and (5.6) gives
$$ I_{in} = I_o \frac{W}{L} \exp\left( \frac{V_S - \kappa V_F}{V_T} \right) \exp\left( \frac{-\kappa Q_1}{C_g V_T} \right) \left[ 1 - \exp\left( \frac{V_{DS}}{V_T} \right) \right] $$ (5.10)
$$ I_{out} = I_o \frac{W}{L} \exp\left( \frac{V_S - \kappa V_F}{V_T} \right) \exp\left( \frac{-\kappa Q_2}{C_g V_T} \right) \left[ 1 - \exp\left( \frac{V_{DS}}{V_T} \right) \right] $$ (5.11)
Accordingly,
$$ I_{out} = \exp\left( \frac{\kappa (Q_1 - Q_2)}{C_g V_T} \right) I_{in} $$ (5.12)
which means that the scaling factor can be adjusted by controlling the amount of charge
that is stored in the $C_g$ capacitors. These equations are valid if the transistors are biased
in the subthreshold region. To maintain this condition, a comparator is used to adjust $V_S$
to the changes of $Q_1$ and $Q_2$. The structure of the FPAA requires $16N^2$ multipliers
for an N-point DFT [93]. Thus, the FPAA approach is not area efficient and becomes
infeasible as N increases.
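The step from (5.10) and (5.11) to (5.12) can be verified numerically. In the sketch below, both currents are evaluated from the subthreshold expression and their ratio is compared with the closed-form charge-controlled gain. Every numerical value (coupling coefficient, capacitance, charges, bias voltages) is a hypothetical placeholder chosen only to exercise the formulas.

```python
import math


def subthreshold_current(i_o, w_over_l, v_s, v_f, q, c_g, kappa, v_t, v_ds):
    """Equations (5.5)/(5.6) with the gate voltage set by (5.8)/(5.9)."""
    v_g = v_f + q / c_g
    return (i_o * w_over_l * math.exp((v_s - kappa * v_g) / v_t)
            * (1.0 - math.exp(v_ds / v_t)))


def charge_gain(q1, q2, c_g, kappa, v_t):
    """Equation (5.12): I_out / I_in = exp(kappa * (Q1 - Q2) / (C_g * V_T))."""
    return math.exp(kappa * (q1 - q2) / (c_g * v_t))


# Hypothetical operating point, for illustration only.
params = dict(i_o=1e-15, w_over_l=10.0, v_s=1.0, v_f=0.5,
              c_g=100e-15, kappa=0.7, v_t=0.0259, v_ds=-0.3)
q1, q2 = 2e-15, 1e-15

i_in = subthreshold_current(q=q1, **params)
i_out = subthreshold_current(q=q2, **params)
gain = charge_gain(q1, q2, params["c_g"], params["kappa"], params["v_t"])
print(f"I_out / I_in = {i_out / i_in:.4f}, closed form = {gain:.4f}")
```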
 
 
 
In chapter 3, it was explained that in the proposed architecture the OFDM signal is 
multiplied by a piecewise continuous signal to eliminate sampling. All of the circuits 
that have been explained so far (Figures 5.1 to 5.5) actually scale the signal rather than 
multiplying two signals together. Moreover, adjusting the physical properties of each 
multiplier to provide various scaling factors makes the design process cumbersome. 
This problem becomes more severe as the transform length increases. Hence, it is 
essential to provide coefficients by signals rather than physical properties of the circuit. 
In view of that, an FFT processor was designed using a four-quadrant multiplier [41].
Although in this FFT processor discrete-time signals are multiplied together, four-
quadrant multipliers can also be used for continuous signals. Analysis of the four-
quadrant multiplier is provided in the next section.    
5.2 Analogue Multiplier 
Analogue multipliers provide the linear product of two input signals $x$ and $y$, yielding
an output signal $z = Kxy$, where $K$ is the multiplication constant. Multipliers are classified
into three main categories based on the polarity of the signals. These categories are
single-quadrant (where $x$ and $y$ are unipolar), two-quadrant (where $x$ or $y$ is bipolar), and
four-quadrant (where $x$ and $y$ are bipolar). Modulators and mixers are particular cases of
multipliers that are used in communication systems. Despite the large number of
multipliers that are reported in the literature, they can be classified into a few categories
based on their architectures [94]. Design specifications, such as bandwidth and power
budget, determine the suitable circuit topology. The design of a suitable analogue
multiplier for the real-time recursive DFT processor is discussed in this section.
 
 
5.2.1 Principle of Operation 
The basic operation of an analogue multiplier is to generate a high-order polynomial of
the two signals using nonlinear devices, and then cancel all terms other than $Kxy$. Since
MOS transistors have square-law characteristics, they can be used for this purpose. For
a MOS transistor in the saturation region, the drain current is proportional to the square of the
overdrive voltage, which is a second-order polynomial in $V_{GS}$ [94]:
$$ (V_{GS} - V_{TH})^2 = \frac{I_D}{\frac{1}{2} \mu C_{ox} \frac{W}{L}} $$ (5.13)
Here, $V_{GS}$ is the gate-source voltage, $V_{TH}$ is the threshold voltage, $I_D$ is the drain current,
$\mu$ is the mobility of charge carriers, $C_{ox}$ is the gate oxide capacitance per unit area, $W$ is
the width and $L$ is the length of the channel. Since the second-order polynomial appears
in the drain current, the analogue multiplier function can be realized by
transconductance amplifiers. A transresistor can then be used to convert the output
current to a voltage. The simplest topology of an analogue multiplier is a differential pair
with a variable current source that is controlled by one of the input signals (Figure 5.6)
[95, 96].
 
Figure 5-6: Two-quadrant analogue multiplier [96] 
 
Multipliers of the DFT processor downconvert sub-channels of the OFDM signal to 
zero frequency; hence, they act as zero Intermediate Frequency (IF) mixers. 
Accordingly, the multiplier that is shown in Figure 5.6 is a single-balanced active mixer. 
Since each multiplier considers one of the sub-channels as its desired signal, other sub-
channels act as interferers that accompany the desired signal. An ideal differential pair 
cancels input feedthroughs. However, mismatch between the differential pair allows a 
fraction of the input to appear at the output without frequency translation. Hence, zero 
IF mixers are sensitive to even-order distortion. This problem can be resolved by raising 
the Second Intercept Point (IP2) of the multiplier. For this purpose, input of the 
transconductor stage must be realized in differential form, leading to a double-balanced 
topology [96]. The Gilbert cell is a precision four-quadrant multiplier that is widely 
used as a double-balanced mixer in communication systems [97]. Hence, the Gilbert cell 
is considered as a suitable multiplier for the real-time recursive DFT processor. 
5.2.2 Analysis of the CMOS Gilbert Cell   
Initially, the Gilbert cell was realized based on the exponential characteristics of Bipolar
Junction Transistors (BJTs) [97]. Nevertheless, the same topology can be used for MOS
transistors with square-law characteristics [98]. A block diagram of the Gilbert cell is
shown in Figure 5.7.
 
Figure 5-7: Block diagram of the Gilbert cell 
 
Each Gm transconductor is realized by a differential pair (Figure 5.8). 
 
Figure 5-8: Gm transconductor [80] 
The outputs of the differential pair in Figure 5.8 are given by [80]
$$ V_{out1} = V_{DD} - R_{D1} I_{D1} $$ (5.14)
$$ V_{out2} = V_{DD} - R_{D2} I_{D2} $$ (5.15)
If $R_{D1} = R_{D2} = R_D$, then
$$ V_{out1} - V_{out2} = R_{D2} I_{D2} - R_{D1} I_{D1} = R_D (I_{D2} - I_{D1}) $$ (5.16)
The voltage at node P is
$$ V_P = V_{in1} - V_{GS1} = V_{in2} - V_{GS2} $$ (5.17)
Thus,
$$ V_{in1} - V_{in2} = V_{GS1} - V_{GS2} $$ (5.18)
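For an ideal saturated NMOS device, we have
$$ (V_{GS} - V_{TH})^2 = \frac{I_D}{\frac{1}{2} \mu_n C_{ox} \frac{W}{L}} $$ (5.19)
Hence,
$$ V_{GS} = \sqrt{\frac{2 I_D}{\mu_n C_{ox} \frac{W}{L}}} + V_{TH} $$ (5.20)
Combining (5.18) and (5.20) yields
$$ V_{in1} - V_{in2} = \sqrt{\frac{2 I_{D1}}{\mu_n C_{ox} \frac{W}{L}}} - \sqrt{\frac{2 I_{D2}}{\mu_n C_{ox} \frac{W}{L}}} $$ (5.21)
The objective is to obtain the differential output current $I_{D1} - I_{D2}$. Therefore, by
squaring both sides of (5.21) and considering that $I_{D1} + I_{D2} = I_{SS}$, we obtain
$$ (V_{in1} - V_{in2})^2 = \frac{2}{\mu_n C_{ox} \frac{W}{L}} \left( I_{SS} - 2\sqrt{I_{D1} I_{D2}} \right) $$ (5.22)
Rearranging (5.22) gives
$$ \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2})^2 - I_{SS} = -2\sqrt{I_{D1} I_{D2}} $$ (5.23)
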
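Squaring both sides again and considering that $4 I_{D1} I_{D2} = (I_{D1} + I_{D2})^2 - (I_{D1} - I_{D2})^2 = I_{SS}^2 - (I_{D1} - I_{D2})^2$, we obtain
$$ (I_{D1} - I_{D2})^2 = -\frac{1}{4} \left( \mu_n C_{ox} \frac{W}{L} \right)^2 (V_{in1} - V_{in2})^4 + I_{SS}\, \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2})^2 $$ (5.24)
Thereby [80],
$$ I_{D1} - I_{D2} = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2}) \sqrt{\frac{4 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - (V_{in1} - V_{in2})^2} $$ (5.25)
In order to find $I_{D1}$ and $I_{D2}$, $I_{D2} = I_{SS} - I_{D1}$ and $I_{D1} = I_{SS} - I_{D2}$ are substituted in
(5.25), respectively:
$$ I_{D1} = \frac{I_{SS}}{2} + \frac{1}{4} \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2}) \sqrt{\frac{4 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - (V_{in1} - V_{in2})^2} $$ (5.26)
$$ I_{D2} = \frac{I_{SS}}{2} - \frac{1}{4} \mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2}) \sqrt{\frac{4 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - (V_{in1} - V_{in2})^2} $$ (5.27)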
 
The circuit topology of the Gilbert cell is obtained by replacing the Gm blocks in Figure 
5.7 with differential pairs. 
 
 
Figure 5-9: Gilbert cell 
 
The differential output current of the Gilbert cell (Figure 5.9) is
$$ I_{out-} - I_{out+} = (I_{D1} + I_{D3}) - (I_{D2} + I_{D4}) = (I_{D1} - I_{D2}) - (I_{D4} - I_{D3}) $$ (5.28)
Here, $I_{D1} - I_{D2}$ and $I_{D4} - I_{D3}$ are the differential currents of the two pairs with the $V_2$ input.
These differential currents can be calculated from equation (5.25). The tail currents
of the two pairs with the $V_2$ input are $I_{D5}$ and $I_{D6}$. Hence, denoting $V_{2+} - V_{2-}$ and
$I_{out-} - I_{out+}$ by $\Delta V_2$ and $\Delta I_{out}$, respectively, we have
$$ \Delta I_{out} = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} \Delta V_2 \left( \sqrt{\frac{4 I_{D5}}{\mu_n C_{ox} \frac{W}{L}} - \Delta V_2^2} - \sqrt{\frac{4 I_{D6}}{\mu_n C_{ox} \frac{W}{L}} - \Delta V_2^2} \right) $$ (5.29)
 
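Equations (5.26) and (5.27) can be used for $I_{D5}$ and $I_{D6}$. Since the square roots of $I_{D5}$ and
$I_{D6}$ are taken in (5.29), equations (5.26) and (5.27) must be written in square form for
simplification. For this purpose, the auxiliary term $\mu_n C_{ox} \frac{W}{L} (V_{in1} - V_{in2})^2 / 8$ should
be added to and subtracted from (5.26) and (5.27). Thereby [99],
$$ I_{D1} = \frac{1}{4} \mu_n C_{ox} \frac{W}{L} \left( \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \frac{(V_{in1} - V_{in2})^2}{2}} + \frac{V_{in1} - V_{in2}}{\sqrt{2}} \right)^2 $$ (5.30)
$$ I_{D2} = \frac{1}{4} \mu_n C_{ox} \frac{W}{L} \left( \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \frac{(V_{in1} - V_{in2})^2}{2}} - \frac{V_{in1} - V_{in2}}{\sqrt{2}} \right)^2 $$ (5.31)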
 
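Now, by substituting (5.30) and (5.31) for $I_{D5}$ and $I_{D6}$ in (5.29), with $\Delta V_1$ as the input of the bottom pair, we obtain
$$ \Delta I_{out} = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} \Delta V_2 \left( \sqrt{\left( \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \frac{\Delta V_1^2}{2}} + \frac{\Delta V_1}{\sqrt{2}} \right)^2 - \Delta V_2^2} - \sqrt{\left( \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \frac{\Delta V_1^2}{2}} - \frac{\Delta V_1}{\sqrt{2}} \right)^2 - \Delta V_2^2} \right) $$ (5.32)
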
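Neglecting the $\Delta V_2^2$ terms under the square roots, equation (5.32) can be approximated by
$$ \Delta I_{out} \cong \frac{1}{2} \mu_n C_{ox} \frac{W}{L} \Delta V_2 \left( \sqrt{\left( \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \frac{\Delta V_1^2}{2}} + \frac{\Delta V_1}{\sqrt{2}} \right)^2} - \sqrt{\left( \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \frac{\Delta V_1^2}{2}} - \frac{\Delta V_1}{\sqrt{2}} \right)^2} \right) $$ (5.33)
Thereby [99],
$$ \Delta I_{out} \cong \frac{1}{\sqrt{2}} \mu_n C_{ox} \frac{W}{L} \Delta V_1 \Delta V_2 $$ (5.34)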
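The quality of the approximation in (5.34) can be examined by evaluating (5.32) directly. The following sketch compares the two expressions for a few input levels. The lumped parameter `k` stands for $\mu_n C_{ox} W/L$; its value of 1 mA/V² is a hypothetical placeholder, while the 80 µA tail current is the value selected in section 4.4.1.

```python
import math


def gilbert_exact(dv1, dv2, k, i_ss):
    """Equation (5.32), with k = mu_n * C_ox * W / L."""
    root = math.sqrt(2.0 * i_ss / k - dv1 ** 2 / 2.0)
    plus = (root + dv1 / math.sqrt(2.0)) ** 2 - dv2 ** 2
    minus = (root - dv1 / math.sqrt(2.0)) ** 2 - dv2 ** 2
    return 0.5 * k * dv2 * (math.sqrt(plus) - math.sqrt(minus))


def gilbert_approx(dv1, dv2, k):
    """Equation (5.34): ideal product term."""
    return k * dv1 * dv2 / math.sqrt(2.0)


K = 1e-3      # hypothetical mu_n*C_ox*W/L in A/V^2
I_SS = 80e-6  # tail current from the power budget, in amperes

for dv in (0.02, 0.05, 0.1):  # hypothetical equal input amplitudes, in volts
    exact = gilbert_exact(dv, dv, K, I_SS)
    approx = gilbert_approx(dv, dv, K)
    print(f"dV1 = dV2 = {dv:.2f} V: exact = {exact:.3e} A, "
          f"approx = {approx:.3e} A, error = {abs(exact - approx) / approx:.2%}")
```

Under these assumed values the two expressions agree closely for small inputs and drift apart as the inputs grow, which is the behaviour expected from dropping the $\Delta V_2^2$ terms.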
Equation (5.25) was derived with the assumption that both $M_1$ and $M_2$ are on. In reality,
however, as $\Delta V_{in}$ exceeds a limit, only one transistor is on and carries the entire $I_{SS}$.
Denoting this limit by $\Delta V_{in1}$ and assuming that $M_1$ is on, $I_{D1} = I_{SS}$ and $\Delta V_{in1} = V_{GS1} - V_{TH}$
should be substituted in equation (5.19). Thereby [80],
$$ \Delta V_{in1} = \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} \frac{W}{L}}} $$ (5.35)
Hence, equations (5.25) to (5.34) are valid for the input range $-\Delta V_{in1} < \Delta V_{in} < \Delta V_{in1}$.
Considering the characteristic of a differential pair that is shown in Figure 5.10,
$[-\Delta V_{in1}, \Delta V_{in1}]$ is the linear range of operation. Based on equation (5.35), the linear
range can be increased by increasing $I_{SS}$ or decreasing $W/L$. Increasing $I_{SS}$ increases
the power consumption. On the other hand, as will be explained in the next chapter,
devices with smaller $W/L$ provide better matching.

 
Figure 5-10: Input-output characteristic of a differential pair [80] 
 
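The transconductance of the differential pair is the slope of the characteristic (Figure
5.10). Thus, $G_m$ of the differential pair is obtained by taking the derivative of equation
(5.25) [80]:
$$ G_m = \frac{\partial \Delta I_D}{\partial \Delta V_{in}} = \frac{1}{2} \mu_n C_{ox} \frac{W}{L} \, \frac{\frac{4 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - 2 \Delta V_{in}^2}{\sqrt{\frac{4 I_{SS}}{\mu_n C_{ox} \frac{W}{L}} - \Delta V_{in}^2}} $$ (5.36)
In the equilibrium condition, $\Delta V_{in} = 0$; thus, $G_m = \sqrt{\mu_n C_{ox} (W/L) I_{SS}}$. Substituting
$\Delta I_D = G_m \Delta V_{in}$ in equation (5.16) gives [80]
$$ \Delta V_{out} = R_D \Delta I_D = R_D G_m \Delta V_{in} $$ (5.37)
Thus, the small-signal voltage gain of the differential pair in the equilibrium condition is
[80]
$$ |A_v| = \frac{\Delta V_{out}}{\Delta V_{in}} = \sqrt{\mu_n C_{ox} \frac{W}{L} I_{SS}} \; R_D $$ (5.38)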
Accordingly, reducing $W/L$ to make the circuit more linear inevitably decreases the
transconductance and the voltage gain. Linearization techniques can be applied to increase
the linear range further. The simplest linearization technique is resistive source
degeneration. Inductive and capacitive degeneration also increase the linearity.
Inductive degeneration has low noise and provides higher linearity compared with
resistive and capacitive degeneration [100, 101]. However, inductors require a large
layout area. Several other methods have been proposed to improve the linearity of the Gilbert
cell [99, 102-105]. However, these methods increase the complexity and power
consumption of the multiplier. Since the DFT processor with OFDM application
requires a large number of multipliers, the simplest linearization technique is preferable.
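The trade-off between linear range and transconductance can be illustrated with the square-law expressions (5.25), (5.35) and (5.36). The sketch below models the differential-pair characteristic, clamps it to the tail current outside the range of (5.35), and compares a numerical slope at the origin with the equilibrium value $\sqrt{\mu_n C_{ox} (W/L) I_{SS}}$. The lumped parameter `k` and its values are hypothetical.

```python
import math


def input_limit(k, i_ss):
    """Equation (5.35), with k = mu_n * C_ox * W / L."""
    return math.sqrt(2.0 * i_ss / k)


def diff_pair_current(dv, k, i_ss):
    """Equation (5.25), clamped to +/- I_SS once one transistor carries the whole tail current."""
    limit = input_limit(k, i_ss)
    if dv >= limit:
        return i_ss
    if dv <= -limit:
        return -i_ss
    return 0.5 * k * dv * math.sqrt(4.0 * i_ss / k - dv * dv)


def gm_equilibrium(k, i_ss):
    """Equation (5.36) evaluated at zero differential input."""
    return math.sqrt(k * i_ss)


I_SS = 80e-6
for k in (1e-3, 0.5e-3):  # hypothetical values; the second mimics halving W/L
    h = 1e-6
    slope = (diff_pair_current(h, k, I_SS) - diff_pair_current(-h, k, I_SS)) / (2 * h)
    print(f"k = {k:.1e} A/V^2: limit = {input_limit(k, I_SS):.3f} V, "
          f"G_m = {gm_equilibrium(k, I_SS):.3e} S, numerical slope = {slope:.3e} S")
```

Halving `k` widens the input limit by a factor of $\sqrt{2}$ and lowers the equilibrium transconductance by the same factor, which matches the trade-off described above.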
5.2.3 Circuit Realization 
It is difficult to fabricate resistors with accurate values or a reasonable physical size in
CMOS technologies. Thus, the degeneration resistors can be replaced by transistors that
operate in the deep triode region. Moreover, the $R_D$ resistors can be replaced by
diode-connected transistors [80]. Therefore, the circuit is modified as depicted in Figure 5.11.
In this topology, $M_1$ to $M_6$ and $M_9$ to $M_{11}$ operate in the saturation region, while $M_7$
and $M_8$ operate in the deep triode region to perform the resistive degeneration. The gain
of a circuit with a diode-connected load is $A_v \propto \mu_{input\ device}/\mu_{load\ device}$, where $\mu$ is the
mobility of charge carriers. Accordingly, a higher gain can be achieved by using PMOS
devices, which have a lower carrier mobility (in modern processes, $\mu_p C_{ox} \approx 0.25 \mu_n C_{ox}$),
as the load.
 
 
Figure 5-11: Degenerated Gilbert cell with diode-connected load 
The output common-mode (CM) level of the Gilbert cell with diode-connected loads is
$V_{DD} - V_{GS10}$, where $V_{GS10}$ is the gate-source voltage of $M_{10}$. $M_{10}$ and $M_{11}$ are always in
saturation because the drain and the gate have the same potential. Hence, the CM level
of the Gilbert cell with diode-connected loads is well-defined. The voltage gain of the
Gilbert cell with diode-connected loads is
$$ A_v = -G_m \left( R_O \parallel r_{O10} \parallel \frac{1}{g_{m10}} \right) \approx \frac{-G_m}{g_{m10}} $$ (5.39)
where $G_m$ is the transconductance of the Gilbert cell, $R_O$ is the output resistance of the
transconductor, $r_{O10}$ is the output resistance of $M_{10}$, and $g_{m10}$ is the transconductance
of $M_{10}$. Hence, the Gilbert cell with diode-connected loads has a low voltage gain. To
increase the voltage gain, $M_{10}$ and $M_{11}$ must operate as current sources for the
differential signals. Since $I_{D1} + I_{D3} = I_{D2} + I_{D4} = I_{SS}/2$, the CM level depends on
how close $I_{D10}$ and $I_{D11}$ are to this value.
 
In practice, mismatches in the NMOS current source ($M_9$) and the PMOS current sources
($M_{10}$ and $M_{11}$) create an error between $I_{D10,11}$ and $I_{SS}/2$. Thus, in the Gilbert cell with
current-source loads, the difference between the currents that are generated by the p-type
and n-type current sources flows through the output impedance. Hence, the mismatch
between the p-type and n-type current sources creates a voltage error of
$((I_{D10} + I_{D11}) - I_{SS})(R_O \parallel r_{O10})$ at the output. Therefore, the output CM level of the Gilbert cell
with current-source loads is sensitive to device properties and mismatches. Hence, a
common-mode feedback (CMFB) network is required to sense the CM level of $V_{out+}$
and $V_{out-}$ and adjust one of the bias currents accordingly [80, 101]. Therefore, the
circuit is modified as depicted in Figure 5.12, where $M_{12}$ and $M_{13}$ operate in the deep
triode region. For differential changes at $V_{out+}$ and $V_{out-}$, node P is a virtual ground.
Hence, the voltage gain is
$$ A_v = -G_m \left( R_O \parallel r_{O10} \parallel R_{on12} \right) $$ (5.40)
where $R_{on12}$ is the on-resistance of $M_{12}$. For CM levels, $M_{10}$ and $M_{11}$ operate as
diode-connected loads.
 
 
Figure 5-12: Degenerated Gilbert cell with CMFB network 
 
As explained in chapter 2, complex multiplication is performed by adding the results of 
two real multiplications. Based on KCL, currents that are entering the same node are 
added together. Thus, addition is provided by connecting the outputs of two multipliers 
to each other and sharing the load (transresistor) between the multipliers. The topology 
of the complex multiplier is shown in Figure 5.13.  
 
 
Figure 5-13: topology of the complex multiplier  
 
Based on the design specifications in section 4.4.2, the linear range of the multiplier 
should be at least [-0.2V, 0.2V]. Additionally, the gain of the complex multiplier should be 
𝐴𝑣 = 1𝑉/𝑉. Therefore, each of the output nodes in Figure 5.13 must be able to swing 
by 0.2V without driving 𝑀10 and 𝑀11 into the triode region. Thus, the overdrive 
voltage of 𝑀10 and 𝑀11 should be |𝑉𝑂𝐷10| = 0.2𝑉. As mentioned earlier, for CM 
levels, 𝑀10 and 𝑀11 operate as diode-connected loads. Hence, with |𝑉𝑇𝐻| = 0.4𝑉 for 
PMOS transistors and |𝑉𝑂𝐷10| = 0.2𝑉, the drain-source voltage of 𝑀10 and 𝑀11 is 
|𝑉𝐷𝑆10| = |𝑉𝑇𝐻| + |𝑉𝑂𝐷10| = 0.6𝑉. Thus, considering 𝑉𝐷𝐷 = 1.8𝑉, the output CM level is 𝑉𝑜𝑢𝑡 = 1.2𝑉. 
Accordingly, the total voltage available for NMOS transistors is 1.2V. Based on the 
design specifications in section 4.4.2, the input swing should be [-0.4V,0.4V]. 
Therefore, 𝑉𝐷𝑆1 = 0.4𝑉 is allocated to 𝑀1𝑎 −𝑀6𝑎 and 𝑀1𝑏 −𝑀6𝑏. From the remaining 
voltage, 𝑉𝐷𝑆9 = 0.3𝑉  is allocated to the current supplies (𝑀9𝑎  and 𝑀9𝑏 ) and 𝑉𝐷𝑆7 =
0.1𝑉 is allocated to the degeneration transistors (𝑀7𝑎 , 𝑀8𝑎 , 𝑀7𝑏 , 𝑀8𝑏). Considering 
the linear range, 𝑉𝐷𝑆1 − 𝑉𝑂𝐷1 > 0.2𝑉  is required for 𝑀1𝑎 −𝑀6𝑎  and 𝑀1𝑏 −𝑀6𝑏 . 
Hence, 𝑉𝑂𝐷1 = 0.1𝑉 is allocated to 𝑀1𝑎 −𝑀6𝑎 and 𝑀1𝑏 −𝑀6𝑏. 
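
The headroom allocation above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the original design flow: it assumes that the 0.4 V allocated to 𝑀1−𝑀6 applies separately to the top pairs (𝑀1−𝑀4) and to the bottom pairs (𝑀5, 𝑀6), which are stacked, and the variable names are illustrative only.

```python
# Voltage headroom budget for one branch of the complex multiplier.
# All voltages are the values quoted in the paragraph above.
VDD = 1.8        # supply voltage (V)
VTH_P = 0.4      # |V_TH| of the PMOS loads (V)
VOD_LOAD = 0.2   # |V_OD10|, set by the required 0.2 V output swing (V)

# For CM levels the loads are diode-connected, so |V_DS| = |V_TH| + |V_OD|.
vds_load = VTH_P + VOD_LOAD
v_out_cm = VDD - vds_load

# Drain-source allocations of the NMOS stack (V).
# ASSUMPTION: the top and bottom differential pairs each receive 0.4 V.
nmos_stack = {
    "top pairs (M1-M4)": 0.4,
    "bottom pairs (M5, M6)": 0.4,
    "degeneration (M7, M8)": 0.1,
    "current source (M9)": 0.3,
}
stack_total = sum(nmos_stack.values())

# Linear-range condition for the signal-path devices: V_DS1 - V_OD1 > 0.2 V.
VDS1, VOD1 = 0.4, 0.1
linear_margin = VDS1 - VOD1

print(f"output CM level  = {v_out_cm:.2f} V")
print(f"NMOS stack total = {stack_total:.2f} V")
print(f"V_DS1 - V_OD1    = {linear_margin:.2f} V")
```

Under this reading the NMOS allocations add up to exactly the 1.2 V available below the output node.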
 
 
The length of the current source transistors (𝑀9𝑎  and 𝑀9𝑏 ) must be larger than the 
minimum length to reduce the channel-length modulation effect. Based on equation 
(5.19), increasing the length reduces the supply current. Thus, either the width or the 
overdrive voltage must increase to provide the required current. Since increasing width 
and length together is not an area efficient solution, the overdrive voltage is selected to 
be 𝑉𝑂𝐷9 = 0.2𝑉. 
Considering the power budget (section 4.4.1), 𝐼𝑀𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑒𝑟 = 80 𝜇𝐴 is allocated to each of 
the current sources (𝑀9𝑎  and 𝑀9𝑏 ). Thus, each of the transistors in the bottom 
differential pairs (𝑀5𝑎 −𝑀8𝑎  and 𝑀5𝑏 −𝑀8𝑏 ) carries a current of 40 𝜇𝐴. Therefore, 
each of the transistors in the top differential pairs (𝑀1𝑎 −𝑀4𝑎 and 𝑀1𝑏 −𝑀4𝑏) carries a 
current of 20 𝜇𝐴.  
With the bias current and overdrive voltage of each transistor known, the aspect ratios 
of the transistors in the saturation region can be determined from 
 
\[ I_D = \frac{1}{2} \mu C_{ox} \frac{W}{L} V_{OD}^2 \tag{5.41} \]
Also, the aspect ratios of the transistors in the triode region can be determined from 
 
\[ R_{on} = \frac{1}{\mu C_{ox} \frac{W}{L} V_{OD}} \tag{5.42} \]
To minimize the device capacitances, the minimum length (0.2 𝜇𝑚) was chosen for all 
transistors except 𝑀9𝑎 and 𝑀9𝑏. Table 5-1 shows the aspect ratios of the initial design 
which satisfies the swing and power budget specifications.  
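
As an illustration of how equation (5.41) fixes the aspect ratios, the sketch below solves the square law for 𝑊/𝐿. The value of 𝜇𝐶𝑜𝑥 is not stated in this section, so the constant `MU_COX_N` is a hypothetical value chosen only so that the 𝑀1−𝑀4 entry of Table 5-1 (3/0.2 = 15) is reproduced; the function name is likewise illustrative.

```python
def aspect_ratio_saturation(i_d, v_od, mu_cox):
    """Solve I_D = 0.5 * mu_cox * (W/L) * V_OD**2 for W/L."""
    return 2.0 * i_d / (mu_cox * v_od ** 2)

# ASSUMPTION: hypothetical process transconductance (about 267 uA/V^2),
# back-calculated from the M1-M4 entry of Table 5-1, not quoted in the text.
MU_COX_N = 2 * 20e-6 / (15 * 0.1 ** 2)

# Top pairs carry 20 uA and bottom pairs 40 uA, both with V_OD = 0.1 V.
wl_top = aspect_ratio_saturation(20e-6, 0.1, MU_COX_N)
wl_bottom = aspect_ratio_saturation(40e-6, 0.1, MU_COX_N)

print(f"W/L of M1-M4: {wl_top:.1f}")
print(f"W/L of M5-M6: {wl_bottom:.1f}")
```

With the overdrive voltage held constant, doubling the bias current doubles the required 𝑊/𝐿, which is consistent with the 3/0.2 and 6/0.2 entries of the table.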
 
 
Table 5-1: initial aspect ratios of the complex multiplier 

Transistors                      W/L (𝜇𝑚/𝜇𝑚)
𝑀1𝑎−𝑀4𝑎, 𝑀1𝑏−𝑀4𝑏                 3/0.2
𝑀5𝑎, 𝑀6𝑎, 𝑀5𝑏, 𝑀6𝑏               6/0.2
𝑀7𝑎, 𝑀8𝑎, 𝑀7𝑏, 𝑀8𝑏               3/1.6
𝑀9𝑎, 𝑀9𝑏                         10/0.8
𝑀10𝑎, 𝑀11𝑎, 𝑀10𝑏, 𝑀11𝑏           12/0.2
𝑀12𝑎, 𝑀13𝑎, 𝑀12𝑏, 𝑀13𝑏           4/0.8
 
As will be explained in section 6.2, devices with larger channel area (𝑊𝐿) provide 
better matching. In order to maintain a constant overdrive voltage, the width and length of 
each transistor must scale together. Since 𝑀1𝑎 −𝑀8𝑎 and 𝑀1𝑏 −𝑀8𝑏 appear in the 
signal paths, their maximum lengths are determined by the bandwidth requirement 
(BW > 20MHz). Table 5-2 shows the final aspect ratios.  
 
Table 5-2: final aspect ratios of the complex multiplier 

Transistors                      W/L (𝜇𝑚/𝜇𝑚)
𝑀1𝑎−𝑀4𝑎, 𝑀1𝑏−𝑀4𝑏                 15/1
𝑀5𝑎, 𝑀6𝑎, 𝑀5𝑏, 𝑀6𝑏               30/1
𝑀7𝑎, 𝑀8𝑎, 𝑀7𝑏, 𝑀8𝑏               3/1.6
𝑀9𝑎, 𝑀9𝑏                         50/4
𝑀10𝑎, 𝑀11𝑎, 𝑀10𝑏, 𝑀11𝑏           60/1
𝑀12𝑎, 𝑀13𝑎, 𝑀12𝑏, 𝑀13𝑏           4/0.8
 
The maximum output swing is achieved by choosing the CM levels 𝑉1 = 𝑉3 = 1𝑉 , 
𝑉2 = 𝑉4 = 1.5𝑉  , and the bias voltage 𝑉𝑏𝑖𝑎𝑠 = 0.6𝑉 . Thereby, 𝑀9𝑎  and 𝑀9𝑏  each 
provide 66𝜇𝐴. The power consumption of the complex multiplier is 239𝜇𝑊.  
Figure 5.14 shows the transfer characteristics of the designed circuit for various 
multiplication coefficients. Variations of the differential input (∆𝑉1) and differential 
output (∆𝑉𝑜𝑢𝑡)  signals are given on the horizontal and vertical axes respectively. 
Changing the multiplication coefficient (∆𝑉2)  changes the slope of the transfer 
characteristic. According to these characteristics, the designed circuit has a linear 
multiplication range of [-0.3V, 0.3V].  
  
 
 
Figure 5-14: Transfer characteristic of the Gilbert cell multiplier simulated in SPICE 
5.3 Discrete-Time Integrator 
The signal at the output of the multiplier is piecewise continuous. For an N-point DFT, 
the amplitude of N pieces must be summed together. Accordingly, a discrete-time 
integrator that takes samples of each piece and provides their sum is required. The 
discrete-time integrator was first realized by replacing the resistor in the Operational 
amplifier (Op-amp) integrator (i.e. continuous-time integrator) with a capacitor and two 
MOS switches (Figure 5.15) [106]. The nonoverlapping complementary clock signals 
(𝜑1 and 𝜑2) that control the circuit are shown in Figure 5.15(c). When the clock signal 
is high, the transistor is in the triode region. Thus, the transistor operates as a resistor 
and conducts current. Therefore, the switch turns on as the clock signal goes high. In the 
sampling mode S1 is on and 𝐶𝑆 absorbs a charge equal to 𝐶𝑆𝑉𝑖𝑛. In the integration mode 
S2 is on and 𝐶𝑆 deposits its charge on 𝐶𝐼.   
 
 
 
Figure 5-15: (a) continuous-time integrator (b) discrete-time integrator (c) timing diagram of circuit (b) 
[80] 
 
The equivalent capacitance of the parasitic capacitors that affect the output is 
represented by 𝐶𝑝. The sensitivity of the output to 𝐶𝑝 can be reduced by enlarging the 
sampling and integrating capacitors. Therefore, this integrator demands a large layout 
area, which is unsuitable for Very Large Scale Integration (VLSI) circuits. The design of 
a switched-capacitor (SC) integrator that is insensitive to the parasitic capacitors is 
discussed in the following sections. 
5.3.1 Analysis of the Parasitic-Insensitive Integrator     
A parasitic-insensitive switched-capacitor (SC) integrator [106, 107] is shown in Figure 
5.16(a). In the sampling mode (Figure 5.16(b)), S1 and S3 are on and S2 and S4 are off. 
Thereby, the sampling capacitor (𝐶𝑆)  absorbs a charge equal to 𝐶𝑆𝑉𝑖𝑛  while the 
integrating capacitor (𝐶𝐼) holds the previous value. The channel charge injection in the 
transition from the sampling mode to the integration mode (Figure 5.16(c)) can be 
alleviated by proper switch timing. To this end, S3 turns off first, then S1 turns off, and 
finally S2 and S4 turn on (Figure 5.17). Since the output voltage is measured after node 
P is connected to ground, the final value of Vp is fixed (zero). Thus, the charge injection 
or absorption of S1 and S2 does not affect the output voltage. Moreover, the voltage 
across the junction capacitance of S3 and S4 changes from near zero in the sampling 
mode to virtual ground in the integration mode. Since this voltage variation is very 
small, the charge stored on the junction capacitance is negligible. Consequently, only a 
constant charge from S3 is injected onto 𝐶𝑆 which introduces a constant offset at the 
output. This offset is suppressed by the differential operation [80]. The parasitic 
capacitor 𝐶𝑝1 is periodically switched between the input and ground. Also, the parasitic 
capacitor 𝐶𝑝2 is periodically switched between the virtual ground and ground. Hence, 
𝐶𝑝1 and 𝐶𝑝2 do not deliver any charge to 𝐶𝐼. Therefore, the output voltage is insensitive 
to the parasitic capacitors. Accordingly, there is no need to alleviate the effect of 
parasitic capacitors by enlarging 𝐶𝑆 and 𝐶𝐼. Thus, the parasitic-insensitive SC integrator 
is area efficient [108, 109]. 
 
Figure 5-16: (a) Parasitic-insensitive integrator (b) circuit of (a) in sampling mode, (c) circuit of (a) in 
integration mode [80] 
 
 
 
Figure 5-17: timing diagram of the parasitic-insensitive integrator  
The transfer function of the SC integrator is obtained from the charge-conservation 
analysis. In this analysis, 𝑄[𝑡] is the total charge stored at instant 𝑡. Also, the input and 
output voltages at instant 𝑡 are denoted by 𝑉𝑖𝑛[𝑡] and 𝑉𝑜𝑢𝑡[𝑡]. Accordingly [110], 
 
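\[ Q[(n-1)T_{ck}] = V_{in}[(n-1)T_{ck}]\, C_S + V_{out}[(n-1)T_{ck}]\, C_I \tag{5.43} \]

\[ Q\left[\left(n-\tfrac{1}{2}\right)T_{ck}\right] = (0)\, C_S + V_{out}\left[\left(n-\tfrac{1}{2}\right)T_{ck}\right] C_I \tag{5.44} \]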
From the charge conservation  𝑄[(𝑛 − 1/2)𝑇𝑐𝑘] = 𝑄[(𝑛 − 1)𝑇𝑐𝑘] ; thus, 
 
\[ V_{out}\left[\left(n-\tfrac{1}{2}\right)T_{ck}\right] C_I = V_{in}[(n-1)T_{ck}]\, C_S + V_{out}[(n-1)T_{ck}]\, C_I \tag{5.45} \]
 
 
The charge stored on 𝐶𝐼 is constant during 𝜑1 ; hence  
 
\[ V_{out}[nT_{ck}] = V_{out}\left[\left(n-\tfrac{1}{2}\right)T_{ck}\right] \tag{5.46} \]
Combining equations (5.45) and (5.46) gives 
 
\[ V_{out}[nT_{ck}] = V_{in}[(n-1)T_{ck}]\, \frac{C_S}{C_I} + V_{out}[(n-1)T_{ck}] \tag{5.47} \]
which is the difference equation representation of the discrete-time integrator that was 
given in chapter 3. The z-transform of this difference equation is  
 
\[ V_{out}(z) = \frac{C_S}{C_I}\, z^{-1} V_{in}(z) + z^{-1} V_{out}(z) \tag{5.48} \]
Thereby, the transfer function of the parasitic-insensitive SC integrator is 
 
\[ H(z) = \frac{V_{out}(z)}{V_{in}(z)} = \frac{C_S}{C_I}\, \frac{z^{-1}}{1 - z^{-1}} \tag{5.49} \]
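
The behaviour described by the difference equation (5.47) can be illustrated numerically. The sketch below iterates the recursion for an ideal integrator, using the capacitor values selected later in section 5.3.3 (𝐶𝑆 = 50 fF, 𝐶𝐼 = 1 pF). The input samples and the function name are hypothetical, and non-idealities such as finite op-amp gain and incomplete settling are ignored.

```python
def sc_integrator(v_in, c_s, c_i, v0=0.0):
    """Iterate V_out[n] = (C_S / C_I) * V_in[n-1] + V_out[n-1]."""
    gain = c_s / c_i
    v_out = [v0]                      # v_out[0] is the initial (reset) value
    for sample in v_in:
        v_out.append(v_out[-1] + gain * sample)
    return v_out

C_S = 50e-15   # sampling capacitor (F)
C_I = 1e-12    # integrating capacitor (F)

# Hypothetical input: eight samples of a piecewise-constant signal (V).
samples = [0.2, -0.1, 0.05, 0.2, -0.2, 0.1, 0.0, 0.15]
out = sc_integrator(samples, C_S, C_I)

print(f"per-sample gain C_S/C_I = {C_S / C_I:.2f}")
print(f"output after 8 samples  = {out[-1] * 1e3:.1f} mV")
```

The final output equals the sum of the input samples scaled by 𝐶𝑆/𝐶𝐼, which is the accumulation that the pole at 𝑧 = 1 in equation (5.49) represents.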
5.3.2 Speed and Precision Considerations 
Since capacitors take infinite time to be fully charged, the output is measured when it is 
settled within a certain error band [111]. Thus, it is essential to consider the speed-
precision trade-off in designing the integrator. For this purpose, the time constant of the 
circuit in each mode of operation must be calculated. In the sampling mode (Figure 
5.16(b)) [80], 
𝜏𝑠𝑎𝑚 = (𝑅𝑜𝑛1 + 𝑅𝑜𝑛3)𝐶𝑆    (5.50) 
where 𝑅𝑜𝑛 is the on-resistance of the switch (i.e. transistor in the triode region).  
Hence,    
 
\[ \tau_{sam} = \frac{C_S}{\mu_n C_{ox} (W/L)} \left( \frac{1}{V_{DD} - V_{in} - V_{TH}} + \frac{1}{V_{DD} - V_{TH}} \right) \tag{5.51} \]
which indicates that a smaller sampling capacitor and a larger 𝑊/𝐿 yield a higher 
sampling frequency. However, when the switch turns off, the thermal noise generated by 𝑅𝑜𝑛 
is sampled and stored on 𝐶𝑆. The RMS voltage of the sampled noise is [112] 
 
\[ \upsilon_n = \sqrt{\frac{kT}{C_S}} \tag{5.52} \]
where k is the Boltzmann constant, and T is the absolute temperature. On the other hand, 
the channel charge injection introduces an error to the sampled voltage which is [80]  
 
\[ \Delta V \propto \frac{W L C_{ox}}{C_S} \tag{5.53} \]
Therefore, 𝐶𝑆 must be sufficiently large to achieve a low noise and a low error. Since 
the effect of the channel charge injection is alleviated by the switch timing, it is possible 
to increase the speed by enlarging 𝑊.   
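
To give a sense of scale for equation (5.52), the sketch below evaluates the sampled noise for the sampling capacitor chosen later in section 5.3.3 (𝐶𝑆 = 50 fF). The temperature of 300 K is an assumption made for illustration, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def ktc_noise_rms(c_s, temperature=300.0):
    """RMS voltage of the thermal noise sampled on a capacitor c_s (F)."""
    return math.sqrt(K_B * temperature / c_s)

C_S = 50e-15                      # sampling capacitor from section 5.3.3 (F)
v_n = ktc_noise_rms(C_S)          # ASSUMPTION: T = 300 K
v_n_4x = ktc_noise_rms(4 * C_S)   # four times the capacitance

print(f"sampled noise with C_S   = {v_n * 1e6:.0f} uV rms")
print(f"sampled noise with 4*C_S = {v_n_4x * 1e6:.0f} uV rms")
```

Because the noise scales with the inverse square root of 𝐶𝑆, quadrupling the capacitor only halves the noise, while the sampling time constant in equation (5.51) grows by a factor of four.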
 
 
 
 
 
 
 
Figure 5.18 depicts the equivalent circuit of Figure 5.16(c). The output resistance of the 
op-amp is denoted by 𝑅𝑜𝑢𝑡.  
 
Figure 5-18: equivalent circuit of the parasitic-insensitive integrator in integration mode 
 
𝑉𝑖 = −𝑉𝑖𝑛 ; thus, KCL at the output node gives 
 
\[ \left( -V_{in} A_v - V_{out} \right) \frac{1}{R_{out}} = V_{out}\, \frac{C_I C_S}{C_I + C_S}\, s \tag{5.54} \]
Thereby, 
\[ \frac{V_{out}}{V_{in}}(s) = \frac{-A_v}{1 + \frac{C_I C_S}{C_I + C_S} R_{out}\, s} \tag{5.55} \]
Hence, the time constant in the integration mode is  
\[ \tau_{int} = \frac{C_I C_S}{C_I + C_S} R_{out} \tag{5.56} \]
which indicates that an op-amp with smaller 𝑅𝑜𝑢𝑡 yields higher integrating frequency. 
However, 𝐴𝑣  is directly proportional to 𝑅𝑜𝑢𝑡 . As explained in chapter 4, smaller 𝐴𝑣 
yields lower SNDR. Accordingly, there is a trade-off between the speed and precision 
requirements in the integration mode.   
 
The differential amplifier is the simplest op-amp topology, but its gain 
is relatively low. Adding cascode devices to the differential amplifier increases its 
output impedance. Thereby, differential cascode topologies attain higher gain than the 
differential amplifier. It is also possible to further increase the output impedance by gain 
boosting. However, as mentioned earlier, increasing the output impedance reduces the 
speed of the integrator. Moreover, higher gain in these configurations comes at the cost 
of higher power dissipation, lower output swing, and additional poles. Another method 
of increasing the gain of a differential amplifier is to add a second stage to it. The gain 
of the two-stage op-amp is comparable with that of a cascode op-amp. However, the 
speed of the two-stage op-amp is lower than the speed of a cascode op-amp [80].  
The objective is to design an analogue DFT processor with lower power consumption 
than the digital FFT processor. Considering the power budget and the above comparison 
between the principal op-amp topologies, the differential amplifier has been selected. 
For the differential amplifier in Figure 5.19 |𝐴𝑣| = 𝐺𝑚𝑅𝑜𝑢𝑡 = 𝑔𝑚2(𝑟𝑜2||𝑟𝑜4), where 
𝑔𝑚𝑥  and 𝑟𝑜𝑥  are the transconductance and the output resistance of  𝑀𝑥 , respectively. 
Therefore, the speed and precision requirements in the integration mode can be met by 
increasing 𝐺𝑚 instead of 𝑅𝑜𝑢𝑡. 𝐺𝑚 = √𝜇𝑛𝐶𝑜𝑥(𝑊/𝐿)𝐼𝑆𝑆 ; hence, 𝐺𝑚 can be increased by 
choosing a larger aspect ratio for the input transistors or increasing the supply current. 
Increasing 𝑊/𝐿 increases the input capacitance and reduces the speed; thus, 𝐼𝑆𝑆 has to 
be increased. 
 
 
Figure 5-19: differential amplifier with single-ended output [80] 
 
𝜏𝑖𝑛𝑡 only determines the small-signal time-domain response. For large signals, 
speed is limited by the slew rate. Slewing is a nonlinear phenomenon that distorts the 
output. In order to calculate the slew rate, the op-amp of Figure 5.16(c) is replaced by 
the differential amplifier (Figure 5.20). Slewing occurs when the sampled voltage |𝑉𝑠| is 
so large that one transistor (𝑀1 or 𝑀2) carries the entire 𝐼𝑆𝑆 and the other transistor turns 
off [80].  
 
Figure 5-20: Slewing in the op-amp [80] 
 
Since the feedback loop is broken, the output voltage is [80] 
\[ |V_{out}(t)| = I_{SS} \left( \frac{C_S + C_I}{C_S C_I} \right) t \tag{5.57} \]
Slew Rate (SR) is the slope of the output voltage.  
\[ SR = I_{SS} \left( \frac{C_S + C_I}{C_S C_I} \right) \tag{5.58} \]
Output voltage of the integrator during the 𝑛th integration period is [113] 
\[ V_{out}(t) = V_{out}(nT_{ck} - T_{ck}) + \alpha V_s \left( 1 - e^{-t/\tau_{int}} \right), \qquad 0 < t < \frac{T_{ck}}{2} \tag{5.59} \]
where 𝑉𝑠 = 𝑉𝑖𝑛(𝑛𝑇𝑐𝑘 − 𝑇𝑐𝑘/2) and 𝛼 is the integrator leakage. The maximum slope of the 
output voltage is  
\[ \left. \frac{dV_{out}}{dt} \right|_{t=0} = \frac{\alpha V_s}{\tau_{int}} \tag{5.60} \]
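In order to prevent slewing, the maximum slope of 𝑉𝑜𝑢𝑡 must be lower than the SR 
[113]. Substituting 𝜏𝑖𝑛𝑡 from equation (5.56) and the SR from equation (5.58), the 
capacitor terms cancel. Hence, 
\[ \frac{\alpha V_s}{R_{out}} < I_{SS} \tag{5.61} \]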
Thus, the lower limit of 𝐼𝑆𝑆 is 𝛼𝑉𝑠/𝑅𝑜𝑢𝑡 while its upper limit is determined by the power 
budget. 
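
The slewing condition can be checked numerically from equations (5.56), (5.58) and (5.60). The sketch below is illustrative only: the capacitor values and the 50 𝜇𝐴 tail current are those quoted in section 5.3.3, whereas `R_OUT`, `ALPHA` and `V_S` are hypothetical values that are not taken from the text.

```python
def tau_int(c_s, c_i, r_out):
    """Integration-mode time constant, equation (5.56)."""
    return (c_s * c_i) / (c_s + c_i) * r_out

def slew_rate(i_ss, c_s, c_i):
    """Slew rate of the open-loop output, equation (5.58)."""
    return i_ss * (c_s + c_i) / (c_s * c_i)

def min_tail_current(alpha, v_s, r_out):
    """Lower limit of I_SS from equation (5.61)."""
    return alpha * v_s / r_out

C_S, C_I = 50e-15, 1e-12   # capacitors from section 5.3.3 (F)
I_SS = 50e-6               # tail current allocated in section 5.3.3 (A)
R_OUT = 20e3               # ASSUMPTION: hypothetical output resistance (ohm)
ALPHA = 1.0                # ASSUMPTION: ideal integrator, no leakage
V_S = 0.2                  # ASSUMPTION: sampled voltage at the edge of the linear range (V)

max_slope = ALPHA * V_S / tau_int(C_S, C_I, R_OUT)   # equation (5.60)
sr = slew_rate(I_SS, C_S, C_I)
i_min = min_tail_current(ALPHA, V_S, R_OUT)
no_slewing = max_slope < sr

print(f"maximum slope = {max_slope:.3e} V/s, SR = {sr:.3e} V/s")
print(f"minimum I_SS  = {i_min * 1e6:.1f} uA, slewing avoided: {no_slewing}")
```

Comparing the maximum slope with the SR gives the same verdict as comparing 𝐼𝑆𝑆 with 𝛼𝑉𝑠/𝑅𝑜𝑢𝑡, since the capacitor terms appear on both sides.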
 
5.3.3  Circuit Realization  
Based on the Nyquist theorem, the sampling frequency must be at least twice the signal 
frequency. Since the DFT processor should support WiFi and WiMAX standards, the 
maximum signal frequency is 20 MHz. Thus, the maximum sampling frequency is 
considered to be 80 MHz. Settling time of the op-amp is  
𝑇𝑠𝑒𝑡 ≈ 5𝜏𝑖𝑛𝑡  (5.62) 
The durations of the sampling mode and integration mode must be equal; thus, 𝜏𝑖𝑛𝑡 can 
be replaced by  𝜏𝑠𝑎𝑚. Thereby, the unity gain bandwidth of the op-amp must be at least 
five times greater than the sampling frequency [114].   
𝑓𝑈 ≥ 5𝑓𝑆  (5.63) 
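
The factor of five in equations (5.62) and (5.63) can be related to a settling error. The sketch below assumes the first-order exponential settling of equation (5.59) and evaluates the residual error after a given number of time constants; the function name is illustrative.

```python
import math

def settling_error(n_tau):
    """Fractional residual error of a first-order response after n_tau time constants."""
    return math.exp(-n_tau)

F_S = 80e6             # maximum sampling frequency (Hz)
f_u_min = 5 * F_S      # minimum unity-gain bandwidth from equation (5.63)
err = settling_error(5)

print(f"residual error after 5 tau = {err * 100:.2f} %")
print(f"minimum unity-gain bandwidth = {f_u_min / 1e6:.0f} MHz")
```

Under this assumption, waiting five time constants leaves an error below 1 % of the step, and the resulting 400 MHz requirement matches the bandwidth figure used later when sizing 𝑀1−𝑀4.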
The output of the integrator should be sampled at (𝑁 + 1/2 ) 𝑇𝑐𝑘 , when the DFT 
computation is complete. As the DFT length (𝑁) increases, more samples should be 
stored on 𝐶𝐼. Thus, the required 𝐶𝐼 for the WiMAX standard becomes prohibitively 
large. Large capacitors demand large layout area. More importantly, because 
the gain of the integrator is inversely proportional to 𝐶𝐼, increasing the value of 𝐶𝐼 
attenuates the signal. The attenuation might be so severe that the ADC cannot detect the 
signal. Thus, the upper limit of 𝐶𝐼 is determined by the quantization level of the ADC. To 
overcome this problem, the DFT sum is broken into partial sums (equation (5.64)) and 
𝐶𝐼 is discharged after each partial sum is calculated. The results of partial sums can be 
added together in the Digital Signal Processor (DSP). 
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \sum_{n=0}^{M-1} x(n)\, W_N^{nk} + \sum_{n=M}^{2M-1} x(n)\, W_N^{nk} + \cdots + \sum_{n=N-M}^{N-1} x(n)\, W_N^{nk} \tag{5.64} \]
To discharge 𝐶𝐼 at 𝜑𝑀 , S5 and S6 connect both sides of 𝐶𝐼 to the input CM level of the 
Op-amp (Figure 5.21). In Figure 5.21, the input CM level of the Op-amp is shown by 
the ground symbol. 
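
The partial-sum decomposition of equation (5.64) can be verified in software. The sketch below is a behavioural illustration, not a model of the circuit: it assumes the usual twiddle factor W_N = exp(−j2π/N), resets an accumulator after every 𝑀 samples (mimicking the discharge of 𝐶𝐼), and adds the partial sums afterwards as the DSP would. The input sequence and function names are hypothetical.

```python
import cmath

def dft_bin(x, k):
    """Direct N-point DFT of sequence x at bin k."""
    n_total = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_total)
               for n in range(n_total))

def dft_bin_partial(x, k, m):
    """Same bin computed as partial sums of length m, as in equation (5.64)."""
    n_total = len(x)
    partial_sums = []
    for start in range(0, n_total, m):
        acc = 0j                       # accumulator reset (C_I discharged)
        for n in range(start, min(start + m, n_total)):
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / n_total)
        partial_sums.append(acc)       # value read out before the next reset
    return sum(partial_sums), partial_sums

# Hypothetical 16-sample real input.
x = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1]
total, parts = dft_bin_partial(x, k=3, m=8)

print(f"number of partial sums: {len(parts)}")
print(f"|direct - partial| = {abs(dft_bin(x, 3) - total):.2e}")
```

For a 16-point transform with 𝑀 = 8 the routine produces two partial sums whose total matches the direct DFT to within rounding error.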
 
 
Figure 5-21: Parasitic-insensitive integrator with reset switches 
 
Since 𝐶𝐼 ≫ 𝐶𝑆, 𝐶𝐼 cannot be fully discharged during the sampling time of 𝐶𝑆. Therefore, 
the number of integrators is doubled so that multipliers can switch between two 
integrators. Thereby, one integrator calculates a partial sum while the output of the 
other integrator is being read.  
Input of the integrator is connected to the output of the complex multiplier. Thus, the 
input CM level of the op-amp (Figure 5.19) is equal to the output CM level of the 
complex multiplier (𝑉𝑖𝑛,𝐶𝑀 = 1.2𝑉). In order to keep 𝑀2 in the saturation region, the 
output voltage should be 𝑉𝑜𝑢𝑡 ≥ 𝑉𝑖𝑛,𝐶𝑀 − 𝑉𝑇𝐻2. With 𝑉𝑇𝐻2 = 0.6𝑉, the output voltage 
should be 𝑉𝑜𝑢𝑡 ≥ 0.6𝑉. Hence, 𝑉𝑜𝑢𝑡 = 1.2𝑉 is selected. Since 𝑉𝑜𝑢𝑡 = 𝑉𝐷𝐷 − |𝑉𝐺𝑆3| and 
|𝑉𝑇𝐻3| = 0.4𝑉, the overdrive voltage of 𝑀3 is |𝑉𝑂𝐷3| = 0.2𝑉.  
Gain of the op-amp is 𝐴𝑣 = −𝑔𝑚1(𝑟𝑂1||𝑟𝑂3) . Also, 𝑔𝑚1 = 2𝐼𝐷1 𝑉𝑂𝐷1⁄ . Therefore, 
𝑉𝑂𝐷1 = 0.1𝑉  is selected to achieve a high gain. 𝑉𝑂𝐷5 = 0.2𝑉  is allocated to 𝑀5 . 
Considering the power budget (section 4.4.1), 𝐼𝑆𝑖𝑛𝑔𝑙𝑒−𝑒𝑛𝑑𝑒𝑑 𝑖𝑛𝑡𝑒𝑔𝑟𝑎𝑡𝑜𝑟 = 50 𝜇𝐴  is 
allocated to 𝑀5. With the bias current and overdrive voltage of each transistor known, 
the aspect ratios of the transistors can be determined. To minimize the device 
capacitances, the minimum length (0.2 𝜇𝑚) was chosen for all transistors except 𝑀5. 
Table 5-3 shows the aspect ratios of the initial design which satisfies the swing and 
power budget specifications. 
 
 
Table 5-3: initial aspect ratios of the op-amp 

Transistors      W/L (𝜇𝑚/𝜇𝑚)
𝑀1−𝑀2            2/0.2
𝑀3−𝑀4            4/0.2
𝑀5               8/0.8
  
Since 𝑔𝑚𝑟𝑂 ∝ √𝑊𝐿 𝐼𝐷⁄  , gain can be increased by increasing the width and length of 
the transistors [80]. Since 𝑀1 −𝑀4 appear in the signal path, their maximum lengths are 
determined by the bandwidth requirement (𝑓𝑈 = 400 𝑀𝐻𝑧).    
Based on equations (5.51) and (5.53), switches of the integrator (Figure 5.21) must have 
large 𝑊/𝐿 and small 𝑊𝐿. Hence, the minimum length is chosen for the switches. Thereby, 
the width of 1 𝜇𝑚  is required to yield 𝑓𝑠 ≥ 80 𝑀𝐻𝑧 . Since the source and drain 
terminals may interchange, the bulk terminal of the NMOS switches must be connected 
to the ground. Table 5-4 shows the final aspect ratios. By selecting 𝑉𝑏𝑖𝑎𝑠 = 0.6𝑉, 𝑀5 
provides 54𝜇𝐴.  
Table 5-4: final aspect ratios of the parasitic-insensitive integrator 

Transistors      W/L (𝜇𝑚/𝜇𝑚)
𝑀1−𝑀2            10/1
𝑀3−𝑀4            20/1
𝑀5               40/4
𝑆1−𝑆6            1/0.2
 
𝐶𝑆 =  50𝑓𝐹  is selected because it is the smallest capacitor that holds the sampled 
voltage without dropping due to charge leakage. 𝐶𝐼 must be at least ten times bigger 
than 𝐶𝑆, otherwise the circuit in Figure 5.21 acts as a Low Pass Filter (LPF) instead of 
an integrator. Figure 5.22 shows the output of a differential parasitic-insensitive 
integrator that sums 8 pieces of the piecewise continuous signal (𝑀 = 8) . This 
integrator is realized with 𝐶𝐼 =  1𝑝𝐹 . The power consumption of the differential 
integrator is 395𝜇𝑊. While 𝜑𝑀 =  0𝑉, the differential integrator is in the integration 
mode (𝐶𝐼 is charging). While 𝜑𝑀 =  2𝑉, the differential integrator is in the reset mode 
(𝐶𝐼 is discharging). 
 
 
Figure 5-22: output of a differential parasitic-insensitive integrator simulated in SPICE 
 
5.4 Real-Time Recursive DFT Processor  
As mentioned in the previous section, the DFT sum is broken into partial sums to 
overcome the limitation on 𝐶𝐼. To determine the maximum value of 𝑀 in equation 5.64, 
the impact of 𝐶𝐼 on the DFT processor performance must be analysed. For the purpose 
of this analysis, the analogue multiplier and the parasitic-insensitive integrator that were 
designed in previous sections are used to realize the real-time recursive DFT processor. 
An OFDM signal with QPSK modulation was applied to the input of the processor.  
SNDR curves of 8-point DFT with ideal devices were obtained for different values of 𝐶𝐼 
(Figure 5.23). The SNDR curve of an 8-point DFT with ideal integrators (integrations 
are performed by MATLAB) is also shown in Figure 5.23.  
 
 
Figure 5-23: The SNDR curves of real-time recursive DFT processors with ideal devices 
 
These results indicate that in the absence of mismatch the DFT processor with larger 𝐶𝐼 
provides higher SNDR at low signal levels (input magnitude ≤ -25 dBV). At high signal 
levels (input magnitude > -25 dBV), however, the DFT processor with larger 𝐶𝐼 is more 
susceptible to the Op-amp saturation. 
Using the device mismatch model that is provided in chapter 6, the impact of 𝐶𝐼 on the 
performance of the DFT processor is analysed (Figure 5.24). These results indicate that 
in the presence of device mismatch the DFT processor with larger 𝐶𝐼 is more susceptible 
to noise and distortion at low signal levels. This inference is in contrast to the inference 
from DFT with ideal devices (Figure 5.23). Reduction of the SNDR with 𝐶𝐼 increase is 
due to the fact that a DFT with larger 𝐶𝐼 has a lower gain. The dynamic range and the 
peak SNDR of DFT processors with 𝐶𝐼 = 500 𝑓𝐹 and 𝐶𝐼 = 1 𝑝𝐹 are almost the same. 
By selecting 𝐶𝐼 = 1 𝑝𝐹 , the maximum length of partial sums in equation (5.64) 
becomes 𝑀 = 8. 
 
 
Figure 5-24: SNDR curves of real-time recursive DFT processors in the presence of device mismatch 
 
For multi-standard radio applications, DFT processor should compute Fourier transform 
with various lengths. Hence, the impact of the transform length on the DFT processor 
performance must be analysed. For the purpose of this analysis, an OFDM signal with 
BPSK modulation was applied to the input of the processor. Figure 5.25 shows the 
SNDR curves of 8-point DFT and 16-point DFT with ideal devices. As mentioned 
earlier 𝑀 = 8. Hence, the 16-point DFT is calculated by breaking the DFT sum into two 
partial sums and adding the results of partial sums in MATLAB (equation (5.64)). 
Results of this analysis indicate that in the absence of mismatch increasing the 
transform length does not affect the performance of the recursive DFT processor. 
 
 
Figure 5-25: SNDR curves of real-time recursive DFT processors with different transform lengths  
5.5 Accuracy of the Results 
Figure 5.26 shows the design and verification steps that a design must pass through to 
create an Integrated Circuit (IC). The architectural design was explained in chapter 3. The 
behavioural models and the system-level performance analysis were covered in chapter 
4. This chapter and chapter 6 provide the circuit design and the circuit-level 
performance analysis.  
 
 
Figure 5-26: steps in the integrated circuit design flow [115] 
 
 
 
 
 
 
Interconnect properties (i.e. series resistance and parallel capacitance) impact the 
performance of the circuit. For long interconnects, the parasitic resistance and 
capacitance cause signal delay. Also, the series resistances in supply and ground lines 
create dc and transient voltage drops. Besides, charging extra capacitances increases the 
power consumption. To increase the accuracy of the circuit model, parasitic devices 
should be extracted from the layout design and annotated on the pre-layout schematic 
netlist [80].  
The physical verification step requires access to the design rules for layout. Since 
the Process Design Kit (PDK) was not available, results of the pre-layout simulations 
are provided in this chapter and in the next chapter. For frequencies below 100 MHz, 
results of the pre-layout simulations are in good agreement with the experimental results 
[116]. The sampling frequency of the Switched-Capacitor integrator is 𝑓𝑠 = 80 𝑀𝐻𝑧. 
Hence, results of the circuit-level performance analysis are reliable. 
5.6 Summary 
The real-time recursive DFT processor is realized by analogue multipliers in 
conjunction with switched capacitor integrators. Differential circuits have an odd-
symmetric input/output characteristic; hence, they do not produce even harmonics. 
Accordingly, to enhance the nonlinearity cancellation, a fully differential configuration 
is used. The advantage of the proposed design approach over the previous designs is 
that it is both reconfigurable and area efficient. In this chapter, speed-power-accuracy 
trade-offs in circuits with ideal devices have been discussed. In order to analyse the 
impact of the transform length on the DFT processor performance, an 8-point DFT and 
a 16-point DFT were simulated with ideal devices. Results of this analysis indicate that 
in the absence of mismatch increasing the transform length does not affect the 
performance of the recursive DFT processor. The performance of the real-time recursive 
DFT processor in the presence of device mismatch will be analysed in the next chapter. 
 
 
Chapter 6   
DEVICE MISMATCH ANALYSIS AND 
RESULTS 
In the previous chapter, the real-time recursive DFT processor was simulated with 
perfectly symmetric circuits. In reality, however, uncertainties in the manufacturing 
process lead to mismatch between nominally identical devices. In this chapter, the 
impact of device mismatch on the performance of the circuit is analysed. To this aim, 
first the mismatch models available in the open literature are reviewed. Then, the design 
tradeoffs that impose limitations on the performance of analogue signal processors are 
explained. Next, the effect of technology scaling on mismatch is discussed. Results of 
the mismatch analysis are presented and compared with previous work. Finally, some 
techniques that can mitigate the effect of device mismatch are briefly described.   
6.1 MOS Transistor Matching Models 
Generally, process mismatch analysis is based on global and local variations. Global 
mismatch is the total variation over a wafer or a batch. Local mismatch occurs between 
adjacent devices on the same chip. For a matched pair of MOS transistors, threshold 
voltage differences ∆𝑉𝑇𝐻   and current factor differences ∆𝛽 ( 𝛽 = 𝜇𝐶𝑜𝑥𝑊/𝐿 ) are the 
dominant sources of mismatch [117].  
 
Pelgrom’s mismatch model [117] describes the behaviour of ∆𝑉𝑇𝐻 and ∆𝛽 as the spatial 
variations of device parameters  
 
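\[ \sigma^2(\Delta V_{TH}) = \frac{A_{VTH}^2}{WL} + S_{VTH}^2 D^2 \tag{6.1} \]

\[ \frac{\sigma^2(\Delta\beta)}{\beta^2} = \frac{A_W^2}{W^2 L} + \frac{A_L^2}{W L^2} + \frac{A_\mu^2}{WL} + \frac{A_{Cox}^2}{WL} + S_\beta^2 D^2 \approx \frac{A_\beta^2}{WL} + S_\beta^2 D^2 \tag{6.2} \]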
where 𝐴𝑃  is the area proportionality constant for parameter 𝑃, 𝑆𝑃  is the variation of 
parameter 𝑃 with spacing 𝐷, 𝑊 is the effective width and 𝐿 is the effective length of the 
channel, 𝜇 is the mobility of charge carriers, and 𝐶𝑜𝑥 is the gate oxide capacitance per 
unit area. Equations (6.1) and (6.2) show that local variations decrease as the effective 
channel area (𝑊𝐿) increases; whereas global variations ( 𝑆𝑉𝑇𝐻  and  𝑆𝛽) are independent 
of the device dimensions.  
Since the advent of the submicron technologies, more accurate mismatch models have 
been proposed [118, 119]. However, these models require access to the standard cell 
libraries and Process Design Kit (PDK). Similarly, the global variations of Pelgrom’s 
mismatch model require access to the design rules for layout. Since the design of 
analogue circuits is based on the device size, local variations are the main focus of 
attention for circuit designers. Hence, in the absence of the standard cell libraries and 
PDK, the experimental data available in the open literature was used to model the local 
variations described by Pelgrom. 
Combining (6.1) and (6.2) yields the drain current mismatch in the saturation region 
[120, 121]  
 
\[ \frac{\sigma^2(\Delta I_D)}{I_D^2} = 4\, \frac{\sigma^2(\Delta V_{TH})}{(V_{GS} - V_{TH})^2} + \frac{\sigma^2(\Delta\beta)}{\beta^2} \tag{6.3} \]
 
 
As explained in chapter 5, signal swing requirements limit the overdrive voltage of each 
transistor to less than 0.65 𝑉. For (𝑉𝐺𝑆 − 𝑉𝑇𝐻) < 0.65 𝑉, ∆𝑉𝑇𝐻 is the main source of the 
drain current mismatch [120, 121]. Thus, the contribution of the ∆𝛽 mismatch can be 
neglected. In conclusion, a simplified version of the Pelgrom’s mismatch model can be 
used. 
\[ \sigma^2(\Delta V_{TH}) = \frac{A_{VTH}^2}{WL} \tag{6.4} \]
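
As a worked instance of equation (6.4), the sketch below compares the threshold mismatch of the 𝑀1−𝑀4 devices before and after the resizing of chapter 5 (3/0.2 in Table 5-1 and 15/1 in Table 5-2). It uses the NMOS value 𝐴𝑉𝑇𝐻 = 1.7 mV·𝜇m quoted in section 6.5 and, as a simplifying assumption, takes the drawn dimensions as the effective ones. The function name is illustrative.

```python
import math

def sigma_delta_vth(a_vth_mv_um, w_um, l_um):
    """Standard deviation of the V_TH difference of a matched pair (mV)."""
    return a_vth_mv_um / math.sqrt(w_um * l_um)

A_VTH_N = 1.7   # NMOS area proportionality constant from section 6.5 (mV*um)

# ASSUMPTION: drawn dimensions are used in place of effective dimensions.
sigma_initial = sigma_delta_vth(A_VTH_N, 3, 0.2)   # Table 5-1, M1-M4
sigma_final = sigma_delta_vth(A_VTH_N, 15, 1)      # Table 5-2, M1-M4

print(f"initial sizing: sigma = {sigma_initial:.2f} mV")
print(f"final sizing:   sigma = {sigma_final:.2f} mV")
print(f"improvement factor = {sigma_initial / sigma_final:.1f}")
```

Scaling both dimensions by five keeps 𝑊/𝐿 unchanged but increases the channel area by a factor of 25, so the standard deviation falls by a factor of five.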
6.2 MOS Transistor Optimum Matching 
Pelgrom’s model describes the ∆𝑉𝑇𝐻 mismatch with a variance inversely proportional to 
the effective transistor channel area (𝑊𝐿). Accordingly, mismatch can be reduced by 
increasing the effective channel area. The effective channel dimensions are defined as 
 
\[ W_{eff} = W_{drawn} - 2W_D \tag{6.5} \]

\[ L_{eff} = L_{drawn} - 2L_D \tag{6.6} \]

where 𝑊𝑑𝑟𝑎𝑤𝑛 and 𝐿𝑑𝑟𝑎𝑤𝑛 are the layout dimensions, 𝐿𝐷 is the side diffusion of source 
and drain, and 𝑊𝐷 is the field oxide encroachment upon the channel. A short channel 
(large 𝑊𝑑𝑟𝑎𝑤𝑛/𝐿𝑑𝑟𝑎𝑤𝑛) and a narrow channel (small 𝑊𝑑𝑟𝑎𝑤𝑛/𝐿𝑑𝑟𝑎𝑤𝑛) for devices with 
equal drawn areas are shown in Figure 6.1. Since a narrow channel has larger effective 
area than a short channel, devices with smaller 𝑊/𝐿 provide better matching. Optimum 
matching is achieved when [122] 
 
\[ \frac{W_{drawn}}{L_{drawn}} = \frac{W_D}{L_D} \tag{6.7} \]
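
The optimum of equation (6.7) can be reproduced by a simple numerical search. The sketch below holds the drawn area constant, sweeps the drawn aspect ratio, and picks the ratio that maximizes the effective area defined by equations (6.5) and (6.6). The values of `W_D`, `L_D` and `AREA` are hypothetical and serve only to illustrate the trend.

```python
import math

def effective_area(ratio, drawn_area, w_d, l_d):
    """Effective channel area for a drawn W/L ratio at constant drawn area."""
    w_drawn = math.sqrt(drawn_area * ratio)
    l_drawn = math.sqrt(drawn_area / ratio)
    return (w_drawn - 2 * w_d) * (l_drawn - 2 * l_d)

# ASSUMPTION: hypothetical process values, not taken from the text (um, um^2).
W_D, L_D, AREA = 0.05, 0.02, 1.0

# Sweep W/L from 0.10 to 10.00 in steps of 0.01.
ratios = [i / 100 for i in range(10, 1001)]
best_ratio = max(ratios, key=lambda r: effective_area(r, AREA, W_D, L_D))

print(f"best drawn W/L from sweep = {best_ratio:.2f}")
print(f"W_D / L_D                 = {W_D / L_D:.2f}")
```

The ratio found by the sweep coincides with 𝑊𝐷/𝐿𝐷 to within the step size, in agreement with the optimum matching condition.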
 
 
Figure 6-1: Equal drawn area devices (a) short channel (b) narrow channel 
 
Moreover, considering the ∆𝑉𝑇𝐻 variation, drain current in the saturation region can be 
expressed as   
\[ I_D = \frac{\mu C_{ox}}{2} \frac{W}{L} \left( V_{GS} - V_{TH} - \Delta V_{TH} \right)^2 \tag{6.8} \]
Expanding the square term gives  
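\[ I_D = \frac{\mu C_{ox}}{2} \frac{W}{L} \left( V_{GS} - V_{TH} \right)^2 - \mu C_{ox} \frac{W}{L} \left( V_{GS} - V_{TH} \right) \Delta V_{TH} + \frac{\mu C_{ox}}{2} \frac{W}{L}\, \Delta V_{TH}^2 \tag{6.9} \]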
The first term is the ideal drain current. The last term is negligible because ∆𝑉𝑇𝐻² is 
small. Hence, the second term is the dominant mismatch term, which can be reduced by 
minimizing the 𝑊/𝐿 aspect ratio. In summary, sensitivity to ∆𝑉𝑇𝐻 mismatch is minimized by 
minimizing 𝑊/𝐿 and maximizing 𝑊𝐿 [59]. These conditions can be met if the channel length is 
maximized. Tradeoffs that impose an upper limit on the channel length will be 
discussed in the next section. 
 
6.3 Impact of Mismatch on the Performance Tradeoffs 
In the previous section, it has been discussed that mismatch can be reduced by 
increasing the channel area. However, tradeoffs in the design of the analogue circuits 
impose an upper limit on the device area. In view of the power budget and system 
specifications, circuit designers must investigate the optimal design.  
In the presence of mismatch, circuits that were designed in chapter 5 (the multiplier and 
the op-amp of the SC integrator) suffer from dc offset at their output. The output dc 
offset can be defined as the input-referred offset voltage that makes the output voltage 
zero. Hence, accuracy (𝐴𝐶𝐶) can be measured by [121] 
 
\[ ACC = \frac{V_{in,RMS}}{3\sigma(V_{OS})} \tag{6.10} \]
where 𝑉𝑖𝑛 𝑅𝑀𝑆 is the RMS of the input signal, and 𝑉𝑂𝑆 is the input-referred offset voltage 
of the circuit (multiplier or op-amp). Since 𝑉𝑂𝑆 is strongly dependent on the contribution 
of the input differential pair [121],  
\[ ACC \approx \frac{V_{in,RMS} \sqrt{WL}}{3 A_{VTH}} \tag{6.11} \]
where 𝑊  and 𝐿  are the width and length of the input devices, respectively. Hence, 
accuracy can be improved by increasing the channel area. However, increasing the 
device area increases the input capacitance [121] 
\[ C_{in} = \frac{C_{gs}}{2} = \frac{1}{2} \cdot \frac{2 C_{ox} W L}{3} \tag{6.12} \]
where 𝐶𝑖𝑛 is the input capacitance of the circuit (multiplier or op-amp), and 𝐶𝑔𝑠 is the 
gate-source capacitance of the input device.  
 
Combining (6.11) and (6.12) yields 
 
\[ ACC^2 \approx \frac{C_{in} V_{in,RMS}^2}{3 C_{ox} A_{VTH}^2} \tag{6.13} \]
The energy stored on the input capacitor is calculated by [55] 
 
\[ E = \frac{1}{2} C_{in} V_{in,RMS}^2 \tag{6.14} \]
Hence, the power consumption of the circuit (multiplier or op-amp) is  
 
\[ P = \frac{E}{\tau} = \frac{f\, C_{in} V_{in,RMS}^2}{2} \tag{6.15} \]
where  𝜏  and 𝑓  are the time constant and the operating frequency of the circuit, 
respectively. Combining (6.13) and (6.15) yields 
\[ P \approx \frac{3}{2}\, f\, C_{ox} A_{VTH}^2\, ACC^2 \tag{6.16} \]
Replacing the operating frequency with the circuit bandwidth gives 
\[ \frac{BW \cdot ACC^2}{P} \approx \frac{2}{3 C_{ox} A_{VTH}^2} \tag{6.17} \]
which is the bandwidth-accuracy-power trade-off of the circuit (multiplier or op-amp). 
This trade-off is determined only by the technology parameters 𝐶𝑜𝑥𝐴𝑉𝑇𝐻², and the circuit 
designer has no influence on the overall trade-off. Increasing the device area increases 
both accuracy and input capacitance (equations (6.11) and (6.12)). For constant power, 
as input capacitance increases, operating frequency decreases (equation (6.15)). Thus, 
increasing the accuracy reduces the bandwidth. 
 
Multiplications are performed at 20MHz. Based on the results of the simulations in the 
previous chapter, the power consumption of the real multiplier is 120µW. Hence, 
accuracy of the multiplier is 
\[ ACC_M^2 \approx \frac{4 \times 10^{-12}}{C_{ox} A_{VTH}^2} \tag{6.18} \]
On the other hand, the single-ended integrator operates at 80MHz and consumes 198µW 
power. Thus, accuracy of the op-amp is 
\[ ACC_{Op}^2 \approx \frac{1.65 \times 10^{-12}}{C_{ox} A_{VTH}^2} \tag{6.19} \]
Hence, the technology parameters 𝐶𝑜𝑥𝐴𝑉𝑇𝐻² have more impact on 𝐴𝐶𝐶𝑂𝑝² than 𝐴𝐶𝐶𝑀². 
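The numerators of equations (6.18) and (6.19) follow from rearranging equation (6.16) as ACC²·C_ox·A_VTH² ≈ 2P/(3f). The short Python sketch below repeats that arithmetic with the power and frequency values quoted above; the function name is chosen here for illustration only.

```python
def acc_squared_numerator(power_w, freq_hz):
    """Rearranged equation (6.16): ACC^2 * C_ox * A_VTH^2 = 2 * P / (3 * f)."""
    return 2.0 * power_w / (3.0 * freq_hz)


# Real multiplier: 120 uW at 20 MHz, as used for equation (6.18).
multiplier = acc_squared_numerator(120e-6, 20e6)
# Single-ended integrator (op-amp): 198 uW at 80 MHz, as used for equation (6.19).
op_amp = acc_squared_numerator(198e-6, 80e6)

print(multiplier, op_amp)
```

The two results are approximately 4 × 10⁻¹² and 1.65 × 10⁻¹², in agreement with equations (6.18) and (6.19).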
6.4 Impact of Technology Scaling on the Mismatch 
Technology scaling reduces the gate oxide thickness (𝑡𝑜𝑥) and increases the substrate 
doping level. Reduction in the 𝑡𝑜𝑥 reduces the 𝐴𝑉𝑇𝐻. However, increase in the substrate 
doping level increases the 𝐴𝑉𝑇𝐻 [117, 123].  
The reduction in the power supply voltage by technology scaling leads to the reduction 
in the power consumption and signal swing. Reduction in the signal swing leads to a 
quadratic reduction in the dc accuracy, while power consumption is reduced linearly.  
Linear reduction of 𝐴𝑉𝑇𝐻 with feature size implies that deeper submicron technologies 
have better matching for devices occupying a constant area. However, quadratic 
reduction in the dc accuracy is more significant than the linear reduction of 𝐴𝑉𝑇𝐻 [121]. 
6.5 Mismatch Analysis Results 
Sensitivity of the real-time recursive DFT processor to device mismatch is analysed 
using Pelgrom’s model described in section 6.1. Accordingly, the ∆𝑉𝑇𝐻 random 
variation has a normal distribution with zero mean and a variance described by 
equation (6.4). A pair of matched devices 𝑀1 and 𝑀2, with random variation 𝛿𝑉𝑇𝐻𝑖 
for each device, has the random difference ∆𝑉𝑇𝐻 = 𝛿𝑉𝑇𝐻1 − 𝛿𝑉𝑇𝐻2. Since the two 
variations are independent and identically distributed, the variance of the difference 
is twice the variance of each device. Hence, the variance of each device is 
$$\sigma^{2}(\delta V_{THi}) = \frac{A_{VTH}^{2}}{2WL} \tag{6.20}$$
𝑉𝑇𝐻  mismatch is modelled by an error voltage source in series with the gate of the ideal 
device (Figure 6.2). The 𝐴𝑉𝑇𝐻  proportionality constant is extracted from the 
experimental results of a study on the TSMC 0.18𝜇𝑚 mixed signal CMOS technology 
with 1.8V supply voltage [124]. Based on this study, a device with cross-coupled layout 
configuration has the minimum 𝐴𝑉𝑇𝐻. Thus, assuming that the layout configuration is 
cross-coupled, 𝐴𝑉𝑇𝐻 = 1.7𝑚𝑉𝜇𝑚  for NMOS and 𝐴𝑉𝑇𝐻 = 1.74𝑚𝑉𝜇𝑚  for PMOS are 
used in the device mismatch analysis.  
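The mismatch model above can be sketched in a few lines of Python. The sketch assumes that equation (6.4) is the usual Pelgrom area law σ²(∆V_TH) = A_VTH²/(WL), which is what equation (6.20) implies; the device dimensions are hypothetical and the code is not the MATLAB/SPICE flow of Appendix B. Each device receives an independent Gaussian gate-offset voltage with the variance of equation (6.20), and the spread of the pair difference is compared with the area law.

```python
import math
import random

A_VTH_NMOS = 1.7e-3   # V*um, NMOS value quoted in the text (1.7 mV*um)


def sigma_delta_vth(a_vth, w_um, l_um):
    """Pair mismatch standard deviation, assuming sigma^2 = A_VTH^2 / (W*L)."""
    return a_vth / math.sqrt(w_um * l_um)


def sigma_single_device(a_vth, w_um, l_um):
    """Equation (6.20): per-device standard deviation A_VTH / sqrt(2*W*L)."""
    return a_vth / math.sqrt(2.0 * w_um * l_um)


def sample_pair_mismatch(a_vth, w_um, l_um, rng):
    """Draw independent offsets for M1 and M2 and return their difference."""
    s = sigma_single_device(a_vth, w_um, l_um)
    return rng.gauss(0.0, s) - rng.gauss(0.0, s)


rng = random.Random(0)          # fixed seed so the run is repeatable
W_UM, L_UM = 4.0, 1.0           # hypothetical device dimensions in um
runs = [sample_pair_mismatch(A_VTH_NMOS, W_UM, L_UM, rng) for _ in range(20000)]

# RMS of the sampled differences (the mean is zero by construction).
measured = math.sqrt(sum(x * x for x in runs) / len(runs))
print(measured, sigma_delta_vth(A_VTH_NMOS, W_UM, L_UM))
```

For the assumed 4 µm × 1 µm device, the area law gives a pair standard deviation of 0.85 mV, and the sampled value should lie close to it.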
 
Figure 6-2: Modeling VTH variations using a DC voltage source in series with the MOS gate terminal 
 
The Monte Carlo analysis is performed for the real-time recursive DFT processors of 
length 8 and 16. For the purpose of this analysis, OFDM signals with BPSK and QPSK 
modulations are generated by MATLAB/Simulink. The MATLAB code and the SPICE 
netlist are available in Appendix B. The results of the Monte Carlo analysis for the 
BPSK modulated signal are shown in Figure 6.3. 
 
 
Figure 6-3: Mismatch analysis results of the real-time recursive DFT processor of length 8 
 
Table 6-1 gives the statistics of the Monte Carlo analysis. The dynamic ranges are 
obtained by measuring the width of the SNDR curves at the minimum required SNDR. 
According to Table 4-1, the minimum receiver SNDR requirements for the OFDM 
signals with BPSK and QPSK modulations are 3 dB and 5 dB, respectively. As 
explained in section 4.2, the minimum required dynamic ranges for the BPSK and QPSK 
modulated signals are 34 dB and 36 dB, respectively.  
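The dynamic range measurement described above (the width of the SNDR curve at the minimum required SNDR) can be sketched as follows. The function name and the sample curve are hypothetical; the curve is synthetic and is not a simulation result. The sketch assumes a single-humped SNDR curve sampled at increasing input levels and finds the two threshold crossings by linear interpolation.

```python
def dynamic_range(levels_dbv, sndr_db, min_sndr_db):
    """Width in dB of the input-level interval over which SNDR >= min_sndr_db.

    levels_dbv must be increasing. The interval is bounded by the first and
    last samples that meet the requirement, refined by linear interpolation.
    """
    ok = [i for i, s in enumerate(sndr_db) if s >= min_sndr_db]
    if not ok:
        return 0.0
    first, last = ok[0], ok[-1]

    def crossing(i_out, i_in):
        # Interpolate between a sample below the threshold and one that meets it.
        x0, y0 = levels_dbv[i_out], sndr_db[i_out]
        x1, y1 = levels_dbv[i_in], sndr_db[i_in]
        return x0 + (min_sndr_db - y0) * (x1 - x0) / (y1 - y0)

    low = levels_dbv[first] if first == 0 else crossing(first - 1, first)
    high = levels_dbv[last] if last == len(sndr_db) - 1 else crossing(last + 1, last)
    return high - low


# Synthetic SNDR curve (hypothetical numbers for illustration only).
levels = [-60, -50, -40, -30, -20, -10, 0]   # input magnitude, dBV
sndr = [0, 10, 20, 25, 20, 10, 0]            # SNDR, dB

print(dynamic_range(levels, sndr, 5))    # width at a 5 dB requirement
print(dynamic_range(levels, sndr, 15))   # width at a 15 dB requirement
```

For this synthetic curve the width is 50 dB at a 5 dB requirement and 30 dB at a 15 dB requirement, which illustrates why a higher minimum SNDR leaves a narrower dynamic range.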
Table 6-1: Summary of the Monte Carlo analysis for the recursive DFT processors of length 8 
(dB)                  Dynamic range       Peak SNDR 
                      BPSK     QPSK       BPSK     QPSK 
Mean                  36.3     33.2       22.5     22.3 
Standard deviation    1.6      1.6        1.3      1.3 
 
For BPSK modulated signal, the dynamic range histograms of the 8-point DFT 
processor and the 16-point DFT processor are shown in Figure 6.4 and Figure 6.5, 
respectively. The average dynamic range of the 16-point DFT processor is 33.4dB. 
Hence, doubling the length of the DFT processor reduces the average dynamic range by 
3dB. For QPSK modulated signal, the dynamic range histogram of the 8-point DFT 
processor is depicted in Figure 6.6. Table 6-2 provides the results of the yield prediction. 
 
 
Figure 6-4: dynamic range histogram of the 8-point DFT processor for BPSK modulated signal 
 
 
Figure 6-5: dynamic range histogram of the 16-point DFT processor for BPSK modulated signal 
 
 
Figure 6-6: dynamic range histogram of the 8-point DFT processor for QPSK modulated signal 
Table 6-2: Summary of the yield prediction for the recursive DFT processors of length 8 and 16 
DFT Length    BPSK      QPSK 
8             97.5 %    8.9 % 
16            43.4 %    - 
 
Table 6-3 compares the performance of the proposed architecture with an analogue FFT 
processor. For the purpose of this comparison, an OFDM signal with BPSK modulation is 
used. Also, the dynamic range is measured at 7 dB, which is the minimum required SNDR 
for the DT FFT. The dynamic range and peak SNDR of the proposed architecture are 
20.7 dB and 13.5 dB lower than those of the DT FFT processor, respectively.   
 
Table 6-3: Performance comparison of the analogue Fourier Transform processors 
Performance Metric     Proposed DFT    DT FFT [57] 
CMOS Technology        180 nm          130 nm 
Supply Voltage         1.8 V           1.2 V 
Input Frequency        20 MHz          1 GHz 
Operating Frequency    80 MHz          100 MHz 
Length                 8               8 
Peak SNDR              22.5 dB         36 dB 
Dynamic Range          28.3 dB         49 dB 
Power Consumption      10 mW           25 mW 
 
As explained in chapter 5, sampling frequency of the SC integrator must be at least 
twice the signal frequency. Hence, operating frequency of the recursive DFT processor 
is greater than its input frequency. On the other hand, owing to the serial to parallel 
conversion in the DT FFT processor, operating frequency of the DT FFT processor is 
less than its input frequency. Hence, parallel processing relaxes the bandwidth 
requirement of multipliers in the DT FFT processor.  
Normalizing the power consumption of the DT FFT processor to the 180nm technology, 
1.8V supply voltage, and 80MHz operating frequency gives 
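$$\text{Normalized Power} = \frac{25\,\text{mW}}{\left(\frac{130\,\text{nm}}{180\,\text{nm}}\right)\left(\frac{100\,\text{MHz}}{80\,\text{MHz}}\right)\left(\frac{1.2\,\text{V}}{1.8\,\text{V}}\right)^{2}} = 62.3\,\text{mW} \tag{6.21}$$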
Hence, the power consumption of the recursive DFT processor (10 mW) is about 1/6 of the 
normalized power consumption of the DT FFT processor. As explained in section 6.3, linear 
reduction in the power consumption leads to quadratic reduction in the dc accuracy. Thus, 
the lower peak SNDR and lower dynamic range of the recursive DFT processor are due to its 
lower power consumption. Other factors that contribute to the performance degradation 
of the recursive DFT processor are investigated in the next section.    
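The normalization of equation (6.21) can be reproduced with the short Python sketch below. The function name and argument names are chosen here for illustration; the numbers are those of Table 6-3.

```python
def normalize_power(p_mw, tech_nm, f_mhz, vdd_v,
                    ref_tech_nm=180.0, ref_f_mhz=80.0, ref_vdd_v=1.8):
    """Equation (6.21): scale a power figure to the reference technology,
    operating frequency and supply voltage (linear, linear, quadratic)."""
    scale = (tech_nm / ref_tech_nm) * (f_mhz / ref_f_mhz) * (vdd_v / ref_vdd_v) ** 2
    return p_mw / scale


# DT FFT processor [57]: 25 mW in 130 nm at 100 MHz with a 1.2 V supply.
dt_fft_mw = normalize_power(25.0, 130.0, 100.0, 1.2)
# Proposed recursive DFT processor: 10 mW (already at the reference conditions).
ratio = 10.0 / dt_fft_mw

print(round(dt_fft_mw, 1), round(ratio, 2))
```

The normalized DT FFT power evaluates to 62.3 mW and the ratio to about 0.16, that is, roughly one sixth.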
6.6 Root Cause Analysis 
According to the equation (2.12), the number of analogue multipliers that are required 
to implement a Radix-2 FFT of length 8 is 104. On the other hand, the real-time 
recursive DFT performs 256 multiplications to compute DFT of length 8. Results of the 
system-level performance analysis in chapter 4 showed that the proposed DFT 
processor has better performance than the FFT processor. Hence, performance 
degradation of the proposed DFT processor in the previous section is not due to the 
number of multiplications. This conclusion leads to the realization that the values of the 
mismatch parameters in the system-level performance analysis were underestimated. 
To find which non-ideality contributes most to the performance degradation, 
the effect of each non-ideality is analysed individually. The effect of multiplier 
saturation is revealed by the SNDR curve of a recursive DFT processor with ideal 
integrators (integrations are performed by MATLAB). On the other hand, the effect of 
integrator (Op-amp) saturation is shown by the SNDR curve of a recursive DFT 
processor with Switched-Capacitor integrators. The aforementioned curves are shown in 
Figure 6.7.  
 
Figure 6-7: The SNDR curves of 8-point recursive DFT processors with ideal devices 
 
The dynamic range is obtained by measuring the width of the SNDR curves at the 
minimum required SNDR. For the comparison between the recursive DFT and the DT 
FFT processors (Table 6-3), the dynamic range was measured at 7dB. A comparison 
between the blue and the red curves at SNDR = 7dB reveals that saturation of the Op-
amps in the SC integrators reduces the dynamic range by 7dB. The linear range of the 
multipliers in the recursive DFT processor is greater than the linear range of the 
multipliers in the DT FFT processor [57]. Hence, at high signal levels (input magnitude 
> -25 dBV), Op-amp saturation is the reason that the dynamic range is lower than 
that of the DT FFT. 
The effect of multipliers’ device mismatches is revealed by a comparison between the 
SNDR curves of a recursive DFT with ideal integrators in the absence and presence of 
device mismatches (the blue and the red curves in Figure 6.8). Measuring the widths of 
the aforementioned curves at SNDR = 7dB shows that multipliers’ device mismatches 
reduce the dynamic range by 20dB.  
 
Figure 6-8: The SNDR curves of 8-point recursive DFT processors 
 
The effect of SC integrators’ device mismatches is revealed by a comparison between 
the SNDR curves of a recursive DFT with ideal integrators and a recursive DFT with 
SC integrators in the presence of device mismatches (the red and the yellow curves in 
Figure 6.8). A comparison between the widths of the aforementioned curves at SNDR = 
7dB shows that SC integrators’ device mismatches reduce the dynamic range by 3dB. 
This analysis leads to the conclusion that multipliers’ device mismatches contribute 
most to the dynamic range reduction. Increasing the power consumption of 
the multiplier can improve its accuracy. Other methods of mitigating the effect of device 
mismatch are discussed in the next section.      
6.7 Mitigation of the Effect of Device Mismatch 
As explained in the previous section, multipliers’ device mismatches reduce the 
dynamic range significantly. Two approaches that can be taken to solve this problem are 
electronic offset cancellation and error correction techniques. The topology of an offset 
cancellation technique, which can be used for transconductance multipliers, is shown in 
Figure 6.9 [80]. Each 𝐺𝑚 stage is a differential pair and the R stage is a transimpedance 
amplifier.  
 
 
 
Figure 6-9: Offset cancellation by an auxiliary transconductance in a negative feedback loop [80] 
 
Suppose that first only S1 and S2 are on, thus 𝑉𝑜𝑢𝑡 = 𝐺𝑚1𝑉𝑂𝑆1𝑅 . Then, assuming that S3 
and S4 are on, a negative feedback loop is made with R and 𝐺𝑚2 . Thus, 𝑉𝑜𝑢𝑡 =
𝐺𝑚1𝑉𝑂𝑆1𝑅  is stored across 𝐶1  and 𝐶2 . Afterwards, 𝐺𝑚2  converts the voltage across 
capacitors to 𝐼𝑜𝑢𝑡 2 = 𝐺𝑚2𝐺𝑚1𝑉𝑂𝑆1𝑅  . When 𝑉𝑖𝑛  is connected, 𝐺𝑚2 adds an offset 
correction current at nodes X and Y [80]. Taking the offset voltage of 𝐺𝑚2 into account, 
the stored voltage on 𝐶1 and 𝐶2 is [80] 
 
$$V_{out} = \left[G_{m1}V_{OS1} - G_{m2}\left(V_{out} - V_{OS2}\right)\right]R \tag{6.22}$$
Thereby, 
$$V_{out} = \frac{G_{m1}RV_{OS1} + G_{m2}RV_{OS2}}{1 + G_{m2}R} \tag{6.23}$$
Hence, the offset voltage referred to the main input is 
 
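$$V_{OS,tot} = \frac{V_{out}}{G_{m1}R} = \frac{V_{OS1}}{1 + G_{m2}R} + \frac{G_{m2}}{G_{m1}}\cdot\frac{V_{OS2}}{1 + G_{m2}R} \approx \frac{V_{OS1}}{G_{m2}R} + \frac{V_{OS2}}{G_{m1}R} \tag{6.24}$$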
If 𝐺𝑚2𝑅 ≫ 1 and  𝐺𝑚1𝑅 ≫ 1 , then 𝑉𝑂𝑆,𝑡𝑜𝑡  is very small. However, 𝐺𝑚1𝑅 is the gain of 
the multiplier. The linear range of the multiplier decreases as its gain increases. 
Therefore, the aforementioned offset cancellation technique imposes a trade-off 
between dynamic range and accuracy. Moreover, due to the large area overhead of the 
offset cancellation techniques, they cannot be widely applied. 
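The size of the residual offset can be illustrated numerically. The Python sketch below evaluates equations (6.23) and (6.24) for hypothetical values of the transconductances, the transimpedance and the two offset voltages; none of these values is taken from the designed multiplier.

```python
def stored_output(gm1, gm2, r, vos1, vos2):
    """Equation (6.23): voltage stored on C1 and C2 during cancellation."""
    return (gm1 * r * vos1 + gm2 * r * vos2) / (1.0 + gm2 * r)


def residual_offset(gm1, gm2, r, vos1, vos2):
    """Exact form of equation (6.24): stored voltage referred to the main input."""
    return stored_output(gm1, gm2, r, vos1, vos2) / (gm1 * r)


def residual_offset_approx(gm1, gm2, r, vos1, vos2):
    """Approximate form of equation (6.24), valid for Gm2*R >> 1."""
    return vos1 / (gm2 * r) + vos2 / (gm1 * r)


# Hypothetical values: Gm1 = Gm2 = 1 mS, R = 50 kOhm, 5 mV offset on each stage.
GM1, GM2, R = 1e-3, 1e-3, 50e3
VOS1, VOS2 = 5e-3, 5e-3

exact = residual_offset(GM1, GM2, R, VOS1, VOS2)
approx = residual_offset_approx(GM1, GM2, R, VOS1, VOS2)
print(exact, approx)
```

With a loop gain of 50 in this example, the 5 mV offsets are reduced to a residual of roughly 0.2 mV, and the approximate expression differs from the exact one by about 2%. A larger G_m1·R reduces the residual further, at the cost of the linear range discussed above.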
Trade-offs in the electronic offset cancellation justify the use of error correction 
techniques. These techniques can be divided into three main categories: error correction 
codes, equalizers, and signal processing algorithms. One study showed that Turbo 
Product Code (TPC) effectively mitigates the mismatch loss of a 256-point analogue 
FFT [87]. It is also shown that Minimum Mean Square Error (MMSE) and Least Mean 
Square (LMS) equalizers can mitigate the performance degradation of the DFT 
processor implemented on a FPAA (Figure 6.10) [93].  Another study proposed an 
iterative signal processing algorithm to recover the output of a 64-point analogue FFT 
[125]. Neural Networks can also be applied to assist the detection of the received 
symbols.    
 
Figure 6-10: Performance comparison of a 4-point analogue DFT implemented on a FPAA [93] 
6.8 Summary 
In this chapter, it is discussed that CMOS device matching depends on the bias point. 
For typical bias points, the threshold voltage is the dominant source of mismatch. 
Mismatch also depends on the device area and technology. Circuit designers can take 
these relations into account to optimize matching. Nevertheless, the bandwidth-
accuracy-power trade-off of the system is only determined by the technology 
parameters; hence, the circuit designer has no influence on the overall trade-off. 
Mismatch analysis results of the recursive DFT processor indicate that increasing the 
transform length degrades the performance. Also, the average dynamic range of the 
recursive DFT processor cannot meet the minimum requirement for the QPSK signal. 
The root cause analysis revealed that multipliers’ device mismatches make the most 
contribution to the dynamic range reduction. Increasing the power consumption of the 
multiplier can improve its accuracy. Moreover, error correction techniques such as TPC 
can mitigate the mismatch loss.   
Chapter 7   
CONCLUSION AND FUTURE WORK 
Since analogue DFT processors consume significantly less power than digital DFT 
processors, they have been nominated as the next generation of the DFT processors. 
This work was motivated by the goal of evolving the next generation of the DFT 
processors. In view of that, a power-scalable variable-length analogue DFT processor 
has been proposed. The proposed DFT processor has application in multi-standard 
OFDM transceivers. This chapter presents the contributions to knowledge, the 
concluding remarks, and the remaining work for the future.  
7.1 Contributions to Knowledge 
7.1.1 Methodology 
Since the classical DFT architectures (i.e. FIR DFT and FFT) were originally designed 
for discrete-time signal processing, they do not take advantage of analogue signals. 
Specifically, these architectures require an analogue decimation filter ahead of them. 
Moreover, the analogue implementations of the classical DFT architectures are not 
power-scalable. Hence, the real-time recursive DFT architecture has been proposed. In 
this architecture, the DFT coefficients are formed into piecewise continuous signals. 
Thereby, the continuous baseband signal is piecewise weighted by the DFT coefficients.  
Since the proposed architecture performs multiplications serially, it does not require 
additional multipliers to compute DFT of a longer sequence. Moreover, the power 
consumption is scalable with the transform length. Hence, the real-time recursive DFT 
architecture is suitable for the power-scalable variable-length DFT processor.  
In the classical DFT architectures, each multiplier provides the real multiplication of a 
signal sample and a DFT coefficient. On the other hand, in the real-time recursive DFT 
architecture, each multiplier provides the element-wise multiplication of a 
one-dimensional array of the DFT coefficients and the continuous baseband signal. Hence, 
compared with the classical architectures, the proposed architecture requires the fewest 
multipliers. Moreover, since multiplications are performed without sampling, 
the analogue decimation filter is eliminated. Besides, the proposed architecture avoids 
propagation of the computational error to all DFTs by computing each DFT 
independently. 
7.1.2 Limitations and Considerations 
Reducing the dynamic range requirement of the ADC by moving the DFT processor 
from the digital back-end to the analogue front-end is at the cost of increasing the 
dynamic range requirement of the DFT processor.  
As data rate increases, the minimum SNDR and the minimum dynamic range 
requirements increase. On the other hand, as SNDR increases, width of the SNDR curve 
decreases. Therefore, as data rate increases, yield of the analogue DFT processor 
decreases. Results of the circuit-level performance analysis indicate that the 8-point 
recursive DFT processor has a yield of 97.5% for the BPSK modulated signal. For the 
QPSK modulated signal, however, yield of the 8-point recursive DFT processor is 
8.9%. Hence, dynamic range of the recursive DFT processor must be increased.  
 
In the absence of mismatch, increasing the transform length does not affect the 
performance of the recursive DFT processor. However, in the presence of mismatch, 
doubling the transform length reduces the average dynamic range by 3dB. The 16-point 
recursive DFT processor has a yield of 43.4% for the BPSK modulated signal.   
As the DFT length increases, more samples should be stored on the integrating capacitor 
(𝐶𝐼 ). Thus, the required 𝐶𝐼  for the WiMAX standard becomes prohibitively large. 
Hence, the DFT sum was broken into partial sums (equation (5.64)). The results of 
partial sums can be added together in the Digital Signal Processor (DSP). The maximum 
length of the partial sum (8) was determined by finding the optimum value for 𝐶𝐼 (1pF). 
Sampling frequency of the SC integrator must be at least twice the signal frequency. 
Also, unity gain bandwidth of the op-amp in the SC integrator must be at least five 
times greater than the sampling frequency. Hence, unity gain bandwidth of the op-amp 
must be at least ten times greater than the signal frequency. In contrast, serial-to-parallel 
conversion in analogue FFT processors relaxes the bandwidth requirement of 
multipliers. While the analogue FFT processor was proposed for Ultra-Wideband 
OFDM wireless transceivers [57], the real-time recursive DFT processor is proposed for 
WiFi and WiMAX standards. The maximum channel bandwidth of WiFi and WiMAX 
standards is 20 MHz.   
Trade-offs in the design of the analogue circuits impose limitations on the performance 
of analogue DFT processors. The bandwidth-accuracy-power trade-off is determined only 
by the technology parameters, and the circuit designer has no influence on the 
overall trade-off. This thesis provides a proof-of-concept for the power-scalable 
variable-length analogue DFT processor. The real-time recursive DFT processor was 
designed in 180 nm CMOS technology. The design process and the results of the 
circuit-level performance analysis provide guidelines for future designers to select a 
technology that satisfies the performance requirements for another application.  
 
7.2 Future Work 
7.2.1 Design Enhancements 
Previous works on the analogue FFT processor were designed in 130 nm and 180 nm 
CMOS technologies [57-59]. In order to compare the performance of the proposed DFT 
processor with the analogue FFT processor, the real-time recursive DFT processor was 
designed in 180 nm CMOS technology. Equation (6.17) can be used to select a 
technology that provides higher accuracy while bandwidth and power meet the design 
specifications. 
Even though 𝐶𝐼  was selected carefully to prevent the reduction of dynamic range 
(section 4.4.3), the root cause analysis showed that integrator saturation reduces the 
dynamic range by 7dB. This problem can be resolved by reducing the input CM level of 
the Op-amp (𝑉𝑖𝑛,𝐶𝑀). Since input of the integrator was connected to the output of the 
multiplier, 𝑉𝑖𝑛,𝐶𝑀 was set equal to the output CM level of the multiplier. By adding a 
source follower between multiplier output and integrator input 𝑉𝑖𝑛,𝐶𝑀 can be shifted to a 
lower level. 
Performance comparison between the recursive DFT processor and the DT FFT 
processor [57] showed that dynamic range of the DT FFT is 20.7dB higher than the 
recursive DFT. The root cause analysis showed that multipliers’ device mismatches 
made the most contribution to the dynamic range reduction. Hence, the four-quadrant 
multiplier that was used in [57] is less sensitive to device mismatch than the designed 
Gilbert cell. Therefore, replacing the Gilbert cell multipliers by the multiplier in [57] 
can increase the dynamic range. 
7.2.2 Further Analysis 
Since the Process Design Kit (PDK) was not available, post-layout simulations were not 
performed. Nevertheless, since sampling frequency of the Switched-Capacitor integrator 
was below 100 MHz, results of the pre-layout simulations were reliable. For frequencies 
above 100 MHz, however, it is essential to extract the parasitic devices and perform the 
post-layout simulations.  
In order to investigate the effectiveness of different mismatch mitigation techniques, the 
trade-off in the offset cancellation techniques and the effectiveness of different error 
correction techniques must be analysed. A hybrid of electronic offset cancellation and 
error correction might resolve the problem. 
 
  
LIST OF REFERENCES  
1. Smaini, L., RF Analog Impairments Modeling for Communication Systems 
Simulation: Application to OFDM-based Transceivers. 2012: Wiley. 
2. Romeu, J. and A. Elias. Early proposals of wireless telegraphy in Spain: 
Francisco Salva Campillo (1751-1828). in Antennas and Propagation Society 
International Symposium, 2001. IEEE. 2001. 
3. Michaelis, A.R., From Semaphore to Satellite. 1965, Geneva  International 
Telecommunication Union. 
4. Ronalds, F., Descriptions of an Electrical Telegraph: And of Some Other 
Electrical Apparatus. 1823: R. Hunter. 
5. Makhrovskiy, O.V. 180 Years of telecommunication in Russia. in HISTory of 
ELectro-technology CONference (HISTELCON), 2012 Third IEEE. 2012. 
6. Haykin, S.S., Communication systems. 2001: Wiley. 
7. Klooster, J.W., Icons of Invention: The Makers of the Modern World from 
Gutenberg to Gates. 2009: Greenwood Press. 
8. Bourseul, C., Transmission électrique de la parole. L'Illustration, 1854. 
9. Pizer, R.A., The Tangled Web of Patent #174465. 2009: AuthorHouse. 
10. Evenson, A.E., The Telephone Patent Conspiracy of 1876: The Elisha Gray-
Alexander Bell Controversy and Its Many Players. 2000: McFarland. 
11. Coe, L., The Telephone and Its Several Inventors: A History. 2006: McFarland 
& Company. 
12. Beauchamp, C., Invented by Law: Alexander Graham Bell and the Patent That 
Changed America. 2015: Harvard University Press. 
13. Braun, K.F., Electrical oscillations and wireless telegraphy. 1909, [Nobel 
Lecture]. 
14. Hong, S., Wireless: From Marconi's Black-box to the Audion. 2001: MIT Press. 
15. Sarkar, T.K., et al., History of Wireless. 2006: Wiley. 
16. Mowbray, J.H., Sinking of the Titanic: Eyewitness Accounts. 2012: Dover 
Publications. 
17. Couch, L.W., Digital & Analog Communication Systems. 2012: Pearson 
Education. 
18. Harley, R.A., Electric signaling system. 1942, Google Patents. 
19. Kester, W.A. and i. Analog Devices, Data Conversion Handbook. 2005: 
Elsevier. 
20. Shannon, C.E., A symbolic analysis of relay and switching circuits. Transactions 
of the American Institute of Electrical Engineers, 1938. 57(12): p. 713-723. 
21. Shannon, C.E., A mathematical theory of communication. The Bell System 
Technical Journal, 1948. 27(3): p. 379-423. 
22. "The Nobel Prize in Physics 1956"  [Online]: Nobelprize.org. [Accessed 15 Sep 
2016] 
23. Kilby, J.S., Miniaturized electronic circuits. 1964, Google Patents. 
24. Noyce, R.N., Semiconductor device-and-lead structure. 1961, Google Patents. 
25. Cooley, J.W. and J.W. Tukey, An algorithm for the machine calculation of 
complex Fourier series. Mathematics of computation, 1965. 19(90): p. 297-301. 
26. Chang, R.W., Synthesis of band-limited orthogonal signals for multichannel 
data transmission. The Bell System Technical Journal, 1966. 45(10): p. 1775-
1796. 
27. Chang, R.W., Orthogonal frequency multiplex data transmission system. 1970, 
Google Patents. 
28. Weinstein, S. and P. Ebert, Data Transmission by Frequency-Division 
Multiplexing Using the Discrete Fourier Transform. IEEE Transactions on 
Communication Technology, 1971. 19(5): p. 628-634. 
29. Cooper, M., et al., Radio telephone system. 1975, Google Patents. 
30. Luo, F.L., Digital Front-End in Wireless Communications and Broadcasting: 
Circuits and Signal Processing. 2011: Cambridge University Press. 
31. Gleason, A.W., Mobile Technologies for Every Library. 2015: Rowman & 
Littlefield Publishers. 
32. [Online]:http://www.fpa.es/multimedia-en/photo-galleries/press-conference-
with-martin-cooper.html.[Accessed 15 Sep 2016] 
33. IEEE Standard for Telecommunications and Information Exchange Between 
Systems - LAN/MAN Specific Requirements - Part 11: Wireless Medium Access 
Control (MAC) and physical layer (PHY) specifications: High Speed Physical 
Layer in the 5 GHz band. IEEE Std 802.11a-1999, 1999: p. 1-102. 
34. IEEE Standard for Local and Metropolitan Area Networks Part 16: Air 
Interface for Fixed and Mobile Broadband Wireless Access Systems Amendment 
2: Physical and Medium Access Control Layers for Combined Fixed and Mobile 
Operation in Licensed Bands and Corrigendum 1. IEEE Std 802.16e-2005 and 
IEEE Std 802.16-2004/Cor 1-2005 (Amendment and Corrigendum to IEEE Std 
802.16-2004), 2006: p. 0_1-822. 
35. Bagheri, R., et al., An 800-MHz-6-GHz Software-Defined Wireless Receiver in 
90-nm CMOS. IEEE Journal of Solid-State Circuits, 2006. 41(12): p. 2860-2876. 
36. Ru, Z., et al., Digitally Enhanced Software-Defined Radio Receiver Robust to 
Out-of-Band Interference. IEEE Journal of Solid-State Circuits, 2009. 44(12): p. 
3359-3375. 
37. Mitola, J., The software radio architecture. IEEE Communications Magazine, 
1995. 33(5): p. 26-38. 
38. Abidi, A.A., The Path to the Software-Defined Radio Receiver. IEEE Journal of 
Solid-State Circuits, 2007. 42(5): p. 954-966. 
39. Walden, R.H., Analog-to-digital converter survey and analysis. IEEE Journal on 
Selected Areas in Communications, 1999. 17(4): p. 539-550. 
40. Tuttlebee, W.H.W., Software Defined Radio: Enabling Technologies. 2003: 
Wiley. 
41. Lehne, M., An Analog/Mixed Signal FFT Processor for Ultra-Wideband OFDM 
Wireless Transceivers. 2008, Virginia Polytechnic Institute and State University. 
42. Yang, S.C., OFDMA System Analysis and Design. 2010: Artech House. 
43. Cho, Y.S., et al., MIMO-OFDM Wireless Communications with MATLAB. 2010: 
Wiley. 
44. Oppenheim, A.V. and R.W. Schafer, Discrete-Time Signal Processing. 2011: 
Pearson Education. 
45. Prasad, R., OFDM for Wireless Communications Systems. 2004: Artech House. 
46. Peled, A. and A. Ruiz. Frequency domain data transmission using reduced 
computational complexity algorithms. in Acoustics, Speech, and Signal 
Processing, IEEE International Conference on ICASSP '80. 1980. 
47. Prasad, R. and F.J. Velez, WiMAX Networks: Techno-Economic Vision and 
Challenges. 2010: Springer Netherlands. 
48. Korowajczuk, L., LTE, WiMAX and WLAN Network Design, Optimization and 
Performance Analysis. 2011: Wiley. 
49. IEEE Standard for Information Technology- Telecommunications and 
Information Exchange Between Systems- Local and Metropolitan Area 
Networks- Specific Requirements Part 11: Wireless LAN Medium Access Control 
(MAC) and Physical Layer (PHY) Specifications. IEEE Std 802.11g-2003 
(Amendment to IEEE Std 802.11, 1999 Edn. (Reaff 2003) as amended by IEEE 
Stds 802.11a-1999, 802.11b-1999, 802.11b-1999/Cor 1-2001, and 802.11d-
2001), 2003: p. i-67. 
50. Nuaymi, P.L., WiMAX: Technology for Broadband Wireless Access. 2007: John 
Wiley & Sons. 
51. Kuo, J.-C., et al., VLSI design of a variable-length FFT/IFFT processor for 
OFDM-based communication systems. EURASIP J. Appl. Signal Process., 2003. 
2003: p. 1306-1316. 
52. Chun-Lung, H., L. Syu-Siang, and S. Muh-Tian. A low power and variable-
length FFT processor design for flexible MIMO OFDM systems. in Circuits and 
Systems, 2009. ISCAS 2009. IEEE International Symposium on. 2009. 
53. Lin, Y.T., P.Y. Tsai, and T.D. Chiueh, Low-power variable-length fast Fourier 
transform processor. Computers and Digital Techniques, IEE Proceedings -, 
2005. 152(4): p. 499-506. 
54. Song-Nien, T., L. Chi-Hsiang, and C. Tsin-Yuan, An Area- and Energy-Efficient 
Multimode FFT Processor for WPAN/WLAN/WMAN Systems. Solid-State 
Circuits, IEEE Journal of, 2012. 47(6): p. 1419-1435. 
55. Guichang, Z., X. Fan, and A.N. Willson, Jr., A power-scalable reconfigurable 
FFT/IFFT IC based on a multi-processor ring. Solid-State Circuits, IEEE 
Journal of, 2006. 41(2): p. 483-495. 
56. Uyttenhove, K. and M.S.J. Steyaert, Speed-power-accuracy tradeoff in high-
speed CMOS ADCs. Circuits and Systems II: Analog and Digital Signal 
Processing, IEEE Transactions on, 2002. 49(4): p. 280-287. 
57. Lehne, M. and S. Raman, A 0.13-µm 1-GS/s CMOS Discrete-Time FFT 
Processor for Ultra-Wideband OFDM Wireless Receivers. Microwave Theory 
and Techniques, IEEE Transactions on, 2011. 59(6): p. 1639-1650. 
58. Sadhu, B., et al. A 5GS/s 12.2pJ/conv. analog charge-domain FFT for a 
software defined radio receiver front-end in 65nm CMOS. in Radio Frequency 
Integrated Circuits Symposium (RFIC), 2012 IEEE. 2012. 
59. Sadeghi, N., V.C. Gaudet, and C. Schlegel, Analog DFT Processors for OFDM 
Receivers: Circuit Mismatch and System Performance Analysis. Circuits and 
Systems I: Regular Papers, IEEE Transactions on, 2009. 56(9): p. 2123-2131. 
60. Mano, M.M. and P. Spasov, Digital Design. 2002: Prentice Hall. 
61. Widrow, B. and I. Kollár, Quantization Noise: Roundoff Error in Digital 
Computation, Signal Processing, Control, and Communications. 2008: 
Cambridge University Press. 
62. Sarpeshkar, R., Analog versus digital: extrapolating from electronics to 
neurobiology. Neural computation, 1998. 10(7): p. 1601-1638. 
63. Hosticka, B.J., Performance comparison of analog and digital circuits. 
Proceedings of the IEEE, 1985. 73(1): p. 25-29. 
64. Roberts, M.J., Signals and Systems: Analysis Using Transform Methods and 
MATLAB. 2004: McGraw-Hill. 
65. Reddy, N. and M.N.S. Swamy, Switched-capacitor realization of a discrete 
Fourier transformer. Circuits and Systems, IEEE Transactions on, 1983. 30(4): 
p. 254-255. 
66. Ogihara, A., S. Yamashita, and S. Yoneda. A pitch synchronous switched 
capacitor discrete Fourier transform circuit. in Circuits and Systems, 1991., 
IEEE International Sympoisum on. 1991. 
67. Rao, K.R., D.N. Kim, and J.J. Hwang, Fast Fourier Transform - Algorithms and 
Applications. 2011: Springer Netherlands. 
68. Oppenheim, A.V., A.S. Willsky, and S.H. Nawab, Signals and Systems. 1997: 
Prentice Hall. 
69. Ismail, M. and T. Fiez, Analog VLSI: Signal and Information Processing. 1994: 
McGraw-Hill. 
70. Boyle, K., et al. Design and implementation of an all-analog fast-fourier 
transform processor. in Circuits and Systems, 2007. MWSCAS 2007. 50th 
Midwest Symposium on. 2007. 
71. Lehne, M. and S. Raman. A prototype analog/mixed-signal fast fourier 
transform processor IC for OFDM receivers. in Radio and Wireless Symposium, 
2008 IEEE. 2008. 
72. Rivet, F., et al., A Disruptive Receiver Architecture Dedicated to Software-
Defined Radio. Circuits and Systems II: Express Briefs, IEEE Transactions on, 
2008. 55(4): p. 344-348. 
73. Sadhu, B., Circuit techniques for cognitive radio receiver front-ends. 2012, 
University of Minnesota  
74. Goertzel, G., An Algorithm for the Evaluation of Finite Trigonometric Series. 
American Mathematical Monthly, 1958. 65(1): p. 34-35. 
75. Lindfors, S., A. Parssinen, and K.A.I. Halonen, A 3-V 230-MHz CMOS 
decimation subsampler. IEEE Transactions on Circuits and Systems II: Analog 
and Digital Signal Processing, 2003. 50(3): p. 105-117. 
76. Chiueh, T.-D. and P.-Y. Tsai, OFDM Baseband Receiver Design for Wireless 
Communications. 2007: Wiley Publishing. 352. 
77. Lehne, M. and S. Raman, A Discrete-Time FFT Processor for Ultrawideband 
OFDM Wireless Transceivers: Architecture and Behavioral Modeling. Circuits 
and Systems I: Regular Papers, IEEE Transactions on, 2010. 57(11): p. 3011-
3022. 
78. Zare-Hoseini, H., I. Kale, and O. Shoaei, Modeling of switched-capacitor delta-
sigma Modulators in SIMULINK. IEEE Transactions on Instrumentation and 
Measurement, 2005. 54(4): p. 1646-1654. 
79. Guichang, Z., X. Fan, and A.N. Willson, A power-scalable reconfigurable 
FFT/IFFT IC based on a multi-processor ring. IEEE Journal of Solid-State 
Circuits, 2006. 41(2): p. 483-495. 
80. Razavi, B., Design of Analog CMOS Integrated Circuits. 2001: McGraw-Hill. 
81. Jaffari, J., Statistical yield analysis and design for nanometer VLSI. 2010, 
University of Waterloo. 
82. Ben, Y., Statistical Verification and Optimization of Integrated Circuits. 2011, 
University of California, Berkeley. 
83. Maly, W., A.J. Strojwas, and S.W. Director, VLSI Yield Prediction and 
Estimation: A Unified Framework. Computer-Aided Design of Integrated 
Circuits and Systems, IEEE Transactions on, 1986. 5(1): p. 114-130. 
84. Glynn, P., The Central Limit Theorem, Law of Large Numbers and Monte Carlo 
Methods. Stanford University. 
85. Davison, A.C., Statistical Models. 2003: Cambridge University Press. 
86. https://www.mosis.com/. 
87. Sadeghi, N., Analog FFT Interface for Ultra-Low Power Analog Receiver 
Architectures. 2007, University of Alberta. 
88. Sadhu, B., et al., Analysis and Design of a 5 GS/s Analog Charge-Domain FFT 
for an SDR Front-End in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 
2013. 48(5): p. 1199-1211. 
89. Sturm, M., Passive switched-capacitor based filter design, optimization, and 
calibration for sensing applications. 2013, University of Minnesota. 
90. Rivet, F., Contribution à l’étude et à la réalisation d’un frontal radiofréquence 
analogique en temps discrets pour la radio-logicielle intégrale. 2009, University 
of Bordeaux. 
91. Rivet, F., et al., The Experimental Demonstration of a SASP-Based Full 
Software Radio Receiver. Solid-State Circuits, IEEE Journal of, 2010. 45(5): p. 
979-988. 
92. Suh, S., Low-power discrete Fourier transform and soft-decision Viterbi 
decoder for OFDM receivers. 2011, Georgia Institute of Technology. 
93. Suh, S., et al., Low-Power Discrete Fourier Transform for OFDM: A 
Programmable Analog Approach. Circuits and Systems I: Regular Papers, IEEE 
Transactions on, 2011. 58(2): p. 290-298. 
94. Han, G. and E. Sanchez-Sinencio, CMOS transconductance multipliers: a 
tutorial. Circuits and Systems II: Analog and Digital Signal Processing, IEEE 
Transactions on, 1998. 45(12): p. 1550-1563. 
95. Toumazou, C., F.J. Lidgey, and D. Haigh, Analogue IC design: the current-
mode approach. 1990: Peregrinus on behalf of the Institution of Electrical 
Engineers. 
96. Razavi, B., RF Microelectronics. 2012: Prentice Hall. 
97. Gilbert, B., A precise four-quadrant multiplier with subnanosecond response. 
Solid-State Circuits, IEEE Journal of, 1968. 3(4): p. 365-373. 
98. Rogers, J.W.M. and C. Plett, Radio Frequency Integrated Circuit Design. 2014: 
Artech House, Incorporated. 
99. Babanezhad, J.N. and G.C. Temes, A 20-V four-quadrant CMOS analog 
multiplier. Solid-State Circuits, IEEE Journal of, 1985. 20(6): p. 1158-1168. 
100. Kuo, K.-C. and A. Leuciuc, A linear MOS transconductor using source 
degeneration and adaptive biasing. Circuits and Systems II: Analog and Digital 
Signal Processing, IEEE Transactions on, 2001. 48(10): p. 937-943. 
101. Toumazou, C., G.S. Moschytz, and B. Gilbert, Trade-Offs in Analog Circuit 
Design: The Designer's Companion. 2007: Springer US. 
102. Gilbert, B., The multi-tanh principle: a tutorial overview. Solid-State Circuits, 
IEEE Journal of, 1998. 33(1): p. 2-17. 
103. Ryan, A.P. and O. McCarthy, A novel pole-zero compensation scheme using 
unbalanced differential pairs. Circuits and Systems I: Regular Papers, IEEE 
Transactions on, 2004. 51(2): p. 309-318. 
104. Soo, D.C. and R.G. Meyer, A four-quadrant NMOS analog multiplier. Solid-
State Circuits, IEEE Journal of, 1982. 17(6): p. 1174-1178. 
105. Liu, S.-I. and Y.-S. Hwang, CMOS four-quadrant multiplier using bias 
feedback techniques. Solid-State Circuits, IEEE Journal of, 1994. 29(6): p. 750-
752. 
106. Hosticka, B.J., R.W. Brodersen, and P.R. Gray, MOS sampled data recursive 
filters using switched capacitor integrators. Solid-State Circuits, IEEE Journal 
of, 1977. 12(6): p. 600-608. 
107. Martin, K., Improved circuits for the realization of switched-capacitor filters. 
Circuits and Systems, IEEE Transactions on, 1980. 27(4): p. 237-244. 
108. Carusone, T.C., D. Johns, and K. Martin, Analog Integrated Circuit Design. 
2011: Wiley. 
109. Whitaker, J.C., The Electronics Handbook, Second Edition. 2005: CRC Press. 
110. Gray, P.R., Analysis and Design of Analog Integrated Circuits. 2009: John 
Wiley & Sons. 
111. Caves, J.T., et al., Sampled analog filtering using switched capacitors as resistor 
equivalents. Solid-State Circuits, IEEE Journal of, 1977. 12(6): p. 592-599. 
112. Brodersen, R.W., P.R. Gray, and D. Hodges, MOS switched-capacitor filters. 
Proceedings of the IEEE, 1979. 67(1): p. 61-75. 
113. Malcovati, P., et al., Behavioral modeling of switched-capacitor sigma-delta 
modulators. IEEE Transactions on Circuits and Systems I: Fundamental Theory 
and Applications, 2003. 50(3): p. 352-364. 
114. Castello, R. and P.R. Gray, Performance limitations in switched-capacitor 
filters. Circuits and Systems, IEEE Transactions on, 1985. 32(9): p. 865-876. 
115. Allen, P.E., CMOS Analog IC Design. Course notes, Georgia Institute of 
Technology, 2005. 
116. Enz, C. and Y. Cheng, MOS transistor modeling for RF IC design. IEEE Journal 
of Solid-State Circuits, 2000. 35(2): p. 186-201. 
117. Pelgrom, M.J.M., A.C.J. Duinmaijer, and A.P.G. Welbers, Matching properties 
of MOS transistors. Solid-State Circuits, IEEE Journal of, 1989. 24(5): p. 1433-
1439. 
118. Drennan, P.G. and C.C. McAndrew, Understanding MOSFET mismatch for 
analog design. Solid-State Circuits, IEEE Journal of, 2003. 38(3): p. 450-456. 
119. Drennan, P.G. and C.C. McAndrew. A comprehensive MOSFET mismatch 
model. in Electron Devices Meeting, 1999. IEDM '99. Technical Digest. 
International. 1999. 
120. Lakshmikumar, K.R., R.A. Hadaway, and M.A. Copeland, Characterisation and 
modeling of mismatch in MOS transistors for precision analog design. Solid-
State Circuits, IEEE Journal of, 1986. 21(6): p. 1057-1066. 
121. Kinget, P.R., Device mismatch and tradeoffs in the design of analog circuits. 
Solid-State Circuits, IEEE Journal of, 2005. 40(6): p. 1212-1224. 
122. Lovett, S.J., et al., Optimizing MOS transistor mismatch. Solid-State Circuits, 
IEEE Journal of, 1998. 33(1): p. 147-150. 
123. Mizuno, T., J. Okamura, and A. Toriumi, Experimental study of threshold 
voltage fluctuation due to statistical variation of channel dopant number in 
MOSFET's. Electron Devices, IEEE Transactions on, 1994. 41(11): p. 2216-
2221. 
124. Yeh, T.-H., et al. Mis-match characterization of 1.8 V and 3.3 V devices in 
0.18 µm mixed signal CMOS technology. in Microelectronic Test Structures, 
2001. ICMTS 2001. Proceedings of the 2001 International Conference on. 2001. 
125. Fouque, A., et al. A low power digitally-enhanced SASP-based receiver 
architecture for mobile DVB-S applications in the Ku-band (10.7-12.75 GHz). in 
Radio and Wireless Symposium (RWS), 2011 IEEE. 2011. 