CIRCUIT DESIGN TECHNIQUES FOR WIDEBAND PHASED ARRAYS

A DISSERTATION
SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL
OF THE UNIVERSITY OF MINNESOTA
BY

SACHIN KALIA

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
Doctor of Philosophy

PROF. RAMESH HARJANI

JUNE, 2015
Acknowledgements

I would like to begin by thanking Professor Harjani, for giving me this opportunity to pursue a PhD degree in the field of my liking. I would like to thank him for being supportive of me and helping me find my feet in this domain. I would like to thank him for being patient with me during my periods of failures and for his encouragement during the same. I would also like to thank Savita Mam for her warmth, kindness and hospitality. I will surely miss the annual get-togethers and the wonderful dinner. On the flip side, ghee would no longer be a concern in dishes.

I would also like to thank Professor Anand Gopinath, Professor Chris Kim and Professor Alena Talkachova for being a part of my defense committee. I thank you for your time.

I have been fortunate enough to have shared my stay in Minneapolis with some wonderful people at different points in time.

Within the lab, I would like to thank Satwik, first and foremost, for his chip design efforts and ideas in the implementation of the spatio-spectral phased array scheme. I would also like to thank him for his help, guidance and useful technical feedback and discussions. I still remember that last phone conversation I had with him before I chose UMN over UCLA and TI, Bangalore!

I would also like to thank Narsimha Lanka for his valuable advice during the couple of occasions I have met him. I would like to thank Mohammad (Elba), for his friendship, help, guidance, for the numerous quality technical and non-technical discussions and, above all, for his chip design help and feedback. I have always had his ears for any and every idea of mine. I sincerely thank him for that.

I would like to thank Ashu for being a wonderful friend and labmate. I would like to thank him for his availability and willingness to help during any and every hard time I have had in the lab and there have been many.

I would like to thank Sudhir and Bodhi for all their help during my initial years in the lab, be it with regards to settling in, or academics or tape outs or testing and the list goes on. I would also like to thank Sudhir for being partly responsible for my current interest in politics!

I would like to thank Jaehyup and Tachyoun for being so nice to me. I would like to thank Jaehyup for his help during my internship hunt. I would also like to thank Martin for his valuable inputs, help and suggestions during testing. He is indeed the true \textit{guru} of the test lab.

I would like to thank Rakesh for being a great labmate and for all his help during my tapeouts. I have never come across, neither do I think I ever will, a guy who lives, breathes and eats circuits!

I would like to thank Saurabh for being a wonderful sport and a very patient junior. He has been of immense help to me during this last year or so and I sincerely thank him for that and hope that the help continues (smiley needed!). I also hope he discovers this invention called \textit{razor} for his own betterment!

I would like to thank Kang for introducing me through google, thankfully, to all kinds of exotic foods, none that I would ever approve of. I hope you adopt green! I would also like to thank Xingyi for being a great junior and supporting the right soccer team!

Outside the lab, I would like to thank Abhijit for occasionally reviving the poet in me. After many a trials have I learnt, some bridges are better burnt!

I would like to thank Shailabh and Swayambhoo, my on and off house mates over the past 5 years for their great friendship. I am grateful to Swayambhoo for reviving the Maths lover in me and to Shailabh for making every success of United and Mourinho taste sweeter. I would like to thank Swayambhoo for my current political inclinations and for introducing me to social news media.

I would like to thank my internship manager Brad Kramer and my senior Swaminathan Sankaran, from Kilby Labs, Texas Instruments, Dallas, TX. It was my great fortune to have worked with and learnt from you guys during my internship. It was truly a pleasure to work alongside some of the smartest and most humble minds that I have ever come across.

Outside of this professional and friend circle, I would like to thank Carlos and Chimai for their valuable efforts in maintaining our lab’s machines and servers, installing new ones and swiftly resolving any issues associated with them. I would like to thank Dan from ECE depot for being so helpful in placing orders and ensuring speedy shipping of the same. I would also like to thank Linda for all her help related to administrative and graduation matters.

I would like to thank my brother for suggesting EE as a possible alternative to ECE. Back then I was not so sure but now I am certain that you were wrong! Last but by no means the least, I would like to thank my Mother without whose efforts and sacrifice, a simple school education might have proven difficult let alone IIT and what followed. There are no words that can do justice to your praise!

I sincerely thank you all and wish you the best for all your future endeavors.
To My Mother
Abstract

This dissertation focuses on beam steering in wideband phased arrays and phase noise modeling in injection locked oscillators. Two different solutions, one in frequency and one in time, have been proposed to minimize beam squinting in phased arrays. Additionally, a differential current reuse frequency doubler for area and power savings has been proposed. Silicon measurement results are provided for the frequency domain solution (IBM 65nm RF CMOS), injection locked oscillator model verification (IBM 130nm RF-CMOS) and frequency doubler (IBM 65nm RF CMOS), while post extraction simulation results are provided for the time domain phased array solution (the chip is currently under fabrication, TSMC 65nm RF CMOS).

In the frequency domain solution, a 4-point passive analog FFT based frequency tunable filter is used to channelize an incoming wideband signal into multiple narrowband signals, which are then processed through independent phase shifters. A two channel prototype has been developed at 8GHz RF frequency. Three discrete phase shifts $\theta \in \{-90^\circ, 0^\circ, 90^\circ\}$ are implemented through differential I-Q swapping with appropriate polarity. The measured null-depth ranges from a minimum of 19dB to a maximum of 27dB.

In the time domain solution, a discrete time approach is undertaken with signals getting sampled in order of their arrival times. A two-channel prototype for a 2GHz instantaneous RF bandwidth (7GHz-9GHz) has been designed. A QVCO generates quadrature LO signals at 8GHz which are phase shifted through a 5-bit (2 extra bits from differential I-Q swapping with appropriate polarity) Cartesian combiner. Baseband sampling clocks are generated from phase shifted LOs through a CMOS divide by 4 with independent resets. The design achieves an average time delay of 4.53ps with 31.5mW of power consumption (per channel, buffers excluded).

An injection locked oscillator has been analyzed in s-domain using Paciorek’s time domain transient equations. The simplified analysis leads to a phase noise model identical to that of a type-I PLL. The model is equally applicable to injection locked dividers and multipliers and has been extended to cover all injection locking scenarios. The model has been verified against a discrete 57MHz Colpitts ILO, a 6.5GHz ILFD and a 24GHz ILFM with excellent matching between the model and measurements.

Additionally, a differential current reuse frequency doubler design, for output frequencies from 7GHz to 14GHz, has been developed to reduce passive area and dc power dissipation. A 3-bit capacitive tuning along with a tail current source is used to improve conversion efficiency. The doubler shows FOM$_T$ values between 191dBc/Hz and 209dBc/Hz when driven by a 0.7GHz to 5.8GHz wide tuning VCO with a phase noise that ranges from -114dBc/Hz to -112dBc/Hz over the same bandwidth.
Contents

Acknowledgements

Abstract

List of Tables

List of Figures

1 Introduction

1.1 Data Rates
1.2 Spatial Multiplexing
1.3 Wideband Phased Array Challenges
1.4 Organization

2 Antenna Arrays

2.1 Introduction
2.2 Antenna Arrays Applications
2.3 Phased Arrays and Timed Arrays
   2.3.1 Phased Array: Architectures and Prior Art
   2.3.2 Timed Array: Architectures and Prior Art
2.4 Summary
List of Tables

3.1 Architectural choice for spatio-spectral beamformer
3.2 A summary of comparison with other related works
3.3 Summary of performance
4.1 Extracted phase and delay values for different (I,Q) codes
4.2 Performance summary (extracted) for the two channel timed array receiver
List of Figures

1.1 Wireless application and data rate trends
1.2 An illustration of channel capacity vs number of antennas in a MIMO system
1.3 An N-element phased array receiver concept
1.4 Early RF phase shifters and modern phased arrays feature size
2.1 Radiation (reception) patterns for isotropic/anisotropic antennas and antenna array
2.2 Beam formation in an N-element array
2.3 Array factor for different antenna spacings for a 4-element array
2.4 Array factor for different array sizes for \( d = \frac{\lambda}{2} \)
2.5 A 4x4 2-D array
2.6 A MIMO system with M transmit and N receive antennas
2.7 Different application spaces for phased arrays
2.8 Phased array architectures: (a) RF (b) LO (c) IF/Baseband
2.9 RF phase shift choices (a) variable lumped LC T-line (b) variable MEMS T-line [1] (c) a 4x4 Butler matrix (d) an RTPS [2]
2.10 Generalized representation of a vector interpolator and a quadrature all-pass filter (QAF)
2.11 Architectural choices for vector interpolator
2.12 Fixed and variable phase N-stage ring VCO
2.13 Chain of injection locked oscillators as phase shifters
2.14 Different options of realizing sine / cosine weighing in discrete time domain
2.15 A \( g_m \)-C based active delay stage [3]
3.1 Different beamforming scenarios for a spatio-spectral architecture (Each color denotes a different frequency slice and can be steered independent of the others. Only the main lobes are shown.)
3.2 Phase error vs. frequency for a phased-array system with a fractional bandwidth of 0.5 and 16 frequency slices
3.3 Architectural choices for spatio-spectral beamforming (Each color stands for a different frequency)
3.4 An 8 point FFT’s output bin frequency response for a 2GS/s sampling rate (Only 3 bins shown)
3.5 A spatio-spectral beamformer (2-channel) with an FFT (4-point) as the spectral filter (Each color stands for a different frequency slice and corresponding phase shifter.)
3.6 Two limiting cases for spectral filtering
3.7 SNR and SINR for different blocker directions and frequencies
3.8 A 2 channel 4-point FFT based spectral filter followed by a dedicated 4-point FFT based phase shifter implementation
3.9 Beamforming results (circuit simulations) for one bin for a 2-channel phased array with a 4-point FFT as the spectral and the spatial filter
3.10 Signal processing chain (Both spectral and spatial filtering are sinc in nature.)
3.11 2-channel 4-frequency beamforming receiver
3.12 A quadrature injection locked oscillator along with a poly-phase filter and a pulse slimming circuit
3.13 Downconversion mixer
3.14 A 4 point FFT and charge sharing operations
3.15 Baseband: Analog FFT + Vector-Combiner
3.16 Chip micrograph
3.17 Beam-forming test setup
3.18 Measured I and Q bin rotations for a beat frequency of 1KHz
3.19 Beamforming results for four different test cases (a, b, c, d)
4.1 Three candidate architectures for delay sampling implementation
4.2 A 2-channel delayed sampling timed array receiver
4.3 Pseudo-differential NC LNA with output buffers
4.4 NF, gain and impedance match of the LNA and buffer
4.5 A 3-bit cartesian combiner with $g_m$ switching
4.6 Unit $g_m$ cell for the cartesian combiner with switches
4.7 CML and CMOS buffer stage for cartesian combiners outputs
4.8 A complementary QVCO
4.9 Mixer
4.10 Post extraction simulation gain, NF and linearity of the mixer
4.11 A CMOS $\div$ by 4 with a delayed reset signal
4.12 Baseband $g_m$ cells with write, read and reset clocks
4.13 Buffered baseband outputs for two different baseband frequencies
4.14 A synchronous CMOS $\div$ by 4 using two back to back D flip-flops
4.15 Sampling clock generation using current starved delay cells
4.16 Layout of the two channel timed array receiver
4.17 An alternative, scalable, sampling clock generation scheme
5.1 Uses of injection locking
5.2 ILO frequency domain model about the steady state phase, $\Phi_{ss}$
5.3 Filtering bandwidth vs. lock range
5.4 ILO noise shaping as a function of the source noise
5.5 Output phase noise profile for two different source noise profiles
5.6 Output phase noise profile for two different source noise profiles
5.7 AM-to-PM scaling factor in a complementary and a regular QVCO
5.8 Discrete Colpitts based ILO operating at 57MHz
5.9 Colpitts based ILO noise at different offsets across the lock range
5.10 A 0.13µm CMOS integrated ILFD operating at 6.5GHz
5.11 ILFD’s noise at different offsets across the lock range
5.12 VCO (ILFD) and injection source noise
5.13 Filtering action of the ILO (ILFD)
5.14 A subharmonic ILO (ILFM) operating at 24GHz [4]
5.15 Sub-harmonic ILO’s noise profile across the lock range
6.1 Generation of Nth harmonic from an N-phase clock
6.2 Evolution of the differential architecture
6.3 A 7-14 GHz differential current reuse frequency doubler with 3-bit capacitive tuning
6.4 Die micrograph
6.5 Extracted conversion gain and LO suppression at band edge and center frequencies
6.6 Extracted conversion gain and power consumption across the band
6.7 Measured phase noise at band center and edges
7.1 A conceptual diagram of two channel timed array receiver with baseband rotation and delay sampling
A.1 A lossy LC tank with a $g_m$ cell and an injection source: conceptual diagram
Chapter 1

Introduction

Integrated circuits have transformed our lives and the world around us in more ways than was thought possible. They are ubiquitous in items ranging from luxury to necessity. The growth in the semiconductor industry has kept pace with the trends forecast by Moore’s law. The consistent scaling down of the IC fabrication technology node has allowed for increased functionality per unit area, leading to a massive boom in the wireless device market. All modern communication devices are equipped with diverse functionalities such as audio, video, WiFi, camera, GPS, Bluetooth, NFC etc. There is an ever increasing demand for higher data rates to improve the performance of these applications and enable new ones. This demand for high data rates, along with provisions for storing and processing larger volumes of data, is one of the principal driving factors behind current research in wireless and related domains. Fig. 1.1 shows trends and data rate predictions in wireless communication development, both from an application space progression and a data rate point of view.

1.1 Data Rates

The handling of data can be broadly classified into three different categories. The first category deals with storing of data and is a pre-transmission or post-acquisition issue. The second category deals with processing of this voluminous data and is again a pre-transmission or post-acquisition issue. The third category deals with transmission and acquisition of this data, i.e. the transmitter and receiver in a radio. There is a limit to the amount of data that can be transferred over a given channel bandwidth (frequency spectrum) without incurring any error during data reception. This limit is given by the Shannon-Hartley theorem, \( C = B \log_2(1 + SNR) \), where \( C \) is the capacity of the channel (maximum data rate for error free communication), \( B \) is the bandwidth over which the signal is sent and \( SNR \) is the signal-to-noise ratio (as a linear power ratio) of the received signal. The channel capacity can be increased by increasing the signal bandwidth and / or increasing the signal SNR.
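
As a quick numerical illustration of the Shannon-Hartley limit, the sketch below evaluates \( C = B \log_2(1 + SNR) \) for a few cases. The bandwidth and SNR values (20MHz, 20dB, and so on) and the function names are illustrative assumptions, not figures taken from this work.

```python
import math


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity(bandwidth_hz, snr_linear):
    """Channel capacity in bit/s from C = B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Assumed example numbers, chosen only to show the two scaling behaviours.
base = shannon_capacity(20e6, db_to_linear(20.0))        # reference case
wider = shannon_capacity(40e6, db_to_linear(20.0))       # bandwidth doubled
higher_snr = shannon_capacity(20e6, db_to_linear(23.0))  # SNR doubled (+3 dB)

print(f"reference : {base / 1e6:.1f} Mbit/s")
print(f"2x B      : {wider / 1e6:.1f} Mbit/s")
print(f"2x SNR    : {higher_snr / 1e6:.1f} Mbit/s")
```

Doubling the bandwidth doubles the capacity, whereas doubling the SNR only adds roughly one bit/s per Hz, which is why bandwidth and spatial techniques are attractive once the SNR is already moderate.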

Further, diversity techniques can be employed in frequency, time, code and space for increasing either the SNR or the bandwidth. Time diversity involves sending multiple instances of the same information, which leads to redundancy. Coding diversity allows for simultaneous occupancy of larger bandwidths by multiple signals, each of which is differently encoded, and is implicitly included in schemes such as CDMA. Frequency diversity requires increasing the signal bandwidth and / or using different portions of a given spectrum differently. Frequency diversity has the additional advantage of combating frequency selective fading. A dense modulation scheme such as 64-QAM or 16-PSK can be used to increase the channel capacity for a given bandwidth at the cost of increased BER (bit error rate) and increased transceiver design complexity. Since frequency is a scarce resource, diversity in time adds redundancy, and coding diversity (and/or denser modulation schemes) leads to limited capacity improvements, we must use diversity in space (spatial multiplexing) for further improvements.

1.2 Spatial Multiplexing

Spatial multiplexing is realized through the use of multiple transmit and receive antennas (multi-antenna systems). Intelligent signal processing in the digital back end, both at the transmitter and at the receiver, leads to a linear increase in channel capacity, over the same bandwidth, in such multi-antenna systems. Multiple-Input-Multiple-Output systems, or MIMO as they are popularly known, increase the channel capacity $N$-fold where $N$ is the smaller of the number of transmit and receive antennas, $N = \min(N_{TX}, N_{RX})$. Fig. 1.2 shows the linear improvement in capacity for a MIMO system. A phased array is the simplest possible realization of a multi-antenna system and only requires addition of phase aligned copies of signals from different antennas. Phased arrays are widely used in radar systems for target scanning, weather monitoring, satellite imaging etc. Recently they have also been incorporated in mobile phones for 4G LTE and in wireless routers for IEEE 802.11n and 802.11ac standards, WPAN, automotive radars, passive imagers etc. Even with this simple signal processing, phased arrays offer a $\sim \log_2(N)$ improvement in channel capacity by increasing the effective SNR $N$-fold. Fig. 1.3 shows the conceptual working of an N-element phased array where signals are combined after phase alignment.
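
The two scaling laws above can be compared with a short sketch. It assumes an idealized model in which MIMO multiplies the single-antenna capacity by $\min(N_{TX}, N_{RX})$ and a phased array multiplies the SNR by $N$; the function names and the 30dB SNR are assumptions made for illustration only.

```python
import math


def siso_capacity(snr_linear):
    """Single-antenna spectral efficiency in bit/s/Hz."""
    return math.log2(1.0 + snr_linear)


def mimo_capacity(snr_linear, n_tx, n_rx):
    """Idealized MIMO: capacity scales by N = min(N_TX, N_RX)."""
    return min(n_tx, n_rx) * siso_capacity(snr_linear)


def phased_array_capacity(snr_linear, n_elements):
    """Phased array: the effective SNR is increased N-fold."""
    return math.log2(1.0 + n_elements * snr_linear)


snr = 10.0 ** (30.0 / 10.0)  # assumed 30 dB single-antenna SNR
n = 4

mimo_gain = mimo_capacity(snr, n, n) / siso_capacity(snr)
array_gain = phased_array_capacity(snr, n) - siso_capacity(snr)

print(f"MIMO capacity multiplier       : {mimo_gain:.2f}x")
print(f"Phased array gain (bit/s/Hz)   : {array_gain:.3f}")
print(f"log2(N)                        : {math.log2(n):.3f}")
```

At high SNR the phased array gain approaches $\log_2(N)$ bit/s/Hz, an additive improvement, while the idealized MIMO gain is multiplicative.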


Software defined radios, or SDRs as they are popularly known, have seen intensive research over the past many years and promise a manifold increase in spectrum usage through intelligent spectrum sharing. However, any commercial product realization is far from being complete.

The signal processing involves performing SVD (singular value decomposition) on the channel gain matrix through transmit precoding and receiver shaping.
1.3 Wideband Phased Array Challenges

Phased array systems have evolved over the decades with feature scaling, higher levels of integration and far more functionality packed into much smaller areas. Fig. 1.4 shows an early 1960’s implementation of an RF phase shifter, one of the many modules in a phased array transmitter (or receiver), and a modern 16 antenna integrated SiGe phased array transmitter. The difference in size is striking even if we factor out the much higher frequency of operation of the modern chip.

Accurate beam steering over wide instantaneous bandwidths and / or concurrent processing of multiple narrow band signals is critical to leveraging the spatial multiplexing advantages of a phased array. Additionally, high frequency phased arrays in transmit mode require accurate beam steering to maximize signal power through constructive addition (in space), as generating power at these frequencies is difficult and expensive. Accurate beam steering over wide instantaneous bandwidth is also required to support high data rates (∼10Gbps) in WPAN communication systems.

The design of a frequency synthesizer with good phase noise and decent tuning range is critical to proper functioning of high frequency phased arrays. The poor quality factor of on-chip passives at higher frequencies, coupled with lowered conversion gain at these frequencies, necessitates the use of frequency multipliers and / or injection locking in order to generate low noise LO signals at high frequencies. Frequency multiplication is achieved through the use of non-linearity in high gain tuned amplifiers. Injection locked oscillators are similar to PLLs and can generate low noise frequency multiples of a low noise reference signal. At high frequencies, injection locking rather than a full blown PLL is preferred owing to availability of wider filtering bandwidths and ease of design. A clean reference signal is generated from a low frequency PLL which is then multiplied, through an injection locked frequency multiplier, to generate a high frequency LO signal.

---

Even when an oscillator is available, it is power hungry and has extremely limited tuning (decided by the parasitic capacitance from the transistors).

1.4 Organization

This dissertation proposes potential solutions for a few of the issues discussed above. It presents two wideband phased array design schemes that help tackle wide instantaneous bandwidths (or concurrent multiple narrowband signals) at the antenna. The first scheme leverages spectral diversity in addition to spatial diversity, while the second scheme processes analog signals in discrete time using signal arrival time as a reference. A simple and universal injection locked oscillator noise analysis is presented that makes it possible to model phase noise in injection locked systems through easily computable design parameters similar to a type-I PLL. Lastly, a frequency doubler architecture is presented that reduces the power consumption and area while providing differential outputs.

This thesis is organized into 5 main chapters with an introduction upfront and a conclusion at the end.

Chapter 2 provides a general discussion about the working and advantages of phased arrays along with a brief overview of the different kinds of phased arrays.

Chapter 3 discusses a frequency domain solution for resolving wide instantaneous bandwidths with good beam steering accuracy.

Chapter 4 discusses a discrete time domain solution for wideband phased arrays (timed arrays).

Chapter 5 discusses a simple universal phase noise model for injection locked oscillators.

Chapter 6 introduces a differential current reuse frequency doubler.
Chapter 2

Antenna Arrays

2.1 Introduction

An antenna is an electrical device that converts electrical energy into EM waves (radio waves) and vice versa, thus allowing a physically disconnected transmitter and receiver to communicate wirelessly [6]. An ideal antenna is an isotropic radiator (receiver). The radiation pattern shown in Fig. 2.1a falls off as $1/r^2$ in the far field (the near field pattern is mathematically involved and of little interest in communication over distances beyond a wavelength). The receiving pattern of the antenna is identical to the radiation pattern owing to the principle of reciprocity [6]. A directive antenna radiates (receives) unequally in different directions as shown in Fig. 2.1b. The antenna directivity is a function of the antenna geometry, the frequency of the signal, the presence or absence of a ground plane and the relative orientation of the radiator w.r.t. the ground plane, among others. An antenna array is a collection of antennas in a linear or planar arrangement. If multiple antennas operate simultaneously, an energy pattern of maxima and minima is created because of the constructive and destructive interference of signals from different antennas at different points in space (or on a board / chip in the case of a receiver). As shown in Fig. 2.1c this radiation (energy) pattern has a main lobe and side lobes. The number of lobes and their location is a function of the number of antennas and the physical spacing between them. In an N-element antenna array, if we represent the complex signal (it carries both amplitude and phase information) from the $i^{\text{th}}$ antenna, at the point of observation, as $a_i e^{j\theta_i}$, the resultant signal, when all the antennas operate simultaneously, can be obtained by superposing the signals from individual antennas and is given by $\sum_{i=1}^{N} a_i e^{j\theta_i}$. In order to gain insight into the formation of the radiation (reception) pattern we consider the simplified case shown in Fig. 2.2 for an N-element receiver array.

---

[6] The antenna, being a passive element, does not have any gain. However, a directive gain is often defined for antennas and is equal to the ratio of the power radiated (received) in a given direction by the directive antenna and the power radiated (received) in the same direction by an isotropic antenna assuming the total power for both the antennas to be the same.

Figure 2.1: Radiation (reception) patterns for isotropic/anisotropic antennas and antenna array

In Fig. 2.2 a progressive phase shift of $\beta$ has been added at each successive antenna. The implementation of this phase shift will be discussed later. In Fig. 2.2 we have assumed an antenna spacing of $d$ and an angle of arrival (AOA) of $\theta$ w.r.t. broadside. Some definitions are in order.

**Broadside:** The direction in which radiations from all the antennas add in phase. Equivalently, it is the direction from which an incident signal has identical phase at all the antennas. It is the line (axis) of symmetry in the array geometry if all the antennas are identical. Note that broadside is always defined with $\beta = 0$.

**AOA:** Angle of arrival is the angle of incidence of the incoming signal (effective angle of transmission for an outgoing signal) w.r.t. the array broadside direction.

---

Since the principle of reciprocity applies to all the antennas and the resultant receiver reception pattern is a superposition of individual antenna patterns, we derive the pattern for a receiver with the knowledge that an identical pattern exists for the transmitter.

We assume that the transmitter and the receiver are sufficiently far apart so that the signal arrives at different antennas along parallel lines. This in turn implies that the signal covers an additional distance of $d \sin(\theta)$ to arrive at each successive antenna.

Assuming free space propagation, this extra distance translates to a progressive time delay of $\tau = \frac{d \sin(\theta)}{c}$. Further, for a signal frequency of $f$, this time delay translates to a progressive phase shift of $\phi = 2\pi f \tau = \frac{2\pi d \sin(\theta)}{\lambda}$, where $c = f\lambda$.
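
The relation between the progressive delay and the progressive phase shift can be checked numerically. The sketch below assumes a half-wavelength spacing at 8GHz and a 30° angle of arrival (example values and function names chosen only for illustration), and then evaluates the phase produced by the same delay at 7GHz and 9GHz.

```python
import math

C = 299_792_458.0  # speed of light in free space, m/s


def progressive_delay(d_m, theta_deg):
    """tau = d * sin(theta) / c, in seconds."""
    return d_m * math.sin(math.radians(theta_deg)) / C


def progressive_phase(tau_s, f_hz):
    """phi = 2 * pi * f * tau, in radians."""
    return 2.0 * math.pi * f_hz * tau_s


f_center = 8e9                 # assumed center frequency
d = C / f_center / 2.0         # half-wavelength spacing at f_center
tau = progressive_delay(d, 30.0)

# Same physical delay, different signal frequencies.
phases_deg = {
    f: math.degrees(progressive_phase(tau, f)) for f in (7e9, 8e9, 9e9)
}

print(f"tau = {tau * 1e12:.2f} ps")
for f, p in phases_deg.items():
    print(f"f = {f / 1e9:.0f} GHz -> phi = {p:.2f} deg")
```

The delay is fixed by geometry, but the phase it corresponds to grows linearly with frequency, so a phase shifter set for the center frequency is only exact at that frequency.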

In order to arrive at the energy pattern for the receiver array we make another assumption. We assume that the signal strength at each antenna is the same (equivalently, signals at all antennas have the same strength and all antennas are identical). Let $A$ be the signal strength at each antenna. The resultant signal at the output of the summer in Fig. 2.2 is given by:

$$A \sum_{i=0}^{N-1} e^{j i (\phi - \beta)} = A e^{j(N-1)\left(\frac{\phi-\beta}{2}\right)} \frac{\sin \left( N \left( \frac{\phi-\beta}{2} \right) \right)}{\sin \left( \frac{\phi-\beta}{2} \right)} \tag{2.1}$$

The magnitude multiplier in (2.1), \( \frac{\sin(N(\frac{\phi-\beta}{2}))}{\sin(\frac{\phi-\beta}{2})} \), is called the array factor (AF) or the array gain. It has a maximum value of \( N \) at \( \beta = \phi = \frac{2\pi d \sin(\theta)}{\lambda} \). At \( \beta = 0 \) the maximum is aligned along the broadside, whereas for non-zero values of \( \beta \) the signal’s maximum (the main lobe in the radiation pattern) is steered off broadside. Therefore by tuning \( \beta \) we can achieve beam steering or scanning. Since \( \phi \) is itself a function of \( d \) and \( \theta \), the array factor (assuming identical antennas and equal signal strength) is a function of \( d \), \( \theta \), \( \beta \) and \( N \).
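
A minimal sketch of the array factor follows. It evaluates the left-hand sum of (2.1) directly, which avoids the 0/0 case of the closed form at the peak, and compares it with the closed-form magnitude. The 4-element, half-wavelength example and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def arrival_phase(d_over_lambda, theta_deg):
    """phi = 2 * pi * (d / lambda) * sin(theta), in radians."""
    return 2.0 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))


def array_factor(n, d_over_lambda, theta_deg, beta_rad=0.0):
    """|sum_{i=0}^{N-1} exp(j * i * (phi - beta))| for unit amplitude."""
    x = arrival_phase(d_over_lambda, theta_deg) - beta_rad
    return abs(sum(cmath.exp(1j * i * x) for i in range(n)))


def array_factor_closed_form(n, x):
    """|sin(N x / 2) / sin(x / 2)|, with the limit N where sin(x / 2) -> 0."""
    denom = math.sin(x / 2.0)
    if abs(denom) < 1e-12:
        return float(n)
    return abs(math.sin(n * x / 2.0) / denom)


N, D = 4, 0.5  # assumed: 4 elements, half-wavelength spacing

peak_broadside = array_factor(N, D, 0.0)             # beta = 0, theta = 0
null_at_30 = array_factor(N, D, 30.0)                # beta = 0, theta = 30 deg
beta_steer = arrival_phase(D, 30.0)                  # steer the beam to 30 deg
peak_steered = array_factor(N, D, 30.0, beta_steer)

print(peak_broadside, null_at_30, peak_steered)
```

With $\beta = 0$ the 30° direction falls on a null of the 4-element pattern, while setting $\beta$ equal to the arrival phase for 30° moves the full peak of $N$ to that direction.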

Fig. 2.3a plots the array factor (for \( \beta=0 \)) on a rectangular plot for a 4 antenna array for three different antenna spacings. Fig. 2.3b plots the same pattern on a polar plot. The plots have a distinct, desired, main lobe and unwanted sidelobes. These sidelobes, also referred to as grating lobes, are a function of \( N \) and \( d \). For smaller values of \( d \) we get fewer grating lobes. However, as we shall see shortly, signal combining in space (transmitter) or on chip / board (receiver) aside from lending beam steering and beam forming capabilities, also leads to an SNR improvement. This improvement results from the correlation between the signals at different antennas while the noise at these antennas is uncorrelated. Additionally, proximity between antennas affects their radiation pattern due to coupling [7]. Therefore in order to ensure minimal antenna coupling and minimum correlation between noise at different antennas, at least half a wavelength of separation between the antennas is recommended.

For a half-wavelength spacing between the antennas, the radiation pattern of an \( N \)-element array has \( N-1 \) lobes in total, 1 main and \( N-2 \) sidelobes. Fig. 2.4 plots the normalized array factor for \( d = \frac{\lambda}{2} \) for different array sizes. It should be mentioned here that the ratio of the maxima to the minima of the array factor on a log scale is known as null-depth and is one of the primary metrics used to judge the performance of a phased array (larger null depths are better).

A 2-D array has two degrees of freedom, one along each reference direction of the array. 

In applications where the antenna is also used as a signal conditioning element, antenna spacing can be used to control the array factor.
Fig. 2.5 shows a 4x4 array of patch antennas. There are two different progressive time delays, $\tau_x$ and $\tau_y$, incurred as one moves along the x and y directions respectively. These progressive time delays lead to progressive phase shifts along the reference directions. The net phase shift for the $(k, l)^{th}$ element in the 2-D array is given by $\phi_{k,l} = 2\pi f (k\tau_x + l\tau_y)$, where $k$ and $l$ are the row and column numbers of the array element respectively. For $\alpha$ and $\beta$ as shown in Fig. 2.5, we have $\tau_x = \frac{d_x \sin(\alpha) \sin(\beta)}{c}$ and $\tau_y = \frac{d_y \cos(\alpha) \sin(\beta)}{c}$, where $d_x$ and $d_y$ are the element spacings along the x and y directions and $c$ is the speed of light in free space. The direction of the main lobe of the beam is defined by $(\alpha, \beta)$.
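The per-element phase shifts of the 2-D array can be tabulated with a short sketch such as the one below. The function name `element_phases`, the separate spacing `dy` along y and the numerical values in the usage lines are assumptions made for illustration.

```python
import numpy as np

C = 299_792_458.0  # speed of light in free space, m/s

def element_phases(f, alpha, beta, dx, dy, rows=4, cols=4):
    """Phase shift phi[k, l] = 2*pi*f*(k*tau_x + l*tau_y) for each element."""
    tau_x = dx * np.sin(alpha) * np.sin(beta) / C
    tau_y = dy * np.cos(alpha) * np.sin(beta) / C
    k, l = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return 2 * np.pi * f * (k * tau_x + l * tau_y)

f0 = 10e9                    # hypothetical carrier frequency
half_lambda = C / f0 / 2     # hypothetical half-wavelength spacing
phases = element_phases(f0, alpha=np.pi / 2, beta=np.pi / 6,
                        dx=half_lambda, dy=half_lambda)
```

With $\alpha = 90^\circ$ the delay along y vanishes, so only the index $k$ contributes a progressive phase in this example.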

An N-element array with correlated signal and uncorrelated noise leads to an overall SNR improvement of $10\log(N)$. If the noise at the antenna is correlated the actual
SNR improvement may be smaller or greater depending on the sign of the correlation. Additionally, even with uncorrelated noise at the antenna, the overall SNR improvement depends on the position in the signal processing chain where the signals are combined.

Let us consider an N-element array where the signal amplitude and the noise power at each antenna are $S$ and $W$ respectively. Let the output of each antenna be processed by an amplifier with gain $k$ and output noise power $V$. The net signal power at the output of the combiner is then $N^2k^2S^2$ while the net noise power is $Nk^2W + NV$, leading to an output SNR of $\frac{S^2}{W} \cdot \frac{N}{1 + \frac{V}{k^2W}}$. When compared to the input SNR at a single antenna, $\frac{S^2}{W}$, the output SNR improves by $10\log\left(\frac{N}{1+\frac{V}{k^2W}}\right)$. The noise figure (NF) of a system is defined as the ratio of the input SNR to the output SNR expressed on a log scale, i.e. $NF = 10\log(\frac{SNR_{in}}{SNR_{out}})$. Since every amplifier adds some finite noise, the minimum value of NF is 0dB. The NF of each signal path in the above example is

$$NF_{path} = 10\log\left(\frac{S^2(k^2W + V)}{Wk^2S^2}\right) = 10\log\left(1 + \frac{V}{k^2W}\right)$$

Therefore we can express the SNR improvement due to combining as

$$10\log(N) - NF_{path}$$
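The relation above can be cross-checked numerically. The sketch below computes the SNR improvement once from $10\log(N) - NF_{path}$ and once directly from the combiner signal and noise powers. The function names and the example values are assumptions for illustration.

```python
import math

def combining_snr_gain_db(n, k, w, v):
    """SNR improvement (dB) expressed as 10*log10(N) - NF_path."""
    nf_path_db = 10 * math.log10(1 + v / (k**2 * w))
    return 10 * math.log10(n) - nf_path_db, nf_path_db

def direct_snr_gain_db(n, k, s, w, v):
    """Same quantity computed from the combiner output signal and noise powers."""
    snr_in = s**2 / w
    snr_out = (n * k * s) ** 2 / (n * k**2 * w + n * v)
    return 10 * math.log10(snr_out / snr_in)

# Hypothetical values: 4 paths, gain 10, amplifier noise equal to amplified antenna noise.
gain_db, nf_db = combining_snr_gain_db(n=4, k=10.0, w=1.0, v=100.0)
check_db = direct_snr_gain_db(n=4, k=10.0, s=0.5, w=1.0, v=100.0)
```

In this example the path NF is about 3dB, so roughly half of the 6dB array gain of a 4-element array is retained.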

Since the minimum value of NF is 0dB, the best case SNR improvement (under the assumption of zero correlation between noise at antennas) is $10\log(N)$.

Antenna arrays, aside from enabling spatial selectivity, also aid in realizing multi-user communication. Previously we had (implicitly) assumed the presence of multiple copies of the same signal at different antennas. However, wireless systems are broadcast networks. Signals from different sources, including multiple time-separated copies of the same signal, coexist in space at any given time. Through the use of multiple antennas at TX and / or RX, simultaneous multi-user communication without any interference can be achieved. This is perhaps the most important and sought after feature of multiple antenna systems. Multiple input multiple output communication, or MIMO [9] as it is commonly known, uses multiple antennas at TX and RX along with intelligent signal processing at the digital back end to decompose a mixture of information from all sources into parallel streams of information from individual sources. In order to do so, MIMO systems perform singular value decomposition (SVD) of the channel gain matrix $H$ shown in Fig. 2.6. To apply the SVD to the channel gain matrix, transmit precoding and receiver shaping are performed at the transmitter and receiver respectively [10]. This parallel decomposition results in a $k$-fold improvement in the channel capacity, where $k = \min(N, M)$, leading to a $k$ times higher throughput for the same signal bandwidth (also referred to as multiplexing gain). A phased array is a trivial MIMO implementation where the complicated signal processing is replaced with a simple phase alignment and addition operation. The signals are combined in the ratio of their SNRs at the antennas, which is known as maximum ratio combining (MRC). In the preceding phased array derivations, the signals were assumed to have the same SNR and hence were combined in the same ratio. This simple signal combining leads to a $\log(N)$ improvement in the channel capacity as opposed to the linear $N$-fold improvement attainable in a full-fledged MIMO implementation.
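The parallel decomposition through transmit precoding and receiver shaping can be sketched with a randomly drawn channel matrix. The antenna counts, the random channel and the symbol values below are assumptions for illustration, and noise is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx = 4, 3  # hypothetical antenna counts
H = rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))

# H = U @ diag(sigma) @ Vh
U, sigma, Vh = np.linalg.svd(H, full_matrices=False)

x_tilde = np.array([1 + 0j, -1 + 0j, 1j])  # one symbol per parallel stream
x = Vh.conj().T @ x_tilde                  # transmit precoding
y = H @ x                                  # noiseless channel
y_tilde = U.conj().T @ y                   # receiver shaping
```

After shaping, each received stream is the corresponding transmitted symbol scaled by one singular value, so the $\min(N, M)$ streams no longer interfere with each other.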

Here we have considered the noise from different amplifiers to be uncorrelated. For a general treatment, the interested reader is encouraged to go through [8].
2.2 Antenna Arrays Applications

Antenna arrays were developed during World War II for radar beam scanning applications [11]. Since a radar must detect enemy targets from all directions and do so very frequently, beam steering control was required. Mechanically moving antenna arrays was both cumbersome and expensive. This led to the development of phase shift based solutions. These phase shifts were electronically tunable and hence antenna arrays based upon them came to be known as electronically scanned arrays (ESA). Post war, significant research effort was directed towards antenna arrays. The initial thrust of this research was towards military objectives, but later the research was expanded to commercial applications and the feasibility of antenna arrays.

Antenna arrays have evolved over the years with reduction in form factors and increased functionality per unit area, keeping in line with the growth and progress in the semiconductor industry. Fig. 2.7 lists some of the common uses of antenna arrays. In THz applications, where generating power is difficult and expensive, multiple antennas serve to increase the signal power through constructive addition in space. Antenna arrays are used at security check points for image scanning in the 77GHz and the 94GHz bands. At these frequencies EM radiation can pass through clothes and other objects but does not penetrate human tissue. The 77GHz band is also used for automotive / vehicular radar applications such as blind spot visibility and parking by sensing proximity. Automotive radars also operate in the 22GHz-29GHz band.

Another important commercial application of antenna arrays is in the 60GHz wireless personal area network or WPAN. There are two important reasons for selecting 60GHz as the frequency of operation for WPAN. Firstly, there is a huge chunk of unassigned spectrum (7GHz, from 57-64GHz) available around this frequency. Secondly, the peak in the absorption spectrum of O\textsubscript{2} at 60GHz leads to a significant increase in frequency re-usability, over shorter distances, in and around this frequency. On the flip side, the large absorption also leads to significant signal attenuation leading to minimum, if any, signal recovery from multipath components. This necessitates enhancement in line of sight communication through antenna arrays which offer precise directivity in the transmit and receive pattern. There are, of course, all of the other advantages of antenna arrays too.

Figure 2.7: Different application spaces for phased arrays

Other common applications of antenna arrays include air traffic control, weather and climate monitoring, high resolution satellite imaging, fault detection using non-destructive testing, wireless routers (MIMO application), medical imaging, etc. Antenna arrays also enable, through intelligent signal processing, multi-functional\footnote{In a military radar context, a multi-functional radar refers to scanning, detecting and ranging over widely ranging resolutions and distances through a single radar\cite{13}} radars which can carry out simultaneous tasks\cite{14}, such as weather and air traffic surveillance in a single radar system or target scanning and friendly communication together in a military radar.

### 2.3 Phased Arrays and Timed Arrays

Beamforming and beamsteering antenna arrays, as mentioned previously, rely on a tunable phase shift, $\beta$ in Fig.\ref{fig:phaselayout}. This phase shift compensates for the difference in the time of arrival of the signal at different antennas in a receiver. Equivalently, in a transmitter it serves to change the direction of the main lobe (where signals add constructively in phase).

An antenna array is referred to as a phased array if tunable phase shifts are used to compensate for the different signal phases at the antennas. If the phase difference is compensated for in the time domain using tunable time delays, the antenna array is referred to as a timed array. When dealing with a narrowband signal\footnote{signals with a small fractional BW, $\frac{\text{signal BW}}{\text{carrier frequency}}$} a time delay and a phase shift are equivalent. However, as the signal bandwidth increases, a phased array only approximately aligns the signals while a timed array still aligns the signals perfectly.
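The difference between the two schemes can be illustrated with a small numerical sketch. It assumes half-wavelength spacing at a design frequency `f0` and compares the combined amplitude of a single tone at frequency `f` when the compensation is a true time delay and when it is a phase shift fixed at its `f0` value. The function name and all numerical values are hypothetical.

```python
import numpy as np

def combined_gain(f, f0, theta, n_elements, timed):
    """Normalised combined amplitude for a tone at f arriving from angle theta."""
    i = np.arange(n_elements)
    tau = np.sin(theta) / (2 * f0)         # d*sin(theta)/c with d = c/(2*f0)
    arrival = 2 * np.pi * f * i * tau      # phase picked up at element i
    if timed:
        compensation = 2 * np.pi * f * i * tau   # time delay tracks frequency
    else:
        compensation = 2 * np.pi * f0 * i * tau  # phase shift set for f0 only
    return np.abs(np.exp(1j * (arrival - compensation)).sum()) / n_elements

theta = np.radians(45)
timed_edge = combined_gain(12e9, 10e9, theta, 8, timed=True)
phased_center = combined_gain(10e9, 10e9, theta, 8, timed=False)
phased_edge = combined_gain(12e9, 10e9, theta, 8, timed=False)
```

In this example the timed array keeps full gain away from `f0`, while the phased array loses a noticeable fraction of its gain at the band edge.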

Since the phase shifts (and time delays) can be tuned electronically, phased arrays and
timed arrays are often referred to as Electronically Scanned Arrays (ESA).
2.3.1 Phased Array: Architectures and Prior Art

Fig. 2.8 shows different architectural choices for a phased array implementation. RF phased arrays use phase shift in the RF path; the path between the LNA (and subsequent RF amplifiers) and the antenna. LO phased arrays use phase shift in the LO path while IF/baseband phased arrays use phase shift in the baseband path. Note that the phase-shifters in Fig. 2.8 are just locational representatives. The actual circuit level realization of the phase-shift element will be very different for the three cases, each having its own design constraints. It is obvious from Fig. 2.8 that early signal combination (summation) leads to lowering of component count with all the antennas sharing the components that follow the combiner (summer). An increased component count further leads to increased power consumption, increased area overhead and routing complexity. Also as was discussed previously, the effective SNR improvement degrades with every extra signal processing block before the signals are finally combined. Early signal combining also relaxes the linearity requirement of the later blocks by eliminating any interferer or jammer (by tuning the null of the phased array to align with the interferer or jammer). Also any mismatch error at component level is reduced through early combining by reducing the number of independent blocks. The phased array shown in Fig. 2.8 is a receiver array (RX array). A transmitter phased array (TX array) too has equivalent
IF/LO/RF phased array versions. But in a TX array, in-phase signals are phase shifted prior to sending them to the antennas, thus creating an equivalent beam in space that can be steered to either side of broadside, whereas in an RX array phase shifts are introduced after the antennas to cancel the phase difference between the signals at the antennas.

We now look at the three categories of phased arrays in greater detail.

Figure 2.9: RF phase shift choices (a) variable lumped LC T-line (b) variable MEMS T-line (c) a 4x4 butler matrix (d) an RTPS

2.3.1.1 RF Phased Arrays: Prior Art

RF phase shifters must handle (possibly) large interferers / jammers while providing a good NF and linearity. The phase shifter must also provide a broad phase shift range along with good phase accuracy over the desired signal bandwidth. Traditionally these phase shifters have been implemented as delay line elements with mechanically or electronically tunable delays. In doing so they act more like timed arrays than phased arrays since an explicit time delay is being introduced in the signal path. There are different kinds of RF phase shifters, such as ferrite (slow response), MEMS based (intermediate response) and MMIC (monolithic microwave integrated circuit) phase shifters (fast response). These passive phase shifters formed the backbone of earlier radar and communication phased array solutions where they were integrated with other off-the-shelf TX/RX modules (such as LNAs, PAs, etc.). But a single-chip integrated solution is not possible with these phase shifters owing to their size and speed.

---

In a TX array the signal combination takes place in space.

In order for the passive phase shift solutions mentioned previously to be combined with other on-chip active circuits to realize an integrated phased array solution, RF phase shifters push the operation frequency towards higher GHz to reduce the size of the passives. They use either tunable delay lines, Fig. 2.9(a,b), or passive structures such as a Butler (or Blass) matrix, Fig. 2.9c, or reflection type phase shifters (RTPS), Fig. 2.9d. Tunable delay lines use a lumped approximation of a transmission line built from variable inductors and capacitors. The variable inductor and capacitor vary the line’s group delay while maintaining impedance match (two degrees of freedom). The phase shift is quite accurate over a large frequency range. However, the implementation is bulky and lossy owing to the poor quality of on-chip passives. Additionally, analog control is possible only for capacitors, thus limiting the phase-shift (or delay) range. It should be mentioned here that tunable delay lines are not strictly phase shifters. A phase shifter is supposed to provide a fixed phase shift regardless of the input frequency, whereas a time delay provides a phase shift proportional to frequency. Therefore, in order to constructively add a wideband signal, we actually need a time delay. However, if signal cancellation (nulling) is desired, a phase shifter is required. A Butler matrix, Fig. 2.9c, has N inputs and N outputs. It uses 90° hybrids and fixed phase shifts (usually multiples of $\frac{\pi}{N}$) recursively to synthesize N simultaneous output beams. However, the beam directions are fixed relative to each other. Other disadvantages include loss due to passives, the requirement of fixed phase shifts (multiples of $\frac{\pi}{N}$) across a wide frequency range and the associated area overhead. The Blass matrix \cite{23} is similar to the Butler matrix but does not require the number of inputs and outputs to be the same (or a power of 2).

---

8 An on-chip integrated solution refers to a single CMOS or SiGe or GaAs etc. integrated circuit (chip) that has all TX/RX functionalities built in (it may or may not include antennas depending on area and frequency of operation).
9 Signals can be canceled using time delays as well; however, they lack the sharp spatial selectivity of phased arrays.
10 N must be a power of 2.

Figure 2.10: Generalized representation of a vector interpolator and a quadrature all pass filter (QAF)

Another very popular RF phase shifter, especially at high operating frequencies, is the reflection type phase shifter (RTPS) \cite{24}. The reflection coefficient of a transmission line is defined as the ratio of the reflected wave to the incident wave and is given by $\Gamma = \frac{Z_L - Z_0}{Z_L + Z_0}$ where $Z_0$ is the line’s characteristic impedance (real for a lossless line) and $Z_L$ is the load impedance. If $Z_L$ is purely imaginary, $Z_L = jX_L$, and $Z_0$ is real, then the magnitude of $\Gamma$ is unity and its argument is given by $\angle \Gamma = 180^\circ - 2\tan^{-1}(\frac{X_L}{Z_0})$. By tuning $Z_L$ we can tune the phase of the reflected wave. However, since the incident and reflected waves coexist on the same line, a directional coupler is needed to separate the two as shown in Fig. 2.9d \cite{2}. Akin to the tunable delay line, here also analog control is available only for capacitive tuning, which limits the load tunability. The RTPS also suffers from loss owing to the passives and from an area overhead.
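The reflection phase of a purely reactive load can be checked numerically. The sketch below evaluates $\Gamma$ for a load $jX_L$ and compares its angle with $180^\circ - 2\tan^{-1}(X_L/Z_0)$. The function names, the 50 ohm line impedance and the reactance values are assumptions for illustration.

```python
import cmath
import math

def reflection_phase_deg(x_load, z0=50.0):
    """Angle (degrees) and magnitude of Gamma for a purely reactive load j*x_load."""
    gamma = (1j * x_load - z0) / (1j * x_load + z0)
    return math.degrees(cmath.phase(gamma)), abs(gamma)

def formula_phase_deg(x_load, z0=50.0):
    """Closed-form angle 180 - 2*atan(X/Z0), in degrees."""
    return 180.0 - 2.0 * math.degrees(math.atan(x_load / z0))

def wrapped_diff_deg(a, b):
    """Difference between two angles, wrapped to (-180, 180]."""
    return (a - b + 180.0) % 360.0 - 180.0

phase_ind, mag_ind = reflection_phase_deg(50.0)    # inductive load, X = +Z0
phase_cap, mag_cap = reflection_phase_deg(-50.0)   # capacitive load, X = -Z0
```

For $X_L = \pm Z_0$ the reflected wave is shifted by $\pm 90^\circ$ with unit magnitude, so sweeping a varactor reactance sweeps the phase without (ideally) changing the amplitude.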
All the previously discussed phase shifting (time delay) schemes are entirely based on passives. Among active phase shifting schemes, vector phase interpolation is perhaps the most common and the most elegant. It requires the availability of quadrature signals. Let us consider a quadrature input signal of the form of $S = I + jQ$. In order to phase shift this signal by $\theta$ we must multiply it by the exponential $e^{j\theta}$, i.e. $S' = Se^{j\theta}$. Here $S'$ is the output of the phase shifter. We can equivalently represent $S'$ in terms of the input signal’s in-phase and quadrature components as:

$$S' = (I + jQ)(\cos(\theta) + j\sin(\theta))$$

$$= (I\cos(\theta) - Q\sin(\theta)) + j(I\sin(\theta) + Q\cos(\theta)) = I' + jQ'$$
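The expansion above can be verified with a short sketch that forms $I'$ and $Q'$ from weighted sums of $I$ and $Q$ and compares the result with a direct complex rotation. The function name and sample values are assumptions for illustration.

```python
import numpy as np

def vector_interpolate(i_in, q_in, theta):
    """Rotate (I, Q) by theta using cosine / sine weighted sums."""
    i_out = i_in * np.cos(theta) - q_in * np.sin(theta)
    q_out = i_in * np.sin(theta) + q_in * np.cos(theta)
    return i_out, q_out

s = 0.6 + 0.8j            # hypothetical input sample S = I + jQ
theta = np.pi / 3
i_out, q_out = vector_interpolate(s.real, s.imag, theta)
s_rotated = s * np.exp(1j * theta)

# Scaling both weights by a common factor changes the amplitude, not the phase.
g = 0.37
scaled_phase = np.angle(g * (i_out + 1j * q_out))
```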

Therefore, in order to rotate the signal $S$ by $\theta$ we need to multiply its in-phase and quadrature components by $\cos(\theta)$ and $\sin(\theta)$ and sum the results with appropriate polarity. This is the working principle of a vector interpolator. Note that the summing weights need not realize the exact $\sin(\theta)$ and $\cos(\theta)$ values. They just need to be in the appropriate ratio of $\tan(\theta)$. As we will see shortly, vector interpolation is the most popular scheme to generate phase shifted LO signals in an LO phased array since quadrature signals are readily available through QVCOs or ring VCOs. But at RF, the incoming signal is real and quadrature generation would require the use of polyphase filters. However, polyphase filters are lossy\footnote{Any loss in the signal path adds directly to the NF (in dB)\cite{25}}, unbalanced and frequency selective. In order to circumvent the issue of amplitude imbalance and frequency selectivity, an all pass quadrature filter at RF was proposed in \cite{21}. The filter realizes an all pass transfer function of

$$\begin{bmatrix} V_{OI}\pm \\ V_{OQ}\pm \end{bmatrix} = V_{in} \times \begin{bmatrix} \pm \frac{s^2 + \frac{2\omega_0}{Q} s - \omega_0^2}{s^2 + \frac{2\omega_0}{Q} s + \omega_0^2} \\ \pm \frac{s^2 - \frac{2\omega_0}{Q} s - \omega_0^2}{s^2 + \frac{2\omega_0}{Q} s + \omega_0^2} \end{bmatrix}$$

Fig. 2.10a shows the constituent blocks of a vector interpolator. The QAF discussed above is shown in Fig. 2.10b. A vector interpolator is a true phase shifter since it shifts all frequencies by the same amount. Unlike passive phase shifters, active phase shifters have limited linearity. Since any interferer elimination takes place after the active phase shifters, they must be highly linear, which consequently affects their gain. In [21], placing a high gain LNA before the phase shifters is proposed. This has two principal drawbacks. First, the RF phased array loses its component count advantage, and all other advantages associated with it, if each antenna is followed by its own LNA. Second, the phase shifter must now handle large signals, making its linearity even more critical. Additionally, even though the amplitude imbalance is eliminated, accurate quadrature is available at only one frequency. By working with sub-unity $Q$ values, less than $\pm 5^\circ$ phase error is incurred over almost $\pm 50\%$ fractional bandwidth. However, lowering $Q$ adds further to the noise of the circuit. Another issue that plagues the design is the non-flat filter response, owing to unequal real terms in the numerator and denominator of the transfer function, which leads to in-band signal amplitude distortion. In [26], a $\frac{\lambda}{4}$ coupled T-line is used to generate RF quadrature signals which are then processed by VGAs and summers for cartesian combining. Since the length of the T-line is $\frac{\lambda}{4}$ at only one frequency (94GHz in the design), quadrature errors creep in when operating over large bandwidths. The design has an RMS phase error of less than $7.5^\circ$ in a frequency range of 85-110GHz while the intended range of operation is 90-100 GHz.
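The amplitude balance and single-frequency quadrature of the QAF discussed above can be examined by evaluating its two transfer functions on the $j\omega$ axis. The sketch below takes the positive-polarity outputs with a normalised $\omega_0 = 1$ and $Q = 1$. These values and the function names are assumptions for illustration and are not taken from [21].

```python
import numpy as np

def qaf_outputs(w, w0=1.0, q=1.0):
    """Positive-polarity I and Q outputs of the QAF for V_in = 1 at s = j*w."""
    s = 1j * w
    den = s**2 + (2 * w0 / q) * s + w0**2
    v_i = (s**2 + (2 * w0 / q) * s - w0**2) / den
    v_q = (s**2 - (2 * w0 / q) * s - w0**2) / den
    return v_i, v_q

def quadrature_error_deg(w, w0=1.0, q=1.0):
    """Deviation of the I-to-Q phase difference from 90 degrees."""
    v_i, v_q = qaf_outputs(w, w0, q)
    return np.abs(np.degrees(np.angle(v_i / v_q))) - 90.0

w = np.linspace(0.5, 1.5, 11)
v_i, v_q = qaf_outputs(w)
err_center = quadrature_error_deg(1.0)
err_edge = quadrature_error_deg(1.5)
```

Under these assumptions the two outputs have equal magnitude at every frequency (their numerators are complex conjugates on the $j\omega$ axis), while the phase difference is exactly $90^\circ$ only at $\omega_0$.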

2.3.1.2 LO Phased Arrays: Prior Art

Unlike RF and IF based phased arrays, LO phased arrays control the signal path phase shift through gating or enabling signals. The phase shifters do not lie in the signal path directly and hence do not suffer from the strict NF or linearity issues that plague RF or IF phased arrays, making them a very attractive phase shifting proposition. However, since the signal combining occurs after the LNA (and any other RF amplifier), the RF path now must have a good NF and high linearity to leverage the full SNR and beamforming benefits. Additionally, LO distribution increases area and power overhead while inter-channel mismatch and leakages in the RF and LO paths lead to additional routing complexity. LO phase shifters can be classified into three categories:

(a) **Vector interpolator based:** requires one set of quadrature signals and as many cartesian combiners as there are antennas

(b) **Ring VCO based:** requires at least as many ring stages as there are antennas

(c) **Coupling based:** requires as many VCOs (or QVCOs if accurate quadrature signals are required) as there are antennas.

Figure 2.11: Architectural choices for vector interpolator (a) Current steering mixer (b) Switched $G_m$ units

---

In a receiver chain, the noise performance of earlier amplifiers dominates the NF while the linearity of later amplifiers dominates the overall linearity.

13 Since the PLL can not drive the mixers directly, additional buffers are required. Moreover, in order to ensure phase accuracy, area intensive symmetric multichannel routing is required.

14 In high frequency applications LC based ring stages are preferred over their CMOS counterparts owing to their spectral purity and lower power consumption (although the passives occupy all the VCO area).

The principle of operation of a cartesian combiner was explained earlier. Fig. 2.11a and Fig. 2.11b show two candidate architectures for quadrature coefficient weighting. In Fig. 2.11a two DC currents with values proportional to \(\sin(\theta)\) and \(\cos(\theta)\) are commutated by a mixer at the LO frequency. Therefore this architecture is simply an upconversion mixer with weighted DC bias currents. In [27] the authors have actually integrated the baseband phase interpolator and the subsequent mixer in a transmitter array. For the receiver portion of the chip, however, they have implemented a baseband phase interpolator separately using the architecture described in Fig. 2.11b. It is possible to integrate the baseband interpolator and downconversion mixer by controlling the \(g_m\) of the tail transistor in the mixer. However, unlike in an upconversion mixer, the tail current does not control the weighting coefficients directly, making the process nonlinear. In [28] explicit VGAs have been used at RF to generate the weighting coefficients for the subsequent phase shifting using the quadrature LOs. Since this is similar in action to integrating the interpolator and the mixer, the design is more LO phase shift based than RF phase shift based. Here it should be mentioned that integration of the LO interpolator and the mixer leads to similar linearity requirements being imposed on the resultant phase shifter as would be imposed on an RF or an IF phase shifter.

In Fig. 2.11b, instead of explicitly generating sine and cosine weighted currents, quanta of currents are added together [29, 30] through switchable and polarity reversible (not shown) \(g_m\) cells. Phase shifts corresponding to integer tangent ratios can be realized using this architecture. While in a mixer based design the achievable phase resolution depends on the resolution of the tail current steering DAC, here the phase resolution depends on the number of \(g_m\) units. One of the major drawbacks of cartesian combining at LO frequencies is the difficulty in achieving gain at these frequencies. In order to combat the low pass filter at the output node, either the pole is pushed out by reducing the load impedance, leading to greater power consumption, or a tuned load is used, leading to an additional area overhead.

---

[25] polyphase filters followed by hard limiters can generate approximate quadrature over process variation.
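The set of phases reachable with integer tangent ratios can be enumerated with a short sketch. It assumes that the I and Q paths each have `n_units` identical, polarity-reversible \(g_m\) cells, which is a simplification made for illustration and not a description of the circuits in [29, 30].

```python
import math

def realizable_phases_deg(n_units):
    """Phases reachable when I and Q weights are integer counts of g_m units."""
    phases = set()
    for w_i in range(-n_units, n_units + 1):      # negative = polarity reversed
        for w_q in range(-n_units, n_units + 1):
            if w_i == 0 and w_q == 0:
                continue                          # no output signal at all
            phases.add(round(math.degrees(math.atan2(w_q, w_i)), 6))
    return sorted(phases)

one_unit = realizable_phases_deg(1)
two_units = realizable_phases_deg(2)
```

With one unit per path only multiples of 45° are available. Adding units fills in further phases, but the steps are not uniform because they follow the arctangent of integer ratios.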

Unlike cartesian combiner based multi-phase LO generation, a ring-VCO based solution, Fig. 2.12, does not run into the difficulty of low LO amplitudes, although the LO distribution network does consume a lot of power and headroom since all possible LO phases need to be routed to every channel. More importantly, another drawback is the lack of phase tuning (only fixed phase shifts are available), Fig. 2.12a. This can be offset by having a small tuning range RF (or baseband) phase shifter before (or after) the mixer. This phase shifter needs to have a tuning range of 0 to $\frac{2\pi}{N}$ where $N$ is the number of ring oscillator stages. Another elegant solution was proposed in [31], where an extra tunable phase shift was introduced in the ring oscillator loop, Fig. 2.12b. If the phase shift of this block is $\phi$, the progressive phase shift in the ring oscillator will be $\frac{2k\pi - \phi}{N}$, where $k \in Z$. If an analog control on $\phi$ is available then continuous phase tuning is possible from $\frac{2(k-1)\pi}{N}$ to $\sim \frac{2k\pi}{N}$. In this design the authors have selected $k = 0$ for a 4 stage variable phase ring oscillator (VPRO). The external phase shift has been implemented through a chain of 4 differential LC amplifiers with varactor tuning on the center frequency. In order to realize a full 360° swing, each of these 4 stages will have to produce a phase shift of 90°, which is not possible owing to low impedance values at such phase shifts. The authors were able to get a ±60° phase shift from each stage. This leads to a spatial beam-steering of ±20°. By phase inverting every alternate ring VCO output node, the authors can realize both a θ and a θ − 180° phase shift, thereby further extending the beam-steering to −90° to −40° and 40° to 90°. In [32], the authors have implemented the VPRO at half the desired frequency. The actual LO is fed to the receiver through a doubler, which doubles the total achievable phase shift, thereby allowing a full 180° beam-steering. The authors have also used their VPRO design for phase modulation and demodulation in the TX and RX, respectively, and hence have been able to do away with a mixer. Although fine phase tuning is available, symmetric LO distribution is still area and power hungry. Additionally, extending the design to a greater number of antennas requires a larger number of ring stages, which runs into the issue of multiple possible modes of oscillation.

---

The buffers used here get enough signal swing for rail to rail swing at the output of a CMOS buffer or the complete IxR swing in a CML stage.

Each mode has a different progressive phase shift.

Figure 2.13: Chain of injection locked oscillators as phase shifters (a) Meander line (b) Cascaded

Ring VCOs are a special case of coupled oscillators where the last oscillator is coupled to the first one. In general we can have an open chain of coupled oscillators where the \( i^{th} \) oscillator drives the \( (i + 1)^{th} \) oscillator and is in turn driven by the \((i - 1)^{th}\) oscillator \([33]\). Such an open loop series arrangement can be used to generate a progressive phase shift along the chain. Note that owing to the open loop operation, there is no constraint that forces this progressive phase shift to be a function of \(\pi\) and \(N\), where \(N\) is the number of oscillators in the chain. Two elegant designs that use a chain of coupled oscillators, or injection locked oscillators (ILOs), are reported in \([34]\) to generate continuous progressive phase shift patterns. In design I, Fig. 2.13a, the same external input locks on to oscillators with different center frequencies. The phase shift in an ILO \([5]\) is a function of the difference between the injection frequency and the center frequency of the free running oscillator. By tuning the center frequency of each oscillator independently, both progressive as well as random phase shift patterns can be generated. In order to achieve full spatial scanning, the output nodes can be swapped in a differential ILO implementation. Although the same issue of symmetric LO routing and area overhead remains, another issue is the non-linear phase-frequency relationship in an ILO. This issue, though, can easily be resolved through the use of a look-up table, needing just a one time calibration. In design II, Fig. 2.13b, the input locks on to the first oscillator in the chain, with each oscillator locking onto the next one in a series pattern. This resolves the issue of controlling each oscillator independently since now all the oscillators are identical with the same center frequency control. Here too, swapping the output nodes leads to full spatial scanning. A lookup table is required in this design too.
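As a sanity check on the ring-oscillator beam-steering ranges quoted earlier in this subsection (±60° per stage, about ±20° of steering, extended towards ±40° to ±90° by phase inversion), the sketch below converts a progressive LO phase shift into a beam angle. It assumes half-wavelength antenna spacing, which the text does not state explicitly for that design, and the function name is hypothetical.

```python
import math

def steering_angle_deg(progressive_phase_deg, d_over_lambda=0.5):
    """Beam angle off broadside for a given inter-element phase shift."""
    s = math.radians(progressive_phase_deg) / (2 * math.pi * d_over_lambda)
    return math.degrees(math.asin(s))

n_stages = 4
per_stage_swing_deg = 60.0                      # phase swing of each LC stage
loop_phase_deg = n_stages * per_stage_swing_deg # total tunable phi in the loop
progressive_deg = loop_phase_deg / n_stages     # |(2k*pi - phi)/N| with k = 0

direct_edge = steering_angle_deg(progressive_deg)            # no inversion
inverted_edge = steering_angle_deg(180.0 - progressive_deg)  # with 180 deg inversion
```

Under the half-wavelength assumption, the ±60° progressive phase maps to roughly ±19.5°, and the phase-inverted range starts at roughly ±41.8°, in line with the rounded figures in the text.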

2.3.1.3 IF Phased Arrays: Prior Art

The phase shift or time delay principles used in IF phased arrays are very similar to those used in RF phased arrays. Phase shifts are achieved either through cartesian combining or through time delaying. Cartesian combining IF phased arrays either have explicit baseband vector interpolators \([35]\) or an interpolator-mixer combine \([27]\). The linearity constraints on the interpolator are much harsher though.

The cartesian combining schemes discussed thus far all work in continuous time and use switching (discrete or continuous) current units in one form or another. If implemented in the discrete time domain, three different ways of varying the sine and cosine coefficients emerge, Fig. 2.14:

(a) **Varying the integrating current (current mode sampling)**: same as above
(b) **Varying the integration time period (current mode sampling)**: keeping integration current and sampling capacitor size the same for all the channels, vary the integration window width in sine / cosine ratios
(c) **Varying the sampling capacitor size (voltage sampling)**: vary the size of the sampling capacitor for each channel in a sine / cosine ratio.

Note that quadrature currents (voltages) are required for all of the above combining schemes. In [36] scheme (c) has been implemented through the following sine and cosine approximation.

\[
\sin(\phi) = \frac{\alpha}{\alpha + \frac{3}{4}} \\
\cos(\phi) = \frac{1 - \alpha}{(1 - \alpha) + \frac{3}{4}}
\]

Four 25% non-overlapping clocks are used for differential I and Q sampling. For each clock phase, while that phase serves as the sampling phase, the other clock phases serve as charge sharing and summing phases respectively. A 3-bit binary weighted capacitor bank \((\frac{C}{2}, \frac{C}{4}, \frac{C}{8})\) along with two constant \(\frac{C}{16}\) capacitors realizes \(\alpha C\) and \((1-\alpha)C\) respectively. The scheme achieves a phase resolution of 11.25° with an RMS phase error of 2°. While better phase resolution can be attained by using a larger capacitor bank, generating high frequency non-overlapping clocks is an issue, which restricts the circuit to RF frequencies below 4GHz. If an active mixer were used instead and the sampling based vector interpolation were shifted to baseband, the frequency of operation could be extended. Other issues that affect the design include noise folding due to voltage sampling as well as out of band interferers aliasing in band.
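The capacitor-ratio approximation can be explored numerically. The sketch below assumes that \(\alpha\) takes the eight values \(\frac{1}{16} + \frac{m}{8}\) implied by a 3-bit bank plus one fixed \(\frac{C}{16}\) capacitor on each side, and that the synthesized phase is the arctangent of the ratio of the two weights. Both are readings of the text made for illustration, not details confirmed from [36].

```python
import math

def weights(alpha):
    """Sine and cosine weight approximations for a given capacitor split alpha."""
    w_sin = alpha / (alpha + 0.75)
    w_cos = (1 - alpha) / ((1 - alpha) + 0.75)
    return w_sin, w_cos

def synthesized_phase_deg(alpha):
    """Phase implied by the ratio of the two weights."""
    w_sin, w_cos = weights(alpha)
    return math.degrees(math.atan2(w_sin, w_cos))

alphas = [(2 * code + 1) / 16 for code in range(8)]  # 1/16, 3/16, ..., 15/16
phases = [synthesized_phase_deg(a) for a in alphas]
steps = [b - a for a, b in zip(phases, phases[1:])]
```

Printing `phases` and `steps` shows how far the synthesized steps deviate from a uniform 11.25° grid under these assumptions. The codes are mirror symmetric about 45° because swapping \(\alpha\) and \(1-\alpha\) swaps the two weights.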

Time delays at RF are realized using lumped-LC approximations of a transmission line. As was mentioned earlier, in order to integrate these with on-chip actives, the frequency of operation is pushed to higher GHz. Therefore any such approximation using passives to realize time delays is ruled out at baseband frequencies. An active-RC approximation is used instead. The approximation can be arrived at by considering the time delay in the frequency domain. The time domain output of an ideal time delay block for an input \( x(t) \) is \( x(t - \tau) \) where \( \tau \) is the amount by which the signal is delayed. Equivalently, we can express the input and output in the frequency domain as

\[
\mathcal{L}(x(t)) = X(s)
\]

\[
\mathcal{L}(x(t - \tau)) = X(s)e^{-s\tau} = H(s)X(s)
\]

Here \( H(s) = e^{-s\tau} \) is the Laplace transform of the impulse response of the delay block. Therefore in order to realize a time delay of \( \tau \) we need to synthesize a circuit whose \( H(s) \) equals \( e^{-s\tau} \). Any circuit can only approximately realize \( e^{-s\tau} \) since an exact realization would require an infinite number of blocks. In [37] one of the earliest integrated on-chip approximations of \( H(s) \) was realized for a 10Gb/s wireline signal link. The authors approximated \( H(s) \) as

\[
H(s) = \frac{e^{-s\tau/2}}{e^{s\tau/2}} \approx \frac{1 - s\tau/2}{1 + s\tau/2} \qquad (2.2)
\]

This expression can be realized using simple 1st order RC circuits. The authors realized the same using a common-emitter differential pair with emitter degeneration. In [3] the same \(H(s)\) has been realized using cascaded \(gmC\) stages. Since (2.2) is only a first order approximation, it is valid for small values of \(\omega\tau\). Therefore a single stage implementation can either provide a large delay for a small frequency range or can provide a small delay for a large frequency range. The authors of [3] overcome this issue by realizing a small delay, \(\tau/N\), from one stage and cascading \(N\) such stages to realize an overall delay of \(\tau\). The individual delay stage is as shown in Fig. 2.15. The per stage delay is controlled through a variable capacitance while the overall delay in each path is selected by activating one of 5 serial tap points (one after each delay stage) through a V-I converter. The design is intended for low frequency radar applications from 1-2.5 GHz and generates a maximum delay of 550ps with an error of \(\pm 10ps\) for a 4 element array. If the frequency of operation and / or the number of elements is increased, the number of delay stages will have to increase proportionally for the same maximum delay. Another disadvantage is the noise and linearity issues associated with the \(gm\) stage. Since \(H(s)\) has a magnitude of unity, the active circuitry adds noise without boosting the signal. If the gain of the front end LNA is increased to tackle the NF issue, the \(gm\) stages run into stricter linearity issues. Also as the signal swing at the gate of the \(gm\) stage increases, so does its gate capacitor variation, which makes the delay a (weak) function of signal level. If the fixed capacitor is large enough, this does not affect the coarse tuning but it still impacts the fine tuning delay resolution. Another issue that affects the design is the intra-element and inter-element matching between the delay stages, which gets worse as the device sizes and the \( R \) and \( C \) values come down at higher frequencies. In [38], a second order approximation of \( H(s) \) was realized using an \( LCR \) circuit. However, the second order approximation is lossy, area hungry and has scalability and tunability issues.

Footnote: Since a divide by 2 is needed for sampling clock generation, the input to the divide by 2 can serve as the LO signal for the mixer.
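The trade-off between delay and bandwidth in (2.2) can be illustrated numerically. The following is a minimal sketch, assuming ideal, identical, lossless first-order all-pass stages; the function names are hypothetical and the stage counts are illustrative only (they do not model the circuit of [3], its capacitor tuning, or its tap selection). It compares the phase of \(N\) cascaded stages, each with delay \(\tau/N\), against the ideal phase \(-\omega\tau\).

```python
import numpy as np

def allpass_cascade(omega, tau, n_stages=1):
    """Phase (rad) and magnitude of n_stages cascaded first-order all-pass
    sections, each H(s) = (1 - s*t/2) / (1 + s*t/2) with t = tau / n_stages."""
    t = tau / n_stages
    h_stage = (1 - 1j * omega * t / 2) / (1 + 1j * omega * t / 2)
    # Multiply the per-stage phase instead of taking the angle of the product,
    # so that the total phase is not wrapped to (-pi, pi].
    return n_stages * np.angle(h_stage), np.abs(h_stage) ** n_stages

def max_phase_error_deg(tau, freqs_hz, n_stages):
    """Largest deviation (degrees) from the ideal delay phase -omega*tau."""
    omega = 2 * np.pi * np.asarray(freqs_hz, dtype=float)
    phase, _ = allpass_cascade(omega, tau, n_stages)
    return float(np.degrees(np.max(np.abs(phase + omega * tau))))

if __name__ == "__main__":
    tau = 550e-12                      # total delay quoted in the text
    freqs = np.linspace(1e9, 2.5e9, 151)
    for n in (1, 5, 20):
        print(n, "stage(s): max phase error =",
              round(max_phase_error_deg(tau, freqs, n), 1), "deg")
```

The magnitude stays at unity for any number of stages, while the phase error shrinks as the per-stage delay is reduced, which is the motivation for cascading.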

2.3.2 Timed Array: Architectures and Prior Art

Timed arrays function by introducing explicit time delay in the signal path. When processing narrowband signals, the difference between a timed array and a phased array is hardly noticeable owing to the small phase error incurred over the signal bandwidth. However as the signal’s fractional BW increases, so does the phase error in a phased array. This leads to beam squinting, an effect in which different frequencies align in (or appear to align in) different directions. The RF phase shifters based on delay line approximation discussed earlier are, strictly speaking, time delay units and hence some of the RF phased arrays described earlier are actually timed arrays.

Silicon based timed array circuits function by implementing time delays, essentially, in two different forms: (a) through active RC delay cascading (discussed above) [3] and (b) through passive LC tunable approximation of T-lines [39–42]. A lossless transmission line has a characteristic impedance of \( Z_0 = \sqrt{\frac{L}{C}} \) and a cut-off frequency of \( \omega_0 = \frac{1}{\sqrt{LC}} \).

The phase constant \( \beta \) of an ideal lossless T-line is given by \( \beta = \omega \sqrt{LC} \), leading to a characteristic delay of \( \tau = \frac{\beta}{\omega} = \sqrt{LC} \). Note that the unit of \( \tau \) is sec/m. LC approximation based timed arrays vary this characteristic delay by varying \( C \) and or \( L \). Additionally, by tapping the signal at multiple points across the line, the delay can be discretized into unit stage delays. This provides a coarse tuning while changing the \( C \) and or \( L \) values in each unit provides fine tuning. Note that both \( L \) and \( C \) need to be tuned in order to maintain impedance match. However, only discrete \( L \) tuning is available in passives making it difficult to tune the delay and match the impedance at the same
time. An acceptable impedance mismatch level lies between $\frac{Z_0}{2}$ and $2Z_0$ (corresponding to an $S_{11} \leq -10dB$), thus providing some flexibility in tuning the delay. Only one tap point per delay line is active at any time. Amplifiers are required for each tap point to compensate for the losses in the LC line, much like repeater units in telephone cables. Once the delay element has been synthesized, different delay configurations can be built from it, such as iterative delay steps (like the butterfly diagram in an N point FFT, requires $\log_2 N$ different delays), delay and sum (serially delay and combine signals, uses N-1 units of a single delay rather than implementing N-1 different delays) and regular brute force delaying and summing (requires N-1 different delays, all multiples of a unit delay ranging from 1 to N-1). All the different delay configurations have been implemented in [39–42].
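The interaction between delay tuning and impedance match can be checked with a short calculation. This is a sketch under simplifying assumptions: a lossless line, a purely resistive reference impedance, and illustrative component values (1 nH, 0.4 pF) that are not taken from [39–42]. The function names are hypothetical.

```python
import math

def lc_section(l_henry, c_farad):
    """Characteristic impedance and delay of a lossless LC section:
    Z0 = sqrt(L/C), tau = sqrt(L*C)."""
    return math.sqrt(l_henry / c_farad), math.sqrt(l_henry * c_farad)

def s11_db(z_line, z_ref):
    """Reflection coefficient magnitude in dB for a line of impedance
    z_line seen from a reference impedance z_ref."""
    gamma = abs(z_line - z_ref) / (z_line + z_ref)
    return -math.inf if gamma == 0 else 20 * math.log10(gamma)

if __name__ == "__main__":
    z_ref = 50.0
    for c in (0.1e-12, 0.4e-12, 1.6e-12):   # tune C only, L fixed at 1 nH
        z, tau = lc_section(1e-9, c)
        print(f"C={c*1e12:.1f} pF  Z={z:.0f} ohm  tau={tau*1e12:.0f} ps  "
              f"S11={s11_db(z, z_ref):.2f} dB")
```

With only \(C\) tuned, a 16:1 capacitance range moves the impedance from \(2Z_0\) to \(Z_0/2\) and gives a 4:1 delay range, with \(|S_{11}| = 1/3\) (about \(-9.5\) dB) at either extreme, in line with the roughly \(-10\) dB limit quoted above.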

2.4 Summary

This chapter provides a brief introduction to antenna arrays, beam formation, beam steering and the origin of SNR improvements in the same. It briefly discusses salient features of, and differences between, a timed array and a phased array receiver. Further, different implementations of phased array and timed array receivers are described and a few specific prior arts are discussed. The generic architecture of different types of phased and timed arrays with their advantages and disadvantages is also discussed.
Chapter 3

Wideband Phased Array: Frequency Domain Implementation

3.1 Introduction

Spatial diversity to achieve signal directivity and filtering has been exploited using phased-arrays in a variety of applications. They have traditionally been used in the military to send and receive information from fixed directions and filter out enemy blockers [43]. Recently, they have also begun to find commercial usage in HDMI, automotive and vehicular radars, satellite communication and imaging applications [18,44–46]. It is possible to use additional diversity (in other domains) to improve communication robustness and performance. For example, radar systems exploit spatial diversity for precise targeting, and frequency diversity to reduce target fluctuations [47]. In [18] diversity in space and time has been utilized to send and receive codes occupying the same frequency bandwidth. With each degree of diversity an additional degree of freedom is gained to manipulate and immunize the signals of interest [48,49]. In this work, we
present a spatio-spectral beamformer that exploits diversity in frequency and space, in that order.

The chapter discusses possible architectural implementations for a general spatio-spectral beamformer. The composite filtering nature of the architectures is discussed. Possible implementations of IF spatio-spectral beamforming front-ends using an analog FFT (capable of multiple and simultaneous steering directions per frequency bin) are presented. Further, the ideas and concepts presented in [50] (wherein a new spatio-spectral receiver architecture was implemented) are elaborated.

Spatio-spectral beamforming architectures offer the following advantages:

a) *Wideband phased arrays:* Allows for the accurate beam steering of signals with a large fractional bandwidth

b) *Multi-carrier phased arrays:* Enables multi-carrier communications and/or multi-target illumination where beams can be independently and concurrently steered

c) *Enhanced blocker suppression:* Can suppress blockers both in the spatial and in the spectral domains

d) *Lower ADC power:* Reduces ADC dynamic range requirements, and consequently ADC power, by virtue of analog channelization and spatio-spectral blocker nulling

These advantages are further elaborated below.

### 3.1.1 Wideband phased arrays

Figure 3.1: Different beamforming scenarios for a spatio-spectral architecture (Each color denotes a different frequency slice and can be steered independent of the others. Only the main lobes are shown.)

Fig. 3.1(a) shows a conceptual spatio-spectral beamformer where a wideband signal is split into smaller frequency slices or channels using an FFT and each slice is steered in the same direction. This frequency splitting significantly reduces beam squinting for wideband phased arrays. A space-only phased-array filters the signal only in the spatial domain either through time-delay circuits or phase shifters. All EM waves travel at the same speed in a medium and hence suffer the same progressive time delays in arriving at different points in an antenna array. Since the phase shift is directly proportional
to the frequency ($2\pi f\tau$ for a given time delay $\tau$), phase-shifter based phased-arrays are
inherently narrowband since they apply a constant phase-shift to the entire spectrum.
In order to deal with signals having large fractional bandwidths ($f_{frac} = 2\frac{f_{max}-f_{min}}{f_{max}+f_{min}}$),
an on-chip constant group-delay circuit is required. Such circuits have recently been
implemented using true-time-delay circuits based on LC approximation of a transmission
line or using all-pass delay circuits [51–54]. It is possible to accomplish the same using
phase-shifter based delay circuits as well. The key to using phase-shifters to accurately
steer large $f_{frac}$ signals is to slice them in frequency and have a dedicated phase-shifter
per frequency slice.
For example, in this work, an analog FFT [55] based frequency discriminator is employed
to slice the signals into $N$ equal parts and each frequency slice is processed by a dedicated
phase shifter, $N$ being the number of FFT points.

Figure 3.2: Phase error vs. frequency for a phased-array system with a fractional
bandwidth of 0.5 and 16 frequency slices

The accuracy of beam steering is proportional to the number of frequency slices. It is
so because to each such frequency slice a phase shift corresponding to the center of that
slice is applied and thus the maximum phase error is $2\pi \Delta f_{slice} \tau$ where $\Delta f_{slice}$ is half
of the slice width. Fig. 3.2 plots phase error vs. frequency for a system with a carrier frequency of 6GHz and a fractional bandwidth of 0.5. The linear slope in blue represents the phase error when a fixed phase shift (60°) corresponding to the center frequency is applied across the entire spectrum. A maximum phase error of 15° occurs at the extreme ends (4.5GHz and 7.5GHz). The line in red corresponds to a true time delay circuit. Since all frequencies are affected by equal time delays, there is no phase error when using a true time delay. The band of green lines corresponds to the phase error when the signal spectrum is split into 16 equal parts (slices). A phase shift corresponding to the center frequency of each slice is applied across that entire slice. The maximum phase error, post spectral slicing, comes down to \(\sim 0.95°\), a 16-fold reduction, as expected.
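The numbers quoted for Fig. 3.2 can be reproduced with a few lines. The sketch below assumes ideal brick-wall slices of equal width and an ideal constant phase shifter per slice; the function name and argument names are hypothetical. The time delay is inferred from the 60° phase shift at the 6GHz center frequency.

```python
import numpy as np

def sliced_phase_error_deg(f, f_min, f_max, tau, n_slices):
    """Phase error (degrees) relative to a true time delay tau when each of
    n_slices equal-width slices receives the phase shift of its own center."""
    f = np.asarray(f, dtype=float)
    edges = np.linspace(f_min, f_max, n_slices + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Index of the slice that contains each frequency point.
    idx = np.clip(np.searchsorted(edges, f, side="right") - 1, 0, n_slices - 1)
    return 360.0 * (f - centers[idx]) * tau

if __name__ == "__main__":
    f_c, f_min, f_max = 6e9, 4.5e9, 7.5e9
    tau = (60.0 / 360.0) / f_c          # delay that gives 60 degrees at 6 GHz
    f = np.linspace(f_min, f_max, 3001)
    for n in (1, 16):
        err = sliced_phase_error_deg(f, f_min, f_max, tau, n)
        print(n, "slice(s): max |phase error| =", np.max(np.abs(err)), "deg")
```

Under these assumptions the single-shifter case peaks at 15° and the 16-slice case at 15°/16, consistent with the values read from the figure.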

### 3.1.2 Multi-carrier phased arrays

This scheme allows multiple frequencies to be simultaneously used for phased array communications. This enables multi-carrier communications where each carrier can be independently beam-steered. Fig. 3.1(b) considers such a multi-carrier scenario where each carrier (and its associated modulation) is considered as a different frequency slice. Each slice is then steered (concurrently) in a different direction independent of the other slices.

### 3.1.3 Enhanced blocker suppression

Under this scheme each frequency slice is accompanied by a dedicated phase shifter, and therefore each slice can be independently and simultaneously steered in any direction in a multi-carrier scenario. This enables blocker suppression by steering a particular frequency beam away from another beam at that frequency. Alternatively, blockers at a different frequency but incident from the direction of the signal can be rejected using this scheme improving the robustness of this scheme. Blocker rejection for the prototype architecture discussed in this chapter is further discussed in Section 3.3.

### 3.1.4 Lower ADC power

As discussed above, the blockers are suppressed and the signal is channelized prior to digitization. This relaxes the constraints on the ADC significantly. This is so because eliminating the blocker reduces the required ADC dynamic range, and frequency channelization (due to the analog FFT) reduces the peak to average power ratio (PAPR) across each slice, leading to lower dynamic range requirements for the ADC [55, 56].

The rest of the chapter is organized as follows. Section II discusses and compares different architectures that can be used for spatio-spectral beamforming emphasizing an analog-FFT based scheme as a natural and convincing fit. Section III discusses the FFT based composite filtering of the architecture and presents an analog-FFT based phase-shifter to realize concurrent multi-directional beam steering. Section IV discusses the 2-channel 4-frequency prototype implemented. Section V focuses on testing and measurements. Section VI concludes the chapter, followed by acknowledgements.

3.2  Spatio-Spectral Beamformer

As was seen in the previous section, a wideband beamforming setup, in the absence of a true-time-delay element, necessitates slicing of the signal in the frequency domain. In space-only phased-arrays, phase-shifting can be done at the RF [46, 57], the LO [58], or the IF [59]. In a spatio-spectral scheme a dedicated phase-shifter per frequency slice is required. Frequency slicing or filtering can be done before or after the phase shifting. Here the possible spatio-spectral architectures are evaluated based on where the beamforming is done, assuming that the frequency slicing is done afterwards, although the order of operations can be reversed.

a. RF beamforming

b. LO beamforming

c. IF beamforming
Schemes (a) through (c) are summarized in Fig. 3.3. In each of the schemes it is assumed that there are $M$ elements (channels) and $N$ frequency slices.

![Figure 3.3: Architectural choices for spatio-spectral beamforming (Each color stands for a different frequency)](image)

### 3.2.1 RF beamforming

Under this scheme, each element’s LNA’s output is processed in parallel by a set of RF phase-shifters, as shown in Fig. 3.3(a). The phase shifter outputs are then combined to realize beamforming at RF. This scheme requires $M$ LNAs, 1 LO, $N$ mixers, and $N$ IF (baseband) processing elements. It has the advantage of eliminating the blocker early in the signal chain thus easing the linearity requirements of the subsequent blocks.\(^1\)

If $M \geq N$, then this scheme requires the fewest components. The drawback of this scheme is that the RF phase shifters are either lossy and area-hungry when passive, affecting the noise figure significantly, or have frequency dependent group delays for active implementations [60,62]. This scheme is favorable when the number of elements is much larger than the number of frequency bins ($M \geq N$) and the size of the frequency slices is small (narrowband systems). In other words, RF beamforming is most suitable for typical phased arrays where spatial diversity is emphasized as compared to spectral diversity.

\(^1\) The blocker is eliminated (for the frequency slice being addressed by the phase-shifter) prior to down-conversion which lowers the linearity requirement but the mixer still processes residual blockers from the other frequency slices, since frequency discrimination is done post downconversion. Recall, using fixed phase shifts is accurate only over a narrow bandwidth.
### 3.2.2 LO beamforming

Fig. 3.3(b) shows an LO based spatio-spectral scheme. There are $N$ LOs per element with each LO tuned to the center of a frequency slice. This scheme has a total of $M$ LNAs, $M \times N$ mixers, $N$ LOs and $N$ IF (baseband) processing elements. It has the advantage of moving the phase shifters out of the signal path and hence they do not affect the system performance. Also the LO based phase shifters can have much finer phase tuning compared to their RF counterparts [58,63]. Since each LO is tuned to the center of a slice, the same low-pass filter implementation can be used to process the mixers' outputs. Since the blocker is eliminated after downconverting, the mixer has higher linearity requirements. The biggest drawback of this scheme is its routing complexity and higher power consumption arising out of LO path duplication and buffering for each element. Additionally, synthesizing multiple concurrent frequencies on chip runs into frequency coupling and spurious tone issues, especially if the desired LOs have narrow separation in frequency. Reference [64] reports a spectrum slicing scheme that alleviates the above mentioned issues but it traverses the spectrum sequentially and thus is unsuitable for our purposes (concurrent beam synthesis). LO beamforming is suitable when $N$ is small, the number of elements is much larger than the number of frequency bins ($M \geq N$), and the frequency slices are sufficiently far apart. Again, like RF beamforming, LO beamforming is most suitable for typical phased arrays where spatial diversity is emphasized as compared to spectral diversity.

### 3.2.3 IF(baseband) beamforming

Fig. 3.3(c) shows the IF beamforming scenario. This scheme requires $M$ LNAs, 1 LO, $M$ mixers and $N$ IF (baseband) processing elements. Although both filtering and beamforming can be done entirely in digital, in this scheme we focus on doing both prior to digitization. As was stated earlier this relaxes the dynamic range requirements of the ADCs. Once again, the mixer processes the signal as well as the blocker and hence has higher linearity requirements. Phase-shifting at IF reduces sensitivity to circuit parasitics (as compared to doing so at high RF/LO frequencies) and improves inter-element matching. The fine phase-shift resolution advantage of LO filtering is also retained. Apart from the more stringent linearity constraints on the front-end, another issue that normally plagues this scheme is the realization of band-pass filters at IF. While on the one hand passive realizations necessitate use of inductors\(^2\) which impose significant area constraints, on the other hand active implementations are noisier and power-hungry. Using large passives also limits the number and the size of frequency slices realizable making scalability an issue. If \(N \geq M\), then this scheme is more favorable since it requires the fewest RF components while giving better phase resolution and linearity (phase) compared to RF phase shifting.

Table 3.1: Architectural choice for spatio-spectral beamformer

<table>
<thead>
<tr>
<th></th>
<th>LNA #</th>
<th>LO #</th>
<th>Mixer #</th>
<th>B.B.\(^{a1}\) #</th>
<th>L\(^{a2}\)</th>
<th>C\(^{a3}\)</th>
</tr>
</thead>
<tbody>
<tr>
<td>RF</td>
<td>\(M\)</td>
<td>1</td>
<td>\(N\)</td>
<td>\(N\)</td>
<td>low</td>
<td>\(M \geq N\)</td>
</tr>
<tr>
<td>LO</td>
<td>\(M\)</td>
<td>\(N\)</td>
<td>\(M \times N\)</td>
<td>\(N\)</td>
<td>high</td>
<td>\(M \geq N\)\(^{b}\)</td>
</tr>
<tr>
<td>IF</td>
<td>\(M\)</td>
<td>1</td>
<td>\(M\)</td>
<td>\(N\)</td>
<td>high</td>
<td>\(N \geq M\)</td>
</tr>
</tbody>
</table>

\(^{a1}\) Baseband
\(^{a2}\) Linearity requirements
\(^{a3}\) Choose if
\(^{b}\) \(N\) should be small with frequency slices wide apart

Table 3.1 summarizes the above discussion based on the number of elements, the number of frequency slices, and the linearity requirements. For instance, in a UWB scheme, where frequency diversity is critical to performance, an IF beamforming approach can be used to address the 3-10GHz band using a 4-antenna front-end with 16 frequency slices (440MHz/slice) to perform phased array beamforming. On the other hand, in a short-range automotive radar scheme, where spatial diversity and beam directivity are emphasized, an RF beamforming approach would be preferred to address the 22-26GHz band using a 16-antenna front-end with 4 frequency slices (1GHz/slice).

\(^2\) Multi-stage RC ladder filters followed by CR ladder filters can also realize bandpass filtering at baseband. However they have poor quality factor.
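The component counts of Table 3.1 and the slice widths of the two examples can be tabulated with a short helper. This is only a bookkeeping sketch: the function names are hypothetical, and the counts are copied from the table rather than derived from any circuit model.

```python
def component_counts(scheme, m_elements, n_slices):
    """Block counts per Table 3.1 for an M-element, N-slice beamformer."""
    m, n = m_elements, n_slices
    table = {
        "RF": {"LNA": m, "LO": 1, "mixer": n, "baseband": n},
        "LO": {"LNA": m, "LO": n, "mixer": m * n, "baseband": n},
        "IF": {"LNA": m, "LO": 1, "mixer": m, "baseband": n},
    }
    return table[scheme]

def slice_width_hz(f_min_hz, f_max_hz, n_slices):
    """Width of each of n_slices equal frequency slices."""
    return (f_max_hz - f_min_hz) / n_slices

if __name__ == "__main__":
    # UWB example from the text: 4 antennas, 16 slices over 3-10 GHz.
    print(component_counts("IF", 4, 16), slice_width_hz(3e9, 10e9, 16) / 1e6, "MHz")
    # Automotive radar example: 16 antennas, 4 slices over 22-26 GHz.
    print(component_counts("RF", 16, 4), slice_width_hz(22e9, 26e9, 4) / 1e9, "GHz")
```

The UWB case works out to 437.5MHz per slice, which the text rounds to 440MHz.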

In general, for large fractional bandwidths $N \geq M$. Under such a scenario if the power consumption in the baseband filtering and phase-shifting operations can be minimized then scheme (c) is the preferred choice for a spatio-spectral architecture. In this work a discrete-time IF filtering implementation is selected, where the signal is first spectrally filtered and then spatially filtered\(^3\). An analog FFT engine, Charge Reuse Analog Fourier Transform (CRAFT) [55], has been used as the frequency discriminator. The output bins (frequency slices) of the FFT are then processed by a vector-combiner. The entire baseband is a passive switched-capacitor realization resulting in excellent linearity and inter-element matching. The passive analog baseband computations consume negligible power [55]. The recursive nature of the implementation lends itself well to scaling. Also the size of a frequency slice is directly proportional to the sampling rate of the FFT engine, $f_s$. This makes the implementation tunable across a wide range. There is one FFT engine per element and one vector-combiner (phase-shifter) per FFT frequency bin (slice).

![An 8 Point FFT's Frequency Response for a Sampling Rate of 2GS/s](image)

Figure 3.4: An 8 point FFT’s output bin frequency response for a 2GS/s sampling rate (Only 3 bins shown)

\(^3\) Note that it is possible to reverse these operations without affecting the linearity of blocks other than these two.
Figure 3.5: A spatio-spectral beamformer (2-channel) with an FFT (4-point) as the spectral filter (Each color stands for a different frequency slice and corresponding phase shifter.)

3.3 FFT Filtering and Phase-Shifting

3.3.1 FFT filtering

An $N$ point FFT takes in $N$ sampled time domain inputs and outputs $N$ discrete frequency domain values. Each of the $N$ FFT outputs has a sinc-like magnitude response in the frequency domain. Each FFT output (frequency) bin is centered at $k\frac{f_s}{N}$, where $f_s$ is the sampling rate and $k$ is the bin number. In order to analyze the frequency response of this system the input to the system is assumed to be a pure sinusoid, at frequency \( f \), sampled at rate \( f_s \). Then it follows that

\[
X_k = \sum_{i=0}^{N-1} x[i] e^{-j\frac{2\pi i k}{N}}
\]

\[
x[i] = e^{j\frac{2\pi i f}{f_s}}
\]

\[
X_k(f) = e^{-j(N-1)\pi\left(\frac{k}{N} - \frac{f}{f_s}\right)} \left( \frac{\sin\left(N\pi\left(\frac{k}{N} - \frac{f}{f_s}\right)\right)}{\sin\left(\pi\left(\frac{k}{N} - \frac{f}{f_s}\right)\right)} \right)
\]

\[
|X_k(f)| = \left| \frac{\sin\left(N\pi\left(\frac{k}{N} - \frac{f}{f_s}\right)\right)}{\sin\left(\pi\left(\frac{k}{N} - \frac{f}{f_s}\right)\right)} \right| \qquad (3.1)
\]

Eq. (3.1) shows that each FFT output bin behaves like a sinc filter. Since the FFT is preceded by an anti-aliasing filter, analyzing Eq. (3.1) in \([-\frac{f_s}{2}, \frac{f_s}{2}]\) suffices. Fig. 3.4 shows the output of an 8 point FFT sampled at 2GS/s. Each output of an \( N \) point FFT is a complex filter centered at \( k\frac{f_s}{N} \), \( k \in [-\frac{N-1}{2}, \frac{N}{2}] \). The \( Q \) of the filter, where \( Q = \frac{f_{\text{center}}}{f_{3\text{dB}}} \), is given by \( k/0.89 \).
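The sinc-like behavior of a single bin can be verified numerically by evaluating the DFT sum directly for a complex sinusoid. The sketch below assumes an ideal, unwindowed FFT with bins centered at \(k f_s/N\) as stated in the text; the function names are hypothetical, and the 8-point, 2GS/s values mirror Fig. 3.4.

```python
import numpy as np

def bin_response(k, f, fs, n_points):
    """Output of bin k of an n_points FFT for x[i] = exp(j*2*pi*i*f/fs),
    evaluated directly from the DFT sum."""
    f = np.atleast_1d(np.asarray(f, dtype=float))
    i = np.arange(n_points)
    exponent = 2j * np.pi * i[None, :] * (f[:, None] / fs - k / n_points)
    return np.exp(exponent).sum(axis=1)

def bandwidth_3db(k, fs, n_points, grid=20001):
    """3 dB width of the main lobe of bin k, estimated on a dense grid."""
    center = k * fs / n_points
    f = np.linspace(center - fs / n_points, center + fs / n_points, grid)
    mag = np.abs(bin_response(k, f, fs, n_points))
    return (f[1] - f[0]) * np.count_nonzero(mag >= n_points / np.sqrt(2))

if __name__ == "__main__":
    fs, n = 2e9, 8
    print("peak at bin center :", np.abs(bin_response(1, 250e6, fs, n))[0])
    print("at next bin center :", np.abs(bin_response(1, 500e6, fs, n))[0])
    print("3 dB width / (fs/N):", bandwidth_3db(1, fs, n) / (fs / n))
```

The response peaks at \(N\) on the bin center, vanishes at the centers of the other bins, and has a 3 dB width of roughly \(0.89 f_s/N\), which is where \(Q = k/0.89\) comes from.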

### 3.3.2 Composite filtering

A spatio-spectral beamformer filters the signal in the frequency and spatial domains. Fig. 3.5 shows a 2-channel 4-beam (4-point FFT) architecture that first spectrally filters broadband signals and then spatially processes them (hence the composite filtering action). In the absence of a blocker, the signals (on each output bin of the FFT) can be steered in any desired direction.

---

Figure 3.6: Two limiting cases for spectral filtering
Figure 3.7: SNR and SINR for different blocker directions and frequencies.

Even in the presence of blockers, a spatio-spectral beamforming scheme allows us to leverage a second degree of freedom by means of spectral filtering on top of spatial filtering (the 1st degree of freedom is in space). To better appreciate the advantages of spectral filtering let us consider the two extreme cases shown in Fig. 3.6. As suggested by Eq. (3.1) any signal, except for signals at multiples of $\frac{f_s}{N}$, is present on multiple output bins. Since spectral filtering takes place prior to beamforming, any blocker that is not on-bin gets attenuated before being spatially filtered. Case (I) in Fig. 3.6 considers a blocker that falls near the null of an FFT (spectral) sinc and thus gets severely attenuated. This allows us to steer the signal in the broadside direction in the subsequent spatial filtering, hence maximizing the signal amplitude. On the other hand in case (II) the blocker happens to fall in the vicinity of an on-bin signal and hence does not suffer any significant attenuation. As a result, the signal is steered off the broadside to eliminate the blocker spatially. The deviation of the signal strength from the maximum achievable depends on the difference between the signal’s and the blocker’s angle of arrival.

In general any signal would be partially spectrally filtered. Whether the blocker is completely eliminated or the signal is steered to broadside would depend on the extent of this spectral filtering. On the one hand, post signal combining, only the signal and noise remain (SNR metric) while on the other hand we have a stronger signal corrupted by a residual blocker and noise (SINR metric). Depending on the frequency of the blocker there are two possible options, i.e., whether to eliminate the blocker completely or to steer the signal to broadside.

---

Footnote: If beamforming is done in digital, several different combining options can be exercised by adding a variable amplitude gain with variable phase shifts per element. In analog combining, it is assumed that only the phase-shifts per element can be manipulated.

Since we ultimately go with the option that gives us better SNR or SINR, let us compare these two. Let $\theta_{\text{sig}}$ and $\theta_{\text{blk}}$ be the progressive phase shifts of the signal and the blocker, at the antenna array, respectively. The signal is assumed to be on-bin. Also, let $f_b$ be the blocker frequency and $A$ be the blocker amplitude normalized to the signal. Then it follows that

\begin{align}
\alpha^2 &= \frac{\sin^2 (M(\theta_{\text{sig}} - \theta_{\text{blk}}))}{\sin^2 (\theta_{\text{sig}} - \theta_{\text{blk}} + \frac{2\pi}{M})} \quad (3.2a) \\
\gamma^2 &= A^2 |X_k(f_b)|^2 \frac{\sin^2 (M(\theta_{\text{blk}} - \theta_{\text{sig}}))}{\sin^2 (\theta_{\text{sig}} - \theta_{\text{blk}})} \quad (3.2b) \\
\text{SNR} &= \frac{\alpha^2}{\sigma^2} \quad (3.2c) \\
\text{SINR} &= \frac{M^2}{\sigma^2 + \gamma^2} \quad (3.2d)
\end{align}

Here $\alpha$ is the effective signal strength at the output of the signal combiner when the phase-shifter is tuned to eliminate the blocker while $\gamma$ is the effective strength of the blocker at the output of the signal combiner when the phase-shifter is tuned to steer the signal to broadside (full-strength). Eq. (3.2) suggests that if $\gamma$ is small enough, then the SINR would be higher than the SNR. Of course, this will not be true for all the bins. The bins that have a significant presence of the blocker will have to resort to blocker cancellation in the spatial domain, i.e., the SNR option. But since space-only phased-arrays always exercise the SNR option, a composite filtering approach, space and frequency, gives us either the same SNR or a better SINR. This observation is further elaborated through Fig. 3.7. It shows the output on BIN 2 of a 16-point FFT sampled at 2GS/s. The signal is assumed to be on-bin (250MHz) while the blocker frequency is swept across the sampling frequency range (2GHz). A noise level of $\sigma^2 = 0.01$ and a blocker level of $A = 2$ is assumed. A 4-element (channel) front-end is assumed. As is evident from the figure, for blockers in the vicinity of the signal complete blocker elimination is preferable as it gives a higher SNR while for blockers farther away from the signal, steering the beam to broadside gives a higher SINR. It should be noted that this scheme also allows us to achieve a ($\sim$)full array gain (SINR~32dB for 4 elements) even in the presence of a blocker (though it should be away from the signal). For the cases considered in Fig. 3.7 the best case improvement is 17dB while in the worst case it is (if spatial blocker elimination is chosen) 0dB.

---

*The signal is stronger since constructive summing of the signal maximizes the signal amplitude, but this does not cancel the blocker and hence a residual blocker remains. While, if the blocker is to be canceled, the signal amplitude is less than the maximum possible due to partial phase-offset between signals from different channels.*
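The choice between the two options can be sketched by evaluating (3.2a) through (3.2d) directly. The code below transcribes those expressions as printed and makes one explicit assumption that is not stated in the text: \(|X_k(f_b)|\) is normalized to unity at the bin center. The function names and the phase offset of 0.4 rad are hypothetical, so the numbers are illustrative and are not meant to reproduce Fig. 3.7.

```python
import numpy as np

def fft_bin_gain(k, f, fs, n_points):
    """|X_k(f)| normalized to 1 at the bin center (normalization assumed)."""
    i = np.arange(n_points)
    return abs(np.exp(2j * np.pi * i * (f / fs - k / n_points)).sum()) / n_points

def snr_null_blocker(m, d_theta, sigma2):
    """SNR when the combiner nulls the blocker, per (3.2a) and (3.2c).
    d_theta = theta_sig - theta_blk."""
    alpha2 = np.sin(m * d_theta) ** 2 / np.sin(d_theta + 2 * np.pi / m) ** 2
    return alpha2 / sigma2

def sinr_broadside(m, d_theta, sigma2, a_blk, k, f_blk, fs, n_points):
    """SINR when the signal is steered to broadside, per (3.2b) and (3.2d)."""
    gamma2 = (a_blk ** 2 * fft_bin_gain(k, f_blk, fs, n_points) ** 2
              * np.sin(m * d_theta) ** 2 / np.sin(d_theta) ** 2)
    return m ** 2 / (sigma2 + gamma2)

if __name__ == "__main__":
    m, n, fs, k = 4, 16, 2e9, 2           # 4 elements, BIN 2 of a 16-point FFT
    sigma2, a_blk, d_theta = 0.01, 2.0, 0.4
    for f_blk in (260e6, 375e6):          # near the signal, and on another bin
        snr = snr_null_blocker(m, d_theta, sigma2)
        sinr = sinr_broadside(m, d_theta, sigma2, a_blk, k, f_blk, fs, n)
        choice = "null blocker" if snr >= sinr else "steer to broadside"
        print(f"f_blk={f_blk/1e6:.0f} MHz: SNR={10*np.log10(snr):.1f} dB, "
              f"SINR={10*np.log10(sinr):.1f} dB -> {choice}")
```

For a blocker that falls on a spectral null the broadside option reaches \(M^2/\sigma^2\), i.e. about 32dB for the values above, while a blocker close to the signal favors spatial nulling.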

In the extreme case when there is a well spread broad-band blocker, it has to be eliminated using spatial filtering. However, even in this case, spectral filtering provides an N-fold improvement in beam-steering (as well as blocker nulling) accuracy owing to the architecture’s large $f_{frac}$ handling capability.

### 3.3.3 FFT as a phase-shifter

The switched capacitor setup used to realize the analog FFT can also be used to realize a phase shifter. In order to do so, time synchronous (parallel) samples are taken from the different elements and treated as time asynchronous (sequential) samples from a single element. In fact any FFT engine with access to synchronously sampled inputs from different elements can act like a phase shifter. To understand this let us once again consider a system with $M$ elements (antennas) and an $N$ point FFT with $N \geq M$.

---

Space-only phased arrays can also steer the beam to broadside if the blocker is at a different frequency from the signal and then filter the residual blocker in digital but the ADC has to process the signal as well as the residual blocker thus increasing ADC dynamic range requirements.
Figure 3.8: A 2 channel 4-point FFT based spectral filter followed by a dedicated 4-point FFT based phase shifter implementation

Expanding the expression for the output of the $k^{th}$ bin of an $N$ point FFT, $X_k$, it follows

$$X_k = x[0] + x[1]e^{jk\theta} + \ldots + x[N-1]e^{jk(N-1)\theta} \qquad (3.3)$$

where $\theta = \frac{2\pi}{N}$. Denoting the input sample from the $i^{th}$ element, at time $nT_s$, by $x[i]_{nT_s}$ and replacing the $x[i]$’s, $i \in [0, M-1]$, in (3.3) with $x[i]_{nT_s}$ and setting $x[i] = 0$ for $i \in [M, N-1]$, it follows

$$X_k[nT_s] = \sum_{i=0}^{M-1} x[i]_{nT_s} e^{jki\theta} \qquad (3.4)$$

Looking at (3.4) one can immediately recognize it as the output of a beamformer in discrete time. Concurrent samples are taken from the $M$ channels and a progressive phase shift of $k\theta$ is applied to each sample. Note that the summation index $i$ in (3.4) runs from 0 to $M-1$ not $N-1$, while the index $k$ runs from 0 to $N-1$ corresponding to progressive phase shifts of 0 to $(N-1)\theta$, with a phase shift resolution of $\theta$.
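The equivalence between a zero-padded FFT across elements and a discrete-time beamformer can be checked in a few lines. This sketch evaluates (3.4) with the bin index \(k\) in the exponent, as in (3.3), using NumPy's inverse FFT (scaled by \(N\)) because it uses the same positive exponent sign. The function name, the 2-element, 4-point sizes and the test signal are illustrative assumptions.

```python
import numpy as np

def fft_beamformer(element_samples, n_points):
    """All n_points beams X_k = sum_i x[i] * exp(+j*k*i*theta), theta = 2*pi/N,
    formed from simultaneous samples of M <= N elements (zero-padded to N)."""
    m = len(element_samples)
    padded = np.zeros(n_points, dtype=complex)
    padded[:m] = element_samples
    # numpy's ifft computes (1/N) * sum a[i] * exp(+2j*pi*i*k/N).
    return n_points * np.fft.ifft(padded)

if __name__ == "__main__":
    m, n = 2, 4
    psi = -np.pi / 2                       # inter-element phase of the arriving wave
    x = np.exp(1j * psi * np.arange(m))    # one simultaneous sample per element
    beams = fft_beamformer(x, n)
    for k, b in enumerate(beams):
        print(f"bin {k}: applied shift {k * 360 // n} deg, |output| = {abs(b):.3f}")
```

Each output bin corresponds to a different progressive phase shift \(k\theta\), so all \(N\) beams are available at once; the bin whose shift cancels the arrival phase sums the elements coherently.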

Fig. 3.8 shows a 2-channel phased array using a 4-point FFT as a spectral filter and a phase-shifter. Each one of the 4 bins of the FFT (spectral filter) from channels I and II acts as an input to the 4-point FFT employed as a dedicated phase-shifter for that bin. Fig. 3.9 shows the circuit simulation results for beamforming for one of the bins for the 2-channel implementation of Fig. 3.8. Four simultaneous beams in four different directions are available at the output, one corresponding to each output bin of the 4-point FFT.

Figure 3.9: Beamforming results (circuit simulations) for one bin for a 2-channel phased array with a 4-point FFT as the spectral and the spatial filter

In general, for an $M$-channel implementation with an $N$-point FFT as the spectral filter, $N$ additional $L$-point FFTs are required for dedicated phase shifting, where $L \geq M$. The phase shift resolution of the phase shifter is $\frac{2\pi}{L}$, with a minimum inter-element phase shift of 0 and a maximum of $\frac{2(L-1)\pi}{L}$. It would provide $L$ simultaneous beams at the output. For instance, a 4-channel implementation with an 8-point FFT as the spectral filter and an 8-point FFT as the phase shifter would require a total of twelve 8-point FFTs. The corresponding phase resolution of the scheme would be $45^\circ$ with 8 simultaneous beams at the output.
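The bookkeeping in this paragraph can be sketched as a small helper. The function name `spatio_spectral_budget` and its dictionary keys are hypothetical; the total beam count ($N \cdot L$) is an extrapolation of the per-bin figure and should be read as an assumption.

```python
def spatio_spectral_budget(num_channels, spectral_points, spatial_points):
    """FFT count and phase-shifter figures for an M-channel, N-point, L-point scheme.

    One N-point spectral FFT is needed per channel (M in total) and one
    L-point phase-shifting FFT per spectral bin (N in total), with L >= M.
    """
    if spatial_points < num_channels:
        raise ValueError("the phase-shifting FFT needs L >= M inputs")
    return {
        "total_ffts": num_channels + spectral_points,
        "phase_resolution_deg": 360.0 / spatial_points,
        "max_shift_deg": 360.0 * (spatial_points - 1) / spatial_points,
        "beams_per_bin": spatial_points,
        # Assumption: every spectral bin gets its own set of L beams.
        "total_beams": spectral_points * spatial_points,
    }


# The 4-channel, 8-point / 8-point example from the text.
example = spatio_spectral_budget(num_channels=4, spectral_points=8, spatial_points=8)
for name, value in example.items():
    print(f"{name}: {value}")
```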

A big advantage of an FFT based phase shifter is the simultaneous availability of multiple beams in different directions per frequency bin. An $N$ point FFT as a spectral filter followed by an $N$ point FFT as a phase shifter gives a total of $N^2$ beams, $N$ per output bin of the spectral FFT. A single transceiver block can thus resolve multiple carrier signals in multiple directions using this scheme. The overhead in the architecture is the size of the analog FFT engine which is a passive block and can be designed for extremely low power, as will be seen in the next section. This is a big advantage over power hungry LO/RF based beamformers.

Figure 3.10: Signal processing chain (Both spectral and spatial filtering are sinc in nature.)

The signal processing chain is summarized in Fig. 3.10. As is evident from the exponential summations, eq. (B.1) and eq. (B.5), both spectral and spatial filtering are sinc in nature allowing us to use the same low power charge based computational engine for both. There are $M$ sinc (spectral) filters and $N$ sinc (spatial) filters for an $M$-channel $N$-point FFT based spatio-spectral beamformer.

The architecture can be used to address wideband signals with multiple carriers (different modulation on each carrier) or a single wideband signal with the same modulation throughout the bandwidth of interest. In the former scheme there is some computation error due to leakage of the adjacent channels into the channel of interest owing to the $1/f$ roll-off of the FFT sinc filter. The error can be minimized by having a finite guard-bandwidth between the channels with each channel centered on an FFT bin frequency. The trade-off is the lowering of the occupied bandwidth. In the latter case however the error due to signal leakage is minimal and arises due to addition of attenuated and differently phase shifted versions of a frequency slice to the dominant (on-bin) phase-shifted version of that slice. The different phase-shifts arise due to each FFT bin being differently phase shifted so that the entire wideband signal is steered in the same spatial direction.

Figure 3.11: 2-channel 4-frequency beamforming receiver

### 3.4 A Two-Channel Four-Frequency Prototype

As a proof of concept, a 2-channel 4-frequency 8GHz receiver was designed and tested in IBM 65nm CMOS. A block level design of the architecture is shown in Fig. 3.11. Two single-ended RF inputs, centered around 8GHz, are provided through a GSGSG probe. The single-ended signals are converted to differential signals using on-chip baluns as shown in Fig. 3.11. Each balun drives a quadrature mixer. Buffered I and Q outputs from the mixer are then fed to the CRAFT engine, one for each channel. Discrete time outputs from the CRAFT engine are then fed to a vector-combiner to complete the beam synthesis. The RF and baseband sections of the design are discussed next and testing is discussed in Sec. 3.5.

#### 3.4.1 LO Generation

An ILO acts as a first order PLL with a wide loop-filter bandwidth (equal to the lock-range), allowing simple generation of a clean LO from the injection signal \[65\]. Hence, the I and Q channel LO signals are generated using a quadrature ILO (QILO), as shown in Fig. 3.12(a). The QILO performs frequency multiplication, in addition to quadrature enhancement \[66,67\]. A single-ended injection signal with a frequency of \(\frac{f_{\text{LO}}}{3}\) is provided through a GSG probe. The injection signal is then converted to a differential signal through an on-chip balun, Fig. 3.11. The differential outputs of the balun are fed to a poly-phase-filter (PPF), Fig. 3.12(b), which converts it to I-Q differential signals. These quadrature signals are then hard limited and pulse-slimmed \[68\], Fig. 3.12(c), to reduce the content of the fundamental and enhance the 3\(^{rd}\) harmonic of the injection signal, \[69\]. The VCO has a 3-bit capacitor bank that tunes the VCO frequency from 7.5GHz to 9.2GHz. The QVCO draws a current of 5.6mA from a 1.2V supply. The QVCO outputs are buffered using CML buffers that draw a combined current of 10mA from a 1.2V supply.

#### 3.4.2 Down-conversion Mixer

Fig. 3.13 shows the down-conversion mixer used in the design. It is a double balanced mixer with \(g_m\) enhanced Gilbert cells. Each mixer (two per channel, I and Q) draws a current of 3.6mA from a 1.2V supply. The gain of the mixers in extracted simulations is 8dB.

#### 3.4.3 Analog FFT

A passive switched capacitor analog domain FFT based on the CRAFT engine \[55\] is designed. It is based on the recursive Cooley-Tukey radix-2 FFT algorithm \[70\] for DFT computation. Signal values are sampled onto capacitors as inputs. The charges stored on the capacitors can then be summed, subtracted, scaled, and rotated through a sequence of charge sharing operations. The inputs to an FFT are generally complex. For this purpose separate I and Q data streams are maintained to represent complex values. The summation operation, $v_1 + v_2$, is achieved by sharing the charges on the capacitors in the same order of polarity. Subtraction is identical to summation except that the polarity of one of the charges is reversed. For scaling, the charge on a capacitor is shared with a capacitor with no charge. Rotation or complex multiplication is a two step operation. For instance, to accomplish $(x_i + jx_q)(\alpha + j\beta)$, the products $\alpha x_i, \alpha x_q, \beta x_i$, and $\beta x_q$ are first computed and then summed to get $\alpha x_i - \beta x_q$ and $\alpha x_q + \beta x_i$.
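The charge-domain operations described above can be modelled with ideal charge conservation. This is only a behavioural sketch under assumptions that are not in the text: ideal capacitors with no parasitics, equal unit capacitors for summation (so a sum comes out as an average, i.e. scaled by 1/2), and strictly positive coefficients. The function names are hypothetical.

```python
import math


def share(v1, v2, c1=1.0, c2=1.0):
    """Voltage after two capacitors are connected in parallel (charge conservation)."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)


def scale(v, factor, c_signal=1.0):
    """Sub-unity scaling: share the charge with an initially uncharged capacitor.

    The empty capacitor is sized so that c_signal / (c_signal + c_empty) == factor.
    """
    if not 0.0 < factor <= 1.0:
        raise ValueError("passive charge sharing only supports 0 < factor <= 1")
    c_empty = c_signal * (1.0 - factor) / factor
    return share(v, 0.0, c_signal, c_empty)


def rotate(x_i, x_q, alpha, beta):
    """(x_i + j x_q)(alpha + j beta) in two steps, scaled by 1/2 by the final sharing."""
    # Step 1: the four scaled copies.
    a_xi, a_xq = scale(x_i, alpha), scale(x_q, alpha)
    b_xi, b_xq = scale(x_i, beta), scale(x_q, beta)
    # Step 2: sharing; reversing the polarity of one charge gives subtraction.
    real = share(a_xi, -b_xq)
    imag = share(a_xq, b_xi)
    return real, imag


alpha, beta = math.cos(math.pi / 4), math.sin(math.pi / 4)   # 45 degree rotation
real, imag = rotate(0.8, -0.3, alpha, beta)
reference = (0.8 - 0.3j) * complex(alpha, beta) / 2           # same product, halved
print(real, imag, reference)
```

The factor of 1/2 picked up in the final sharing step is the kind of attenuation that a passive implementation accumulates from stage to stage.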

Note that using this scheme, only sub-unity scaling can be performed, which is suitable for the FFT operation where the scaling coefficients are $e^{j\theta}$, whose real and imaginary parts never exceed unity in magnitude.

Figure 3.12: A quadrature injection locked oscillator along with a poly-phase filter and a pulse slimming circuit

A 4-point FFT using 2 radix-2 stages, or butterflies, has been used in this work. The 4 bin outputs, in terms of the inputs, can be written as:

$$S_{DC} = (s[0] + s[1] + s[2] + s[3])$$

$$S_{f_s/4} = (s[0] - s[2]) - j (s[1] - s[3])$$

$$S_{f_s/2} = (s[0] - s[1] + s[2] - s[3])$$

$$S_{-f_s/4} = (s[0] - s[2]) + j (s[1] - s[3])$$
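As a sanity check on the bin expressions, the sketch below builds the 4-point transform from two radix-2 stages and compares it with a direct DFT. It assumes the $e^{-j2\pi kn/4}$ sign convention, under which the $\pm j$ twiddles reduce to an I/Q swap with a sign flip; the function names are illustrative only.

```python
import cmath
import math


def fft4_radix2(s):
    """4-point DFT (e^{-j 2 pi k n / 4} convention) from two radix-2 stages."""
    s0, s1, s2, s3 = s
    # Stage 1: butterflies on the even-indexed and odd-indexed samples.
    even_sum, even_diff = s0 + s2, s0 - s2
    odd_sum, odd_diff = s1 + s3, s1 - s3
    # Stage 2: combine. Multiplying by -j or +j needs no scaling, only
    # a swap of the I and Q values with one polarity reversal.
    return {
        "DC": even_sum + odd_sum,
        "+fs/4": even_diff - 1j * odd_diff,
        "fs/2": even_sum - odd_sum,
        "-fs/4": even_diff + 1j * odd_diff,
    }


def dft_bin(s, k):
    """Direct evaluation of one DFT bin, used only as a reference."""
    n_points = len(s)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_points) for n, x in enumerate(s))


samples = [0.3 + 0.1j, -0.7 + 0.4j, 0.2 - 0.9j, 0.5 + 0.6j]   # arbitrary test data
bins = fft4_radix2(samples)
for name, k in (("DC", 0), ("+fs/4", 1), ("fs/2", 2), ("-fs/4", 3)):
    print(name, bins[name], dft_bin(samples, k))
```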

A total of 8 clock phases are generated, using a state machine from an external clock input, for sampling and processing operations. Since the computations in the baseband are passive in nature, the power consumed in the baseband is attributable to the generation and routing (buffering) of these clocks along with the dynamic power consumption of the switches. The first 4 phases are used for sampling while the next two phases are used for processing. The final two phases are used for vector combining and output latching. The CRAFT engine's unit and radix operations are summarized in Fig. 3.14.

Note that these computation techniques can also be used for multiplicands greater than unity by incorporating a constant scaling factor in computations. The accumulated scaling factors of cascaded stages can be later removed using an active amplifier stage.

The noise figure of the analog FFT based passive baseband is given by

$$NF = 1 + \frac{kT/C}{\alpha W_{in}^2}$$

where $\alpha$ is the attenuation of the passive analog FFT, $kT/C$ is the noise of the capacitor onto which the input is sampled and $W_{in}^2$ is the total integrated noise power present at the input to the sampler before the FFT. The attenuation $\alpha$ arises from the scaling (attenuation via charge-sharing and charge-stealing) of the I and Q sample values prior to sharing to realize the complex multiplication. For a 4-point FFT $\alpha = 1$, since multiplications by $e^{-j\pi}$ and $e^{-j\frac{\pi}{2}}$ can be performed by simply reversing the polarity and swapping the I and Q wires.
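The noise figure expression can be evaluated directly once component values are chosen. The sketch below is a plain transcription of the formula; the sampling capacitance, temperature and input noise used in the example are hypothetical values picked only to exercise the function, not figures from the prototype.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def passive_fft_noise_figure_db(cap_farads, input_noise_vrms, alpha=1.0, temperature_k=300.0):
    """NF = 1 + (kT/C) / (alpha * W_in^2), returned in dB.

    input_noise_vrms is the integrated noise at the sampler input (volts rms),
    so its square plays the role of W_in^2.
    """
    kt_over_c = BOLTZMANN * temperature_k / cap_farads        # V^2
    nf_linear = 1.0 + kt_over_c / (alpha * input_noise_vrms ** 2)
    return 10.0 * math.log10(nf_linear)


# Hypothetical operating point: 100 fF sampling capacitor, 1 mV rms input noise.
cap = 100e-15
print(passive_fft_noise_figure_db(cap, 1e-3))

# When the input noise equals the kT/C noise and alpha = 1, the baseband
# doubles the noise power, i.e. NF is about 3 dB.
matched_noise = math.sqrt(BOLTZMANN * 300.0 / cap)
print(passive_fft_noise_figure_db(cap, matched_noise))
```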

#### 3.4.4 Vector-Combiner

Three discrete phase shifts, $\theta \in \{-90^\circ, 0^\circ, 90^\circ\}$, are realized through clockwise and anti-clockwise rotations identical to the ones used for $\pm j$ multiplications in the FFT core. A switch is used to select between the three phase shifts. The phase shifts are only applied to the signals from channel II (the second element). Each FFT bin is followed by a vector-combiner, 4 in total.

The phase resolution of the scheme is $90^\circ$, similar to a 4-point FFT. But unlike a 4-point FFT only one beam per frequency bin is available at a time. Recall that an N-point FFT as a phase shifter allows for N simultaneous beams in different directions per frequency slice of the spectral filter (an analog FFT in this implementation).

Finally, beam synthesis is completed by summing phase rotated outputs from channel II with the outputs from channel I. Once again summation is accomplished as in Fig. 3.14(a). The baseband setup including the vector-combiner and phase selection switches is summarized in Fig. 3.15. The entire baseband is realized through switched capacitor circuits. The output analog latch is a switched capacitor amplifier. The input capacitor of the latch can be simultaneously charged by one, two, three or all the four FFT bins. In the complete chip including the digital baseband, the op-amps will be replaced by ADCs. In this chip, the op-amps are used only to facilitate testing, and their power is not included in computing the IF power. Thus the architecture allows for multi-frequency concurrent beam steering.

Figure 3.14: A 4 point FFT and charge sharing operations

Figure 3.15: Baseband: Analog FFT + Vector-Combiner

Figure 3.16: Chip micrograph

### 3.5 Measurement results

The 2-channel 4-frequency prototype (Fig. 3.16) occupies an area of 1.25mm×0.75mm (including the test pads). The RF section consumes 27.4mW/channel (including buffers) while the baseband section consumes 135μW (9pJ/conv.).

#### 3.5.1 Test-setup

The test setup is shown in Fig. 3.17. IF test signals were generated using an arbitrary waveform generator (AWG). The AWG's output was kept at ~2GHz and then upconverted using an external LO at 6GHz to get an RF input of ~8GHz. The IF signals were generated at a high frequency (~2GHz) instead of baseband to avoid the image signal issue. Directly upconverting a baseband signal, say at frequency \( f \), would have created an additional tone at \(-f\). Thus, ensuring a single tone at the sampler's input wouldn't be possible under direct upconversion. The IF frequency (~2GHz) was chosen to ensure that the image frequency (~4GHz) falls well outside the mixer's bandwidth.

Three different types of test signals are generated: one tone, two tones, and three tones. The multi-tonal signals emulate multi-directional signals. In each case the tone is put slightly off bin. Having an off-bin signal serves two purposes. It shows the I-Q rotation and it helps cancel any DC offset that might creep into the output magnitude calculations. Whenever a signal, at say frequency \( f \), is sampled at a rate, say \( f_s \), different from an integer multiple of the signal frequency, the sampled output is a sinusoidal signal rather than a DC value. Its frequency, also called the beat frequency, is equal to \( \frac{f}{n_0} \) where \( n_0 = \min \{ i : i \frac{f_s}{f} \in Z \} \). Fig. 3.18 shows the output on Bin 2 when the beat frequency is 1kHz. Ideally the I and Q peak-to-peak values should be the same for any off-bin signal, but due to mismatches between the output latches and off-chip buffers there is a minor difference between the two values.

#### 3.5.2 Measurements

The analog baseband sampling rate is set to 120MS/s and multi-carrier test signals are generated at 30MHz intervals using the AWG at 1.989GHz, 2.019GHz and 2.049GHz, i.e., a beat frequency of 1MHz. Output signal amplitude is calculated based on I and Q signal values, \( A = \sqrt{I^2 + Q^2} \). This removes any offsets from the measurement. Measured average peak-to-peak values of the I and Q outputs (nominally equal) are used in the amplitude expression. In the presence of multiple tones at the input, any non-linearity in the front-end and distortion in the FFT engine causes unequal null depths on different frequency bins.

Beamforming tests are performed for four different input patterns: (I) input signal on one bin (one tone), (II) input signals on two adjacent bins (two tones), (III) input signals on two alternate bins (two tones) and (IV) input signals on three bins (three tones, all but DC). In each case channel I is treated as the reference while the RF input to channel II is phase shifted in 24 uniform phase steps of 15° w.r.t. channel I. On-chip discrete phase shifts of 0° and ±90° are available to channel II while the phase of channel I is held fixed.

Table 3.2: A summary of comparison with other related works.

<table>
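<thead>
<tr>
<th>Reference</th>
<th>[63]</th>
<th>[71]</th>
<th>[58]</th>
<th>[18]</th>
<th>[52]</th>
<th>[59]</th>
<th>[72]</th>
<th>[60]</th>
<th>[73]</th>
<th>[54]</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology (nm)</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>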
<tr>
<td>Tx / Rx</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td># of channels</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>6</td>
<td>4</td>
<td>16</td>
<td>16</td>
<td>4</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>φ-shift @ RF / IF / LO</td>
<td>LO</td>
<td>LO</td>
<td>LO</td>
<td>RF</td>
<td>RF</td>
<td>IF</td>
<td>RF</td>
<td>RF</td>
<td>IF</td>
<td>IF</td>
<td>IF</td>
</tr>
<tr>
<td>Frequency (GHz)</td>
<td>77<sup>c</sup></td>
<td>6–18</td>
<td>2.4<sup>c</sup></td>
<td>24–26</td>
<td>30–40</td>
<td>60<sup>c</sup></td>
<td>60<sup>c</sup></td>
<td>60<sup>c</sup></td>
<td>1–4</td>
<td>1–2.5</td>
<td>8<sup>c</sup></td>
</tr>
<tr>
<td>W / N<sup>d</sup></td>
<td>N</td>
<td>N</td>
<td>N</td>
<td>N</td>
<td>W</td>
<td>N</td>
<td>N</td>
<td>N</td>
<td>W</td>
<td>W</td>
<td></td>
</tr>
<tr>
<td># of beams</td>
<td>1</td>
<td>4</td>
<td>1</td>
<td>4</td>
<td>7</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>4</td>
</tr>
<tr>
<td>Beams/frequency</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>7</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Spatial-Resolution</td>
<td></td>
<td>0.35<sup>h</sup></td>
<td></td>
<td>30°</td>
<td>18°</td>
<td>—</td>
<td>2°</td>
<td>3.6°</td>
<td>3.6°</td>
<td>14ps</td>
<td>30°</td>
</tr>
<tr>
<td>Null-depth (dB)</td>
<td>&gt;12<sup>h</sup></td>
<td>&gt;21.5</td>
<td>&gt;25</td>
<td>&gt;23</td>
<td>—</td>
<td>&gt;25<sup>i</sup></td>
<td>&gt;29<sup>i</sup></td>
<td>&gt;25<sup>i</sup></td>
<td>&gt;20<sup>i</sup></td>
<td>&gt;20<sup>i</sup></td>
<td>&gt;19</td>
</tr>
<tr>
<td>Vdd (V)</td>
<td>2.5</td>
<td>1.6</td>
<td>1.55</td>
<td>1.2</td>
<td>2.5</td>
<td>1.1</td>
<td>2.6</td>
<td>2.7</td>
<td>1.2</td>
<td>1.8</td>
<td>1.2</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>135.75</td>
<td>347.2</td>
<td>10.53<sup>l1</sup></td>
<td>68.2</td>
<td>106</td>
<td>7.75<sup>l2</sup></td>
<td>—</td>
<td>62.6</td>
<td>92.4<sup>l3</sup></td>
<td>112.5<sup>l3</sup></td>
<td>27.4<sup>l4</sup></td>
</tr>
</tbody>
</table>

<sup>a</sup> SiGe BiCMOS; <sup>b1</sup> CMOS; <sup>b2</sup> 1P7M Digital CMOS; <sup>c</sup> LO can be tuned to cover wider range; <sup>d</sup> Wideband (W) or Narrowband (N) based on fractional B.W. handling capability of the architecture; <sup>e</sup> Time-Delay based; <sup>f</sup> Frequency channelization based; <sup>g1</sup> beams per frequency/code; <sup>g2</sup> Can support 4 beams/frequency if FFT used as phase shifter; <sup>h</sup> Limited by DAC resolution or injection signal phase tuning; <sup>i</sup> Measured for a pair of elements, i.e., 2 channels; <sup>j</sup> Vdd whose power is being reported; <sup>k</sup> Power/element for LO, buffers, mixers, summers and phase rotators; <sup>l</sup> Only I (real) channel LO, mixer, buffer and summer; <sup>m</sup> LO power not included; <sup>n</sup> LNA power included; <sup>o</sup> Baseband (spectral filter + vector-combiner) consumes only 135µW

For measurement (I), a single tone is placed on each bin, one bin at a time, and an on-chip phase shift of $0^\circ$ is used. For this input scenario, as shown in Fig. 3.19(a), the maximum null depth observed is 27dB while the minimum is 23dB. The excellent matching between the bin outputs goes to show why beamforming post frequency slicing can mimic (within a certain tolerance) true-time-delay circuits. The matching is limited by local capacitor mismatch. For measurements (II) and (III), 2 tones are placed on two adjacent bins and two alternate bins respectively. The output (channel II) of one bin is phase shifted by $0^\circ$ while the other is phase shifted by $90^\circ$. A maximum null depth of 24dB is observed while the minimum null depth is 19dB, as shown in Fig. 3.19(b) and 3.19(c) respectively. For measurement (IV), 3 tones are placed on three bins with phase shifts of $-90^\circ$, $0^\circ$ and $90^\circ$ respectively, as shown in Fig. 3.19(d). A maximum null depth of 25dB is observed while the minimum is 19dB.
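The phase sweep used in these measurements can be mimicked with an idealized two-element model. This is a sketch under assumptions, not a model of the chip: the channels are ideal phasors, null depth is taken as the ratio of the largest to the smallest combined amplitude over the sweep, and the 0.9 relative gain of channel II is a hypothetical mismatch chosen purely for illustration.

```python
import cmath
import math


def combined_amplitude(input_phase_deg, on_chip_shift_deg, gain_ch2=1.0):
    """A = sqrt(I^2 + Q^2) after summing channel I with phase-rotated channel II."""
    ch1 = 1.0 + 0.0j                                   # reference channel
    total_phase = math.radians(input_phase_deg + on_chip_shift_deg)
    ch2 = gain_ch2 * cmath.exp(1j * total_phase)
    out = ch1 + ch2
    return math.hypot(out.real, out.imag)


def null_depth_db(on_chip_shift_deg, gain_ch2=1.0, step_deg=15):
    """Peak-to-null ratio over a 360 degree sweep of the channel II input phase."""
    amplitudes = [
        combined_amplitude(phase, on_chip_shift_deg, gain_ch2)
        for phase in range(0, 360, step_deg)
    ]
    smallest = min(amplitudes)
    if smallest == 0.0:
        return math.inf
    return 20.0 * math.log10(max(amplitudes) / smallest)


# 24 steps of 15 degrees for each of the three on-chip shifts,
# with an assumed 10 % amplitude mismatch between the channels.
for shift in (-90, 0, 90):
    print(f"on-chip shift {shift:+4d} deg: null depth {null_depth_db(shift, 0.9):.1f} dB")
```

In this model a perfectly matched pair gives an essentially unbounded null, so a finite null depth points to amplitude or phase mismatch between the two paths.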
For each case a −6dBm RF input signal was used. Having a constant power level ensured that the maximum amplitude in time was the same across all test patterns. A high RF power level was used to compensate for the absence of an LNA in our design. The sampler that precedes the CRAFT engine was identical to the one used in [55], which could operate up to 5GS/s. However, in the prototype design the sampling speed bottleneck was the output analog latch and the mixer bandwidth. Specifically, the output latch did not allow us to go beyond a sampling rate of 120MS/s. But since the latch was used merely to facilitate testing, it is not a part of the generic architecture and hence the latch based bottleneck is inconsequential to the scheme.

The main sources of computation error in the passive baseband are the parasitic wiring and voltage dependent capacitances, and the voltage dependent charge injection. A pseudo-differential implementation ensures that voltage-independent charge injection gets cancelled to first order. The major error contributor is the parasitic wiring capacitance ($C_{\text{wire}}$), which causes an error proportional to the ratio $\frac{C_{\text{wire}}}{C_{\text{wire}}+C_{\text{sample}}}$ (where $C_{\text{sample}}$ is the sampling capacitance). The voltage dependent capacitance error is a function of a similar ratio with $C_{\text{wire}}$ replaced with $C_{\text{voltage}}$. The wiring capacitance induced errors are systematic (deterministic) and can be calibrated out. For a 5GS/s 16-bit analog FFT implementation, post-calibration (digital) SFDR improved from 30dB to 47dB [74]. Since the current prototype employs an identical sampler and operates at a much lower speed, the expected SFDR (post-calibration) is much higher. Moreover, the null-depth has a high sensitivity to coefficient mismatch, hence a significant improvement in null-depth is expected post-calibration as well.

A summary of and comparison with related works is presented in Table 3.2. For comparing power, only the power consumed in LO generation, mixers, baseband circuits (excluding output buffers), and phase rotators has been mentioned under related works. Apart from this work only [18] and [52] support simultaneous beams, and only [52] and [54] support wideband signals. Almost all of the power in our prototype is consumed by the LO, mixer and buffers, which are generic RF front-end blocks, while the actual spectral filters (analog-FFT) and phase-shifters (vector-combiners) consume only 135µW. Therefore, the FFT size can be significantly increased to cover larger frequency ranges using a larger number of bins to provide a finer spectral as well as a finer spatial resolution (using the same FFT engine as a phase shifter). For example, a 16-point FFT using CRAFT was demonstrated in [55].

### 3.6 Summary

This work discusses and compares candidate architectures for a spatio-spectral beamforming receiver. Beamforming at RF is preferred when there are more antennas than frequency slices, while for fewer antennas and more frequency slices IF beamforming is preferred.

The spatio-spectral architecture offers composite filtering advantage allowing for realization of full antenna gain even in the presence of a blocker if the blocker is frequency separated from the signal. An analog FFT [55] based spatio-spectral beamforming receiver front-end is presented. The scheme allows for simultaneous multi-directional steering of each frequency bin (slice) of the spectral filter. It can be used to increase target imaging resolution by enabling simultaneous multi-frequency target scanning. Moreover, it can be integrated within a single transceiver to process multiple carrier signals simultaneously. The entire analog baseband processing is performed using passive switched capacitors with a negligible power overhead, thus providing a distinct power advantage over LO/RF based schemes. Frequency discrimination at the baseband allows for accurate phase shifts at each carrier frequency.

An 8GHz-RF, 2-channel 4-frequency RX prototype is designed. To the best of the authors' knowledge, this is the first on-chip analog spatio-spectral architecture implemented in silicon. The prototype has measured null depths of 19dB or more with the analog baseband consuming only 135µW at 120MS/s. The prototype's performance is summarized in Table 3.3. The measurement results demonstrate the spatio-spectral beamforming capabilities of this architecture. Moreover, digital calibration is expected to significantly improve the null-depth [55].

Table 3.3: Summary of performance

<table>
<thead>
<tr>
<th></th>
<th>Max.</th>
<th>Min.</th>
</tr>
</thead>
<tbody>
<tr>
<td>No. of Channels</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td>Independent freqs.</td>
<td>4</td>
<td></td>
</tr>
<tr>
<td>RF power</td>
<td>27.4mW</td>
<td></td>
</tr>
<tr>
<td>Baseband power</td>
<td>135µW</td>
<td></td>
</tr>
<tr>
<td>Total power</td>
<td>27.5mW</td>
<td></td>
</tr>
</tbody>
</table>

Can be improved significantly using digital calibration [55]

Chapter 4

Wideband Phased Array: Time Domain Implementation

### 4.1 Introduction

Antenna arrays act as spatial filters by producing a pattern of maxima (points of maximum signal energy) and minima (points of minimum signal energy) in space. To accomplish this they add phase shifts or time delays to the signals on each individual antenna and subsequently sum all the signals. While narrowband signals see no difference between timed arrays and phased arrays, wideband signals, i.e. signals with a large fractional bandwidth \( f_{frac} \), assume different beam directions for timed arrays and phased arrays. This difference exists because phased arrays try to compensate for differences in the arrival times of signals at different antennas using phase shifts that are fixed w.r.t. frequency, whereas timed arrays do so using explicit time delays. For a single tone, a time delay and a phase shift are identical since a phase shift of \( \phi \) implies a time delay of \( \tau \) where \( \phi = 2\pi f_{sig} \tau \). But as the signal bandwidth grows, the same phase shift implies different time delays for different frequencies. This leads to the antennas perceiving different frequencies as coming in from different directions. This phenomenon is referred to as beam squinting. Fig. 3.2 plots phase error vs. frequency for a carrier frequency of 6GHz and a fractional bandwidth of 0.5. The linear slope in blue represents the phase error when a fixed phase shift of 60°, corresponding to the center frequency, is applied across the entire bandwidth. A maximum phase error of 15° occurs at the extreme edges of the signal band (4.5GHz and 7.5GHz). The line in red corresponds to a true time delay which, as expected, incurs zero phase error across the bandwidth.
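The numbers quoted for the squint example follow from $\phi = 2\pi f_{sig}\tau$: a true time delay produces a phase that grows linearly with frequency, while a phase shifter holds it constant. The sketch below reproduces that arithmetic; the function name and the sign convention of the error (fixed phase minus ideal phase) are choices made here for illustration.

```python
def squint_phase_error_deg(freq_hz, center_hz, phase_at_center_deg):
    """Phase error of a frequency-independent phase shift relative to a true time delay.

    A delay tau that gives phase_at_center_deg at center_hz gives a phase of
    phase_at_center_deg * freq_hz / center_hz at any other frequency.
    """
    ideal_phase_deg = phase_at_center_deg * freq_hz / center_hz
    return phase_at_center_deg - ideal_phase_deg


center = 6e9                    # carrier frequency
fractional_bw = 0.5             # band edges at center * (1 -/+ fractional_bw / 2)
edges = (center * (1 - fractional_bw / 2), center * (1 + fractional_bw / 2))

for f in (edges[0], center, edges[1]):
    error = squint_phase_error_deg(f, center, 60.0)
    print(f"{f / 1e9:.1f} GHz: phase error {error:+.1f} deg")
```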

Since, by definition, all phased arrays with non-zero bandwidth suffer from beam squinting, application of phased arrays (here a difference is being drawn between phased arrays and timed arrays) depends on other factors such as cost (area) and functionality. A 60GHz WPAN link with data rates upwards of 5Gbps [76] is susceptible to even small squinting and hence requires a much greater phase accuracy than a slow data link. On the flip side, a military communication radar places emphasis on signal cancellation. While this can easily be accomplished, and with good spatial selectivity, through null pointing in phased arrays, a timed array solution would have much poorer spatial selectivity. In timed arrays, signals can be eliminated by aligning in time and subtracting, but this method uses only a pair of antennas and hence has poor spatial selectivity. Phased arrays, on the other hand, cancel signals by providing a progressive phase shift that aligns the signal components on a circle with an equiangular spacing of \( \frac{2\pi}{N} \), thus leading to complete signal cancellation (ideally) post summation since \( \sum_{i=1}^{N} e^{j i \frac{2\pi}{N}} = 0 \) for any integer \( N \geq 2 \). Note that since signal cancellation also requires precise phase shifting, eliminating a wideband signal runs into the same beam squinting issues as does phase aligning a wideband signal.
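The cancellation identity behind null pointing is easy to confirm numerically. The sketch below sums $N$ unit phasors spaced by $\frac{2\pi}{N}$ for a few array sizes; it assumes ideal, equal-amplitude signal components.

```python
import cmath
import math


def equiangular_phasor_sum(num_elements):
    """Sum of num_elements unit phasors spaced 2*pi/num_elements apart."""
    spacing = 2 * math.pi / num_elements
    return sum(cmath.exp(1j * i * spacing) for i in range(1, num_elements + 1))


for n in (2, 3, 4, 8, 16):
    print(f"N = {n:2d}: |sum| = {abs(equiangular_phasor_sum(n)):.2e}")
```

The residual printed for each size is at the level of floating point rounding, i.e. the components cancel.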

A brief introduction to different timed array implementations along with their advantages and drawbacks was given in [2]. The methods described previously were all in continuous time. Akin to discrete time vector modulation using charge redistribution for phased arrays, a discrete time solution to timed arrays is also feasible. In this chapter one such design methodology is described in detail while two alternate implementations are discussed in brief. A proof-of-concept design has been discussed along with post-extraction simulation results (the chip is currently under fabrication).

### 4.2 Delay Sampling

In continuous time, timed arrays work by delaying the signals that arrive early, since a non-causal negative group delay cannot be synthesized. (It is possible to realize the transfer function $e^{s\tau_d}$ corresponding to a time advancing block, but it can time / phase advance only a priori known signals [77].) However, in the discrete time domain we can process signals on a first-come-first-served basis by reading information from signals that arrive early and appending it to the information read from the signals that arrive late. Here the term *read* refers to sampling of the signal (either in the voltage or the current domain). The time delay or lag between the sampling of signals on adjacent antennas plays the role of progressive phase shifts / time delays. By equating this delay to $\tau = \frac{d \sin(\theta)}{c}$, where $d$ is the spacing between the antennas and $\theta$ is the angle of arrival w.r.t. broadside, the signals can be perfectly time aligned, i.e. identical information can be read off them. The idea becomes obvious when expressed mathematically

$$x(t-\tau)|_{t=T_s+\tau} = x(t)|_{t=T_s} = x(T_s)$$

where $T_s$ is the time at which the earlier arriving signal is read (sampled).
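The identity above can be illustrated with a toy waveform: sampling the late-arriving copy $x(t-\tau)$ at $T_s+\tau$ returns the same value as sampling the early copy at $T_s$, whereas sampling both at $T_s$ does not. The antenna spacing, arrival angle, pulse shape and sampling instant below are hypothetical values chosen only for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def inter_element_delay(spacing_m, angle_deg):
    """tau = d * sin(theta) / c for an angle of arrival measured from broadside."""
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_LIGHT


def pulse(t):
    """Hypothetical wideband test waveform: a Gaussian-windowed 6 GHz tone."""
    return math.exp(-(t / 0.5e-9) ** 2) * math.cos(2 * math.pi * 6e9 * t)


tau = inter_element_delay(spacing_m=0.025, angle_deg=30.0)
t_s = 0.1e-9                                  # sampling instant of the early signal

early = pulse(t_s)                            # x(t) read at T_s
late_same_instant = pulse(t_s - tau)          # x(t - tau) read at T_s: misaligned
late_delay_sampled = pulse((t_s + tau) - tau) # x(t - tau) read at T_s + tau: aligned

print(f"tau = {tau * 1e12:.1f} ps")
print(early, late_same_instant, late_delay_sampled)
```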

Since a time delay is being implemented through sampling, we refer to the scheme as *delay sampling*. An initial (variable) delay common to all the antennas might be needed to align the reference channel's sampling clock so that most of the signal energy is captured in pulsed radar applications. Fig. 4.1 shows different sampling schemes that realize delay sampling.

Scheme (a) is an RF voltage sampling scheme, either Nyquist or sub-sampling. Although oversampling also works, and is in fact preferable, realizing Nyquist sampling at RF itself is quite difficult owing to the lack of sharp-edged high frequency clocks. Aside from noise folding and anti-alias filtering issues that plague all voltage sampling designs, the biggest drawback of Nyquist-sampling ADCs at RF is the enormous power consumption. As was discussed for the frequency slicing solution in the previous chapter, digital offers maximum flexibility and computing power for signal processing, yet the implausibility of digitizing an RF signal with a decent enough SNR (> 6bit ENOB) at low power makes this scheme an entirely theoretical one. Sub-sampling helps reduce the power consumption and clocking issues to an extent but requires sharp band pass anti-aliasing filters, which are difficult to make, to avoid noise and unwanted signals folding in band.

Figure 4.1: Three candidate architectures for delay sampling implementation

Scheme (b) realizes current-mode sub-sampling. An LNTA upfront acts like an RF $G_m$ cell. The current is integrated onto an LC tank. This is a time domain convolution operation that essentially band pass filters the RF signal around the LC tank's impedance profile. Since current sampling aliases (creates copies of) the signal after filtering, the band pass filtered signal is then brought down to DC by a higher harmonic. This helps take care of anti-aliasing before sampling and also does some noise filtering, as happens in any current mode sampling scheme. After the integration period, the $G_m$ cell is disconnected from the tank and the inductor and capacitor are also disconnected. A decay path is provided for the inductor's current through capacitive coupling to the primary branch of a current mirror, while the mirrored current gets integrated onto a capacitor through the secondary branch. The capacitor meanwhile already has the voltage sampled onto it. Integrating the decaying current of the inductor across a capacitor preserves signal information and allows maximum capture of signal energy. In the next sampling phase, the discharged inductor and capacitor are connected back to the $G_m$ cell. Since an inductor cannot sustain a DC voltage across it, after every cycle the capacitor is discharged to ground potential. If differential input signals are available, a differential LNTA can be used instead and both ends of the inductor can be connected to the output common mode voltage during the discharge phase. Post sampling, the circuit acts like an inductive converter of sorts and hence has an obvious speed issue. Other drawbacks include noise in the current integration path and the appending of information from the inductor's current to the capacitor's voltage.

Note that the voltage from the inductor's current integration might be of an opposite sign compared to that directly sampled onto the capacitor.

In this work, scheme (c) has been implemented, which is a blend of mixing and sampling. Since sampling at RF frequencies is both difficult and wasteful (the sampling rate is being set by the carrier frequency instead of the signal bandwidth), scheme (c) achieves delay sampling at baseband. However, in order to time align signal information perfectly at baseband, the phase difference at the LO frequency needs to be compensated for first. To better understand why this is the case, let us represent the incoming signal as $x_{BB}(t)e^{j\omega_c t}$, where $x_{BB}(t)$ is the baseband signal and $\omega_c$ is the carrier frequency. The signal on the adjacent antenna after a time delay of $\tau$ is $x_{BB}(t - \tau)e^{j\omega_c (t-\tau)}$. If all the signals are down converted using the same LO signal, $e^{-j\omega_c t}$, on all the channels, the baseband signal in the $k^{th}$ channel will be $x_{BB}(t - (k - 1)\tau)e^{-j\omega_c (k-1)\tau}$. Note that the phase term, $e^{-j\omega_c (k-1)\tau}$, is independent of time and hence cannot be removed through delay sampling. Therefore this phase term must be eliminated before sampling (an alternate scheme is also proposed that accomplishes both tasks in the sampling phase itself). In the following prototype implementation, appropriately phase shifted LOs are used to accomplish the down conversion, thereby eliminating the static LO phase shift term. The implemented architecture is described next.
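The role of the static LO phase term can be seen in a short numerical sketch. It models channel $k$ as an ideally delayed copy of the RF signal, mixes it with either a common LO or a phase shifted LO, and then delay samples the result. The carrier frequency, inter-element delay, envelope shape and function names are hypothetical choices for illustration only.

```python
import cmath
import math


def envelope(t):
    """Hypothetical baseband signal x_BB(t): a Gaussian pulse."""
    return math.exp(-(t / 1e-9) ** 2)


def baseband_sample(x_bb, t, k, tau, omega_c, lo_phase=0.0):
    """Channel-k signal at time t after mixing with the LO e^{-j(omega_c t + lo_phase)}."""
    delay = (k - 1) * tau
    rf = x_bb(t - delay) * cmath.exp(1j * omega_c * (t - delay))
    lo = cmath.exp(-1j * (omega_c * t + lo_phase))
    return rf * lo


omega_c = 2 * math.pi * 8e9     # assumed carrier
tau = 20e-12                    # assumed inter-element delay
t_s = 0.3e-9                    # sampling instant of the reference channel
k = 2

reference = baseband_sample(envelope, t_s, 1, tau, omega_c)

# Common LO: delay sampling aligns the envelope, but a static phase remains.
common_lo = baseband_sample(envelope, t_s + (k - 1) * tau, k, tau, omega_c)

# Phase shifted LO: the static term is removed before sampling.
shifted_lo = baseband_sample(
    envelope, t_s + (k - 1) * tau, k, tau, omega_c,
    lo_phase=-omega_c * (k - 1) * tau,
)

print(abs(common_lo - reference), abs(shifted_lo - reference))
print(math.degrees(cmath.phase(common_lo / reference)))
```

With the common LO the two channels agree in magnitude but differ by the fixed angle $\omega_c(k-1)\tau$, so they would not add coherently; with the shifted LO the delay-sampled values coincide.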

### 4.3 A Two-channel Timed Array Receiver

A two channel timed array receiver with a 2GHz instantaneous bandwidth, LO phase shifting and delayed baseband sampling using clocks derived from the LO has been implemented in a TSMC 65nm CMOS process with one thick metal. It occupies an active area of 1.6mm$^2$ including probing pads (bonding pads and ESDs are not counted in the active area). The architecture of the implemented prototype is shown in Fig. 4.2 while the layout of the receiver array is shown in Fig. 4.16.

#### 4.3.1 Architecture Description

The prototype is set up for RF input through probes while the baseband signals are bonded out to the PCB. Input signals are fed through a set of GSG pads to pseudo-differential noise canceling LNAs. A complementary QVCO is used to generate quadrature LO signals which are subsequently processed by a cartesian combiner to accomplish LO phase shifting. A current-bleeding Gilbert cell with built in $2^{nd}$ order filtering down-converts the RF signals for baseband processing. In parallel, the phase shifted LO signals are divided by a chain of CMOS dividers to generate the baseband sampling clock. A current starved delay cell processes the sampling clocks to generate non-overlapping write, read and reset clocks, all at the same frequency as the input clock. Current domain sampling is used to add signals at baseband, which are then read out through source follower buffers.

Figure 4.2: A 2-channel delayed sampling timed array receiver

#### 4.3.2 Pseudo Noise Canceling LNA

A pseudo-differential LNA architecture was preferred over a differential LNA architecture owing to the difficulty in generating high frequency differential phase shifted test inputs. An NC LNA was selected owing to its pseudo-differential nature and good linearity in addition to its noise canceling properties. Fig. 4.3 shows the implemented LNA along with the interfacing buffers. Unlike traditional NC LNAs that only use an inductor for source degeneration in the main path, inductors were also used at the output loads. NC LNAs balance the resistive loads in the ratio of the main path and auxiliary path's $g_m$. However, at high frequencies the load has a substantial capacitive component and the output impedance is a parallel combination of $R$ and $C$. If the $g_m$ ratios follow device sizes, the output poles ideally sit at the same frequency for the two paths. As this is not the case for short channel devices, the output poles sit at different frequencies, thus necessitating the tuning out of the load capacitance. Note that unlike shunt-peaking and series-peaking techniques, we are not trying to extend the amplifier's bandwidth. We are more interested in matching the effective impedance ratio to the $g_m$ ratio. In fact a flat in-band gain is an important requirement if the design is to serve a wide instantaneous bandwidth. The series capacitor with the inductor is sized to offer minimum impedance in the desired frequency band.

![Figure 4.3: Pseudo-differential NC LNA with output buffers](image)

An LC tank is used for source degeneration. Ideally the tail inductor should just tune out the gate-source capacitance of the main and the auxiliary paths. But since the center frequency of the LNA match needs to be in tune with the QVCO frequency, additional capacitive tuning is provided at the tail to combat process variations and allow enough overlap with the QVCO frequency tuning. Note that the switches in the capacitor bank present a real impedance to the input and hence deteriorate the NF. The LNA outputs are processed by two 50 ohm source follower buffers which feed the output GSSG probing pads (to measure the LNA's performance) and the mixers. The NF, gain and impedance match of the LNA are shown in Fig. 4.4 for post extraction simulations with buffer loading. The LNA consumes a current of 8.7mA from a 0.7V dc supply while the buffer consumes a current of 3.6mA from a 1V dc supply. The LNA has an IIP3 of 3.6dBm and a 1dB compression point of -7dBm.

#### 4.3.3 Cartesian Combiner

The design of cartesian combiners was discussed in detail in [2]. In this design a $g_m$ switching cartesian combiner has been used. Each leg of the combiner has 5 $g_m$ units with a 3 bit control (Fig. 4.5). The two MSBs each control a pair of $g_m$ units while the LSB controls only one $g_m$ unit. In total eight different phases are realized through the following $g_m$ unit settings for I and Q weighting (I,Q): (5,0), (5,1), (5,2), (4,3), (3,4), (2,5), (1,5) and (0,5). These correspond to phase shifts of $0^\circ, 11.3^\circ, 21.8^\circ, 36.9^\circ, 53.1^\circ, 68.2^\circ, 78.7^\circ$ and $90^\circ$ respectively. An additional $90^\circ$ of phase rotation is achieved by swapping the I and Q differential lines with appropriate polarity. The swapping is achieved through a multiplexer at the output of the cartesian combiner (not shown in the figure). Post buffering, the cartesian combiner's outputs are sent to the I-Q mixer and a divide by 4 circuit to produce the sampling clocks. A differential pair with source degeneration was used as the unit $g_m$ cell as shown in Fig. 4.6. In order to further improve the linearity of the $g_m$ cartesian combiner, unused $g_m$ units are switched off by grounding the tail sources as well as the differential pair.

Figure 4.4: NF, gain and impedance match of the LNA and buffer

Figure 4.5: A 3-bit cartesian combiner with $g_m$ switching

The QVCO driving the cartesian combiner is an injection locked frequency multiplier, ILFM, ($\times 3$). Therefore the input to the cartesian combiner has an LO/3 component as well. A set of differential quadrature LO/3 signals is produced using a 3rd order on-chip PPF. Hence the LO/3 signals have worse quadrature accuracy than the LO signals. This affects the resultant phase shift at the LO frequency. To better understand this effect let us express the I and Q outputs of the LO as

$$LO_I = \cos(\omega_{LO}t) + \alpha \cos(\frac{\omega_{LO}}{3}t + \theta)$$

$$LO_Q = \cos(\omega_{LO}t + \frac{\pi}{2}) + \alpha \cos(\frac{\omega_{LO}}{3}t + \theta + \beta + \frac{\pi}{2})$$

$$= -\sin(\omega_{LO}t) - \alpha \sin(\frac{\omega_{LO}}{3}t + \theta + \beta)$$

where perfect quadrature at LO is assumed and that at LO/3 is assumed to be in error by $\beta$. Here $\alpha$ is the relative strength of the LO/3 signal at the output of the ILFM. Assuming the fundamental and 3rd order coefficients of the cartesian combiner's $g_m$ to be $g_{m,0}$ and $g_{m,2}$, we can express the cartesian combiner's output current for a general (I,Q) code of $(k,l)$ as

$$I(LO_I) = k\left(g_{m,0}\cos(\omega_{LO}t) - 3g_{m,2}\alpha^3\cos(\omega_{LO}t + 3\theta)\right)$$

$$+ l\left(g_{m,0}\sin(\omega_{LO}t + \frac{\pi}{2}) + 3g_{m,2}\alpha^3\sin(\omega_{LO}t + 3\theta + 3\beta)\right)$$

$$= A_{gm,0}\sin(\omega_{LO}t + \phi) + B_{gm,2}\alpha^3\sin(\omega_{LO}t + \phi')$$
Figure 4.6: Unit $g_m$ cell for the cartesian combiner with switches

Quite clearly, $\phi$ and $\phi'$ are different because of, firstly, the $\beta$ phase error in the $LO/3$ signal and, secondly, the fact that while $LO_Q$ current from the original LO signal leads its $LO_I$ counterpart, the $LO_Q$ current from the $3^{rd}$ harmonic of the $LO/3$ signal lags its $LO_I$ counterpart from the $3^{rd}$ harmonic of the $LO/3$ signal.

Aside from introducing unwanted phase errors in the LO signal, the presence of harmonics also alters the zero crossings of the differential LO signal. This in turn leads to inaccurate phase shift to time delay conversion. If every harmonic present at the output had a phase shift proportional to its frequency, the phase shift to time delay conversion would have been accurate. But as was mentioned earlier, a cartesian combiner, being a true phase shifter, phase shifts all frequencies equally. Therefore further time delay errors creep in as the LO signal is hard limited for use with mixers and dividers.

In order to reduce the phase shift errors due to hard limiting, instead of driving CMOS inverter based buffers directly, a two stage CML buffer is interposed between the output of the cartesian combiner and the CMOS buffers as shown in Fig. 4.7. The CML buffer (4 in total, 2 per channel, 1 each for the I and Q LOs) consumes a current of 2.3mA from a 1V DC supply.

The cartesian combiner consumes a net current of 6-to-9mA (depending on the code) from a 1V DC supply. A dummy cartesian combiner loading is provided to generate the LO signals for the reference channel. This is necessitated by the finite phase shift from the input to the output of the cartesian combiner. The dummy cartesian combiner is always set to (5,0) for the I LO path and (0,5) for the Q LO path. The post buffering LO phase shifts obtained from post extraction simulations are listed in Table 4.1.
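The ideal phase shift of each (I,Q) code, and the time delay it corresponds to, can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes ideal, equal $g_m$ units and the 8GHz LO frequency quoted in this chapter; the function names are illustrative only. The results can be compared against the extracted values in Table 4.1.

```python
import math

F_LO = 8e9  # LO frequency quoted in this chapter (Hz)
CODES = [(5, 0), (5, 1), (5, 2), (4, 3), (3, 4), (2, 5), (1, 5), (0, 5)]

def ideal_phase_deg(i_units, q_units):
    """Phase of the vector sum of i_units in-phase and q_units quadrature g_m units."""
    return math.degrees(math.atan2(q_units, i_units))

def phase_to_delay_ps(phase_deg, f_lo=F_LO):
    """Time delay (ps) equivalent to a phase shift at the LO frequency."""
    return phase_deg / 360.0 / f_lo * 1e12

for code in CODES:
    phase = ideal_phase_deg(*code)
    print(code, f"{phase:6.2f} deg", f"{phase_to_delay_ps(phase):6.2f} ps")
```

Subtracting these ideal values from the extracted phase shifts and time delays gives the error columns of the table, under the assumptions stated above.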

Figure 4.7: CML and CMOS buffer stage for cartesian combiners outputs

#### 4.3.4 QVCO

A complementary QVCO architecture reported in [78] has been used to generate the required quadrature LO signals. The complementary QVCO, shown in Fig. 4.8, offers superior noise and frequency tuning performance through an in-built coupling path phase shift and reduced flicker noise in the coupling MOSFETs. The QVCO is injection locked to an LO/3 signal through differential quadrature signals. The differential quadrature signals in turn are produced from a 3rd order PPF driven by an active balun. The active balun taps signals from the drain and source of a source degenerated common source amplifier.

Table 4.1: Extracted phase and delay values for different (I,Q) codes

<table>
<thead>
<tr>
<th>Code</th>
<th>Phase Shift(°)</th>
<th>Phase Error(°)</th>
<th>Time Delay(ps)</th>
<th>Time Error(ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>(5,0)</td>
<td>-0.14</td>
<td>-0.14</td>
<td>-0.3</td>
<td>-0.3</td>
</tr>
<tr>
<td>(5,1)</td>
<td>11.13</td>
<td>-0.18</td>
<td>3.76</td>
<td>-0.16</td>
</tr>
<tr>
<td>(5,2)</td>
<td>21.74</td>
<td>-0.06</td>
<td>7.74</td>
<td>0.17</td>
</tr>
<tr>
<td>(4,3)</td>
<td>36.97</td>
<td>0.10</td>
<td>12.57</td>
<td>-0.23</td>
</tr>
<tr>
<td>(3,4)</td>
<td>53.43</td>
<td>0.30</td>
<td>18.43</td>
<td>-0.02</td>
</tr>
<tr>
<td>(2,5)</td>
<td>68.65</td>
<td>0.45</td>
<td>23.85</td>
<td>0.17</td>
</tr>
<tr>
<td>(1,5)</td>
<td>79.26</td>
<td>0.47</td>
<td>27.68</td>
<td>0.36</td>
</tr>
<tr>
<td>(0,5)</td>
<td>90.30</td>
<td>0.30</td>
<td>31.45</td>
<td>0.20</td>
</tr>
</tbody>
</table>

The active balun’s outputs are buffered through self biased CMOS inverters and fed to the 3rd order PPF to generate differential quadrature locking signals.

#### 4.3.5 Mixer

A double balanced Gilbert cell mixer with current bleeding PMOS is used to down-convert the RF signal to baseband. Since the instantaneous bandwidth of the input signal could be as high as 2GHz in this design, the output pole should be placed at or beyond 1GHz. However, doing so would reduce the suppression of the other mixing product, which sits at $\omega_{RF} + \omega_{LO}$. For an RF frequency of 7GHz and an LO frequency of 8GHz, if the mixer output pole sits at 1GHz, the relative attenuation of the 15GHz component is 20dB. Therefore, in order to further increase the attenuation of this unwanted component, a $2^{nd}$ order RC filter is used as shown in Fig. 4.9. The second order filter adds another 15dB of attenuation, leading to a total attenuation of 35dB. Fig. 4.10 shows the post extraction simulated gain, NF and linearity of the mixer. The mixer consumes a current of 2.6mA from a 1V dc supply.
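The 20dB figure for the single pole case can be checked with a short calculation. The sketch below assumes an ideal first order response and measures the attenuation of the sum product relative to the wanted difference product; the function name is illustrative. The additional 15dB from the second order filter depends on component values not given here and is not modelled.

```python
import math

def first_order_attenuation_db(f_hz, f_pole_hz):
    """Attenuation (positive dB) of a single real pole at f_pole_hz."""
    return 10 * math.log10(1 + (f_hz / f_pole_hz) ** 2)

F_RF, F_LO, F_POLE = 7e9, 8e9, 1e9
f_diff = abs(F_RF - F_LO)   # wanted mixing product (1 GHz)
f_sum = F_RF + F_LO         # unwanted mixing product (15 GHz)

# Attenuation of the sum product relative to the wanted product.
relative_db = (first_order_attenuation_db(f_sum, F_POLE)
               - first_order_attenuation_db(f_diff, F_POLE))
print(f"relative attenuation of the sum product: {relative_db:.1f} dB")
```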

#### 4.3.6 Divider

Two back to back D flip-flops with independent reset controls (Fig. 4.11) are used to generate the sampling clock from the phase shifted LO signals. Although a CML divider is preferable at high frequencies (8GHz), it runs into a phase inversion issue, i.e. the divided down outputs in two adjacent channels might be $180^\circ + \phi$ out of phase instead of being just $\phi$ out of phase. To better understand this phase inversion issue we need to look at a CML divider as a 2 stage differential ring oscillator which can act like an injection locked divider. Since the point of injection, the tail node, is symmetric w.r.t. the differential output nodes, the differential output might have its polarity reversed depending on which output node was at a higher potential when the injection signal arrived; from an injection locking viewpoint, both polarities are equally valid.

A potential solution is to force an initial condition on the output nodes. If the two dividers are designed to run in phase, i.e. with a phase difference of $0^\circ$ instead of $180^\circ$, then this scheme works fine. However, if a finite phase difference is required then there exists a certain order of inputs which can still lead to phase inversion even after resetting. For example, if one injection signal arrives before the other (a finite phase shift between the two is equivalent to saying that one signal arrives later) and the two resets are released at the same time, one oscillator starts oscillating in sync with its injection signal while the other oscillates independently until its injection signal arrives, at which point the polarity set by the resetting action might have been reversed. Another scenario where resetting fails is when the two injection signals arrive simultaneously but one is positive and the other is negative. Although having independently controllable resets might work, in this design we have opted for a simple CMOS divider with single ended operation. Since a CMOS divider does not oscillate independently in the absence of a clock signal, it can easily be reset to be in sync with the clock. By sequentially resetting the dividers in order of their input frequency, the phase inversion problem can be completely eliminated. In this design, two 100fF capacitors are placed at the output nodes of two inverters connected back to back to delay the resetting of the slower divider w.r.t. the faster divider. The post extraction simulated delay between the sampling clocks, derived from the divided down phase shifted LO signals, for different (I,Q) code settings is listed in Table 4.1.

#### 4.3.7 Baseband

The sampling operation at baseband is accomplished through current domain charge sampling. A source degenerated inverter based $g_m$ cell integrates current onto the sampling capacitor in the write phase. There are 4 $g_m$ cells and 4 sampling capacitors per channel (one each for $\pm I$ and $\pm Q$ signals). In the read phase the charges on the sampling capacitors from the two channels are shared with the gate capacitance of a source follower output buffer stage as shown in Fig. 4.12. The read phase is followed by a small reset phase during which the sampling capacitors along with the gate of the source follower are connected to a common DC bias voltage. This DC bias voltage is chosen to be the output voltage of the $g_m$ cell with its input and output tied together. The same dc voltage is also used to bias the $g_m$ cells. Each $g_m$ cell (8 in total) consumes 750$\mu$A of current from a 2V supply. Fig. 4.13 shows the I and Q channel differential outputs of the buffers (post summation) for two different RF frequencies, 8.8GHz and 8.2GHz (RF input power level of -20dBm), with an LO frequency of 8GHz and a baseband load capacitance of 1pF. A time delay of 30ps was assumed between the two channels. The differential signals come close to zero after every reset period, but because of the small reset window the voltage never settles to zero completely during this period. Differential clocks with dummy transistor loads are used to eliminate clock feed through in the switches. Note that the write signals for the two channels are different while a common read and reset signal is used. The generation of these signals is discussed in the next section.

Figure 4.10: Post extraction simulation gain, NF and linearity of the mixer

Figure 4.11: A CMOS ÷ by 4 with a delayed reset signal

#### 4.3.8 Clock Generation

Figure 4.13: Buffered baseband outputs for two different baseband frequencies

Three different non-overlapping clock phases are required for the sampling operation. Since these phases must run at the same speed as the input clock, they cannot be generated through an integer modulus counter followed by AND gating (or NAND gating) of the output signals. They can, however, be generated directly from the 8GHz LO signal by using 2 synchronous D flip-flops connected back to back (Fig. 4.14). This generates four outputs with a 90° progressive phase shift which can then be AND gated serially to generate 4 non-overlapping 25% duty cycle clocks at the desired frequency. However, the duty cycle cannot be changed.
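The flip-flop based alternative described above can be illustrated with an idealized logic model. The sketch assumes that the two back to back D flip-flops form a twisted ring (Johnson) counter, which is one common way to realize a synchronous divide by 4; the actual gate level circuit of Fig. 4.14 may differ, and the function names are illustrative.

```python
def johnson_divide_by_4(n_cycles):
    """Two D flip-flops in a twisted ring: D1 = not Q2, D2 = Q1."""
    q1, q2 = 0, 0                      # state after reset
    states = []
    for _ in range(n_cycles):
        states.append((q1, q2))
        q1, q2 = 1 - q2, q1            # both flops update on the same clock edge
    return states

def non_overlapping_phases(states):
    """AND-gate adjacent quadrature outputs into four 25% duty cycle clocks."""
    phases = {"p0": [], "p1": [], "p2": [], "p3": []}
    for q1, q2 in states:
        phases["p0"].append(int((not q1) and (not q2)))
        phases["p1"].append(int(q1 and (not q2)))
        phases["p2"].append(int(q1 and q2))
        phases["p3"].append(int((not q1) and q2))
    return phases

states = johnson_divide_by_4(8)        # 8 input clock cycles = 2 output periods
for name, bits in non_overlapping_phases(states).items():
    print(name, bits)
```

Each gated output is high for exactly one input clock period out of four, which is why the duty cycle is fixed at 25% in this approach.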

In this design, current starved inverters are used as delay stages to generate time delayed versions of the input clock (@2GHz) which are then processed by NAND and NOR gates to generate the write and reset clocks respectively as shown in Fig. 4.15. The read signal is generated similar to the write signal with an inverted reference clock to produce the required delay between write and read phases. The width of write and read phases can be increased by increasing the current through the delay cells while that of the reset phase can be decreased by doing the same. The clock outputs are then buffered using appropriately sized inverters so as to ensure equal path delays to the sampling switches.

Figure 4.14: A synchronous CMOS ÷ by 4 using two back to back D flip-flops

#### 4.3.9 Delay Extension

$^{3}$ All of read, write and reset use differently sized buffers since the final capacitive loads on the three lines are different.

The sampling clock generation scheme described in the previous two sections works well for at most 4 antennas. Since the sampling clocks are generated from phase shifted LO signals, the maximum delay between sampling clocks is limited to ~$T_{LO}$, corresponding to a phase shift of ~360°. Beyond this the LO phase wraps; for example, 420° is the same as 60°. So even though the phase wrapped LO does compensate for the phase shift at the LO frequency, it cannot be used to generate the correct clock delay. The issue can be resolved using a DLL. Since the currently implemented scheme works well with two antennas, we can use the LO signals from two adjacent antennas to produce correctly delayed sampling clocks. The leading sampling clock is then fed to a chain of $N+1$ delay cells while the lagging clock is fed to a chain of $N$ delay cells. The outputs of the delay chains are used as the UP and DOWN controls of a charge pump as shown in Fig. 4.17. The charge pump's capacitor sets the control voltage of the delay cells. In steady state the outputs of the two delay chains must be in phase, thus making the delay of one stage equal to the delay between the two clock signals. This delay cell can then be used to generate the sampling clocks for the other channels.

While the previous scheme ran into issues when generating larger than a certain progressive time delay, this scheme cannot produce delays smaller than the self delay of an inverter. Although the concept of a vernier delay can be used, a large number of chains is needed to generate the required set of delays. Instead, an active RC delay stage can be used in the baseband to selectively delay some channels. For instance, if the minimum achievable delay is 10ps but a step of 5ps is required, then we delay every other channel by 5ps (using an active RC stage) and use the same sampling clock on two consecutive channels. So in this example, channels 2, 4, 6 and so on will have a baseband delay of 5ps, and channel 1 will be sampled at t=0ps, channels 2 and 3 at t=10ps, channels 4 and 5 at t=20ps and so on.
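The interleaving in this example can be tabulated with a short script. It is a sketch under the assumption that the sampling clock step is exactly twice the required delay step, as in the 10ps and 5ps example; the function name and the returned tuple layout are illustrative. The compensated delay of a channel is taken as its sampling instant minus its baseband delay.

```python
def interleaved_schedule(n_channels, step_ps=5.0):
    """Per channel (1-indexed): (sample time, baseband delay, compensated delay) in ps."""
    clock_step_ps = 2 * step_ps                      # assumed minimum clock delay
    schedule = {}
    for ch in range(1, n_channels + 1):
        sample_time = (ch // 2) * clock_step_ps      # ch 1 -> 0, ch 2,3 -> 10, ch 4,5 -> 20
        bb_delay = step_ps if ch % 2 == 0 else 0.0   # even channels get the active RC delay
        schedule[ch] = (sample_time, bb_delay, sample_time - bb_delay)
    return schedule

for ch, (t_s, t_bb, t_eff) in interleaved_schedule(6).items():
    print(f"channel {ch}: sampled at {t_s:4.1f} ps, "
          f"baseband delay {t_bb:3.1f} ps, compensated delay {t_eff:4.1f} ps")
```

The compensated delay increases by 5ps per channel even though the sampling clocks themselves are spaced 10ps apart.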

### 4.4 Summary

This chapter presents a discrete time domain solution for accurate timed array implementations. The proposed solution blends LO phase shifting based mixing with baseband sampling using clocks derived from the phase shifted LOs. Signals at different antennas are sampled in order of their arrival times. A proof-of-concept two channel timed array receiver prototype is discussed. The proposed scheme is scalable and more accurate than any continuous time timed-array implementation since no circuit based approximation (T-line approximations using passive LC or active RC delay approximations) is used to produce the delays. The accuracy of the proposed implementation is decided by mismatches, if any, between the LO and clock processing chains in the two channels. Further, the scheme can be extended to multiple channels through a DLL circuit that takes the sampling clocks of two consecutive channels as its inputs. Since most receiver architectures demodulate the signal post digitization, they require some form of signal sampling, and therefore the proposed discrete time domain scheme fits in well with most traditional receiver architectures. Table 4.2 summarizes some performance aspects of the receiver array.

Table 4.2: Performance summary (extracted) for the two channel timed array receiver

<table>
<thead>
<tr>
<th>Power Consumption(mW)</th>
<th>LNA (per channel): 6.1</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Mixer (I&amp;Q per channel): 5.2</td>
</tr>
<tr>
<td></td>
<td>Cartesian Combiner (per channel): 8</td>
</tr>
<tr>
<td></td>
<td>Baseband (per-channel): 6</td>
</tr>
<tr>
<td></td>
<td>QVCO (per-channel): 5</td>
</tr>
<tr>
<td></td>
<td>Divider (per-channel): 1.2</td>
</tr>
<tr>
<td>LNA+Mixer (NF)</td>
<td>5.1</td>
</tr>
<tr>
<td>Average Time Delay (ps)</td>
<td>4.53</td>
</tr>
</tbody>
</table>
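Some of the per-channel power entries in Table 4.2 can be cross-checked against the supply currents and voltages quoted earlier in the chapter. The sketch below assumes that the 2.6mA mixer current is per mixer (two mixers, I and Q, per channel) and that four $g_m$ cells serve each channel; the remaining entries depend on details not stated in the text and are left out.

```python
def power_mw(current_ma, supply_v, count=1):
    """DC power in mW for `count` identical blocks."""
    return current_ma * supply_v * count

lna_mw = power_mw(8.7, 0.7)                 # LNA core: 8.7 mA from 0.7 V
mixer_mw = power_mw(2.6, 1.0, count=2)      # I and Q mixers: 2.6 mA each from 1 V
baseband_mw = power_mw(0.75, 2.0, count=4)  # four g_m cells: 750 uA each from 2 V

print(f"LNA: {lna_mw:.1f} mW, mixer: {mixer_mw:.1f} mW, baseband: {baseband_mw:.1f} mW")
```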
Chapter 5

Injection Locking: Unified Model

### 5.1 Introduction

Injection locking can broadly be described as synchronization between events. A spring-mass system (spring constant k and mass m), when released from a compressed or expanded position on a friction-less surface, oscillates at a natural frequency of $f = \frac{1}{2\pi} \sqrt{\frac{k}{m}}$. However, if a periodic time varying force is applied to the system, the frequency of the system becomes equal to that of the periodic force. In other words, synchronization is achieved between the external force and the spring-mass system. Similarly, two pendulums hanging on the same wall eventually fall in sync via coupling through mechanical vibrations within the wall [80]. The injection locking phenomenon is also exhibited in optical systems, where high power, high precision laser beams (referred to as slave lasers) are generated with very low noise levels through coupling with a low noise, low power laser beam (the locking signal, referred to as the master or seed laser) [81]. Injection locking is widely used in frequency synthesizer design to generate clean (low noise), high power (amplitude) frequency signals from relatively clean, low power input frequency signals. Fig. 5.1 shows some of the applications in which injection locking is used and some of the phenomena in which it is exhibited. This chapter focuses on modeling the phase noise behavior of injection locked oscillators from a circuits perspective, but being mathematical in nature the results are equally valid for other injection locked systems. As will be seen later, a core part of the modeling comes down to relating the different contributors in the noise equation to different physical entities in the system.

Figure 5.1: Uses of injection locking

Synchronization of an oscillator to an external signal through injection-locking has been studied in literature [82, 83] and is currently used in low power dividers and for precision quadrature frequency generation [84–86]. The transient behavior of injection-locked oscillators (ILO) has been analyzed to better model its dynamics in [67, 87–89]. In this chapter we focus on modeling the phase noise of ILO within the full lock range.

Since the action of an ILO is analogous to that of a phase-locked loop (PLL), it is intuitive to model them similarly [90]. One analogy of particular interest is the ability of an ILO to replicate the input signal's noise profile over a finite frequency bandwidth. The discrete building blocks in a PLL allow for the modeling of its selective filtering action on the source and the oscillator noise. However, unlike a PLL, the only physical entity in an ILO is the oscillator. This makes it difficult to obtain insight into the noise shaping in terms of the system parameters involved. Noise analyses of ILOs have previously been presented, but they either have a numerical focus [87, 91, 93] or do not capture the noise behavior over the entire lock range [94], hence providing little design insight.

This work aims to provide a simple, unified frequency domain model to analyze the noise shaping and filtering in an ILO, including sub-harmonic (i.e., in an ILFM) and super-harmonic injection (i.e., in an ILFD) scenarios. The analogy between ILOs and PLLs, and the extent of its validity, have been explored. An intuitive understanding of the selective filtering actions in an ILO has been emphasized, with the final result involving easily measurable / simulatable circuit parameters. We show that an ILO behaves exactly like a type-I first-order PLL. The time domain equations have been mapped onto the phase domain by introducing a noise filtering bandwidth parameter, $K_{ILO}$, which behaves like the $K_{VCO}$ in a PLL but varies across the lock range. Additionally, the noise-filtering bandwidth and its dependence on the injection frequency, injection signal level and lock range have been shown. Measurement results from three different designs, a discrete ILO, an integrated ILFD and an integrated ILFM, have been provided to validate the theoretical predictions.

This chapter is organized as follows: Section II provides the analysis and phase noise modeling of an ILO and validates the modified PLL analogy. Section III presents the measurement results and Section IV concludes this work by providing some important design insights.

### 5.2 ILO Phase Noise: A Small Signal Model

Time domain modeling of an injection locked oscillator (5.1) is well documented in the works of Paciorek and Adler [82,83]. While Adler’s work treats scenarios where the injection level (the ratio of the impressed signal to the desired output) is small, Paciorek’s treatment is more general in nature allowing for larger injection levels. The final difference between the two equations, though, is minimal differing only in the presence of the extra term $\varepsilon \cos[\phi(t)]$ in the denominator in Paciorek’s equation. In this work we choose Paciorek’s equation as our starting point owing to its generality. The derivation of the equation is not presented here. The interested reader is encouraged to

#### 5.2.1 s-domain modeling

Under locked condition, when the two frequencies are equal, there is a fixed phase difference, $\Phi_{ss}$, as shown in Fig. 5.2a, between the two signals. Assuming the net disturbance in the steady state phase difference, $\Phi_{ss}$, due to all noise sources to be $\Delta \phi(t)$, we can rewrite (5.1) as (5.2):

$$\frac{d[\Delta \phi(t)]}{dt} \cong \Delta \omega_{inj}(t) - \Delta \omega_{free}(t) - \frac{\omega_{free}}{2Q} \left[ \frac{\varepsilon^2 + \varepsilon \cos(\Phi_{ss})}{[1 + \varepsilon \cos(\Phi_{ss})]^2} \right] \Delta \phi(t)$$

(5.2)

where $\Delta \phi(t)$ equals $\Delta \phi_{inj}(t) - \Delta \phi_{osc}(t)$ (Fig. 5.2a).

Eqn (5.2) contains only the incremental quantities since the steady-state quantities on both sides of the equality reduce to zero. Here, $\Delta \phi_{inj}(t)$ and $\Delta \phi_{osc}(t)$ are the phase perturbations of the injection signal and the oscillator signal respectively. Similarly, $\Delta \omega_{inj}(t)$ and $\Delta \omega_{free}(t)$ are the frequency perturbations of the injection signal and the free-running oscillator respectively. Note that these perturbations model the phase noise of the injection source and the stand-alone oscillator. Since it was assumed that the disturbances are small, all higher order terms in (5.2) have been neglected. Transforming both sides of (5.2) to the Laplace domain, we arrive at (5.3).

\[
\phi_{\text{osc}}(s) = \frac{K_{ILO}(\Phi_{ss})}{(s + K_{ILO}(\Phi_{ss}))} \phi_{\text{inj}}(s) + \frac{s}{(s + K_{ILO}(\Phi_{ss}))} \phi_{\text{free}}(s)
\]

(5.3)

where

\[
K_{ILO}(\Phi_{ss}) = \frac{\varepsilon^2 + \varepsilon \cos(\Phi_{ss})}{[1 + \varepsilon \cos(\Phi_{ss})]^2} \cdot \frac{\omega_{\text{free}}}{2Q}
\]

(5.4)

where \( \phi_{\text{free}}(s) \) and \( \phi_{\text{inj}}(s) \) correspond to the phase noise of the free running oscillator, \( \mathcal{L}_{\text{osc}}(\Delta\omega) \), and the injection source, \( \mathcal{L}_{\text{inj}}(\Delta\omega) \), respectively. If the noise of the free-running oscillator and that of the injection source are uncorrelated, as is mostly the case, we can represent the resultant output phase noise, \( \mathcal{L}_{ILO}(\Delta\omega) \), of the system in terms of the phase noise of the injection source and the free-running oscillator as

\[
\mathcal{L}_{ILO}(\Delta\omega) = \frac{(K_{ILO}(\Phi_{ss}))^2 \times \mathcal{L}_{\text{inj}}(\Delta\omega) + \Delta \omega^2 \times \mathcal{L}_{\text{osc}}(\Delta\omega)}{\Delta \omega^2 + (K_{ILO}(\Phi_{ss}))^2}
\]

(5.5)

where \( \Delta \omega \) is the frequency offset at which the phase noise is calculated. Note that the noise of the free-running oscillator at this offset is not that at the impressed frequency of oscillation but rather that at its natural frequency, \( \omega_{\text{free}} \). This behavior is unique to an ILO. Even though the noise behavior of a PLL is similar, the oscillator in a PLL can oscillate at the reference frequency (or a multiple of the reference frequency) even in the absence of the reference signal. In other words, the reference frequency (or a multiple of it) lies within the frequency tuning range (the difference between the maximum and minimum oscillation frequencies) of the oscillator.

Figure 5.2: ILO frequency domain model about the steady state phase, \( \Phi_{ss} \)
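Equations (5.4) and (5.5) are straightforward to evaluate numerically. The following is a minimal sketch with illustrative function names; it treats $K_{ILO}$ and $\Delta\omega$ in rad/s and phase noise in dBc/Hz. The example values ($\varepsilon = 0.1$, a 2GHz free-running frequency, a Q of 20 and -100dBc/Hz oscillator noise at 1MHz offset) follow those used later in this chapter, while the -130dBc/Hz source noise is an assumption made for the example.

```python
import math

def k_ilo(eps, phi_ss, f_free_hz, q):
    """Noise filtering bandwidth K_ILO (rad/s) as given by (5.4)."""
    w_free = 2 * math.pi * f_free_hz
    c = eps * math.cos(phi_ss)
    return (eps ** 2 + c) / (1 + c) ** 2 * w_free / (2 * q)

def ilo_phase_noise_dbc(offset_hz, l_inj_dbc, l_osc_dbc, k):
    """Output phase noise (dBc/Hz) from (5.5) for uncorrelated noise sources."""
    dw = 2 * math.pi * offset_hz
    l_inj = 10 ** (l_inj_dbc / 10)
    l_osc = 10 ** (l_osc_dbc / 10)
    l_out = (k ** 2 * l_inj + dw ** 2 * l_osc) / (dw ** 2 + k ** 2)
    return 10 * math.log10(l_out)

k_center = k_ilo(eps=0.1, phi_ss=0.0, f_free_hz=2e9, q=20)
print(f"K_ILO at the center of lock: {k_center / (2 * math.pi) / 1e6:.2f} MHz")
print(f"ILO noise at 1 MHz offset: "
      f"{ilo_phase_noise_dbc(1e6, -130.0, -100.0, k_center):.1f} dBc/Hz")
```

Setting `k` to zero returns the free-running oscillator noise, and setting the two noise levels equal returns that same level at every offset, which are the two limiting cases of (5.5).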

If the correlation between the injection source noise and free-running oscillator noise is non-zero, a simplistic derivation as above does not hold. In order to derive the resultant equations for a non-zero correlation case we rewrite the phase perturbation equation in time domain based on the s-domain equation (5.3):

$$
\phi_{osc}(t) = ke^{-kt} * \phi_{inj}(t) + \left( \delta(t) - ke^{-kt} \right) * \phi_{free}(t)
$$

(5.6)

where $k = K_{ILO}(\Phi_{ss})$. Re-arranging (5.6) we get

$$
\phi_{osc}(t) = \phi_{free}(t) + ke^{-kt} \ast (\phi_{inj}(t) - \phi_{free}(t))
$$

$$
= \phi_{free}(t) + k \int_{0}^{\infty} e^{-k\tau} [\phi_{inj}(t-\tau) - \phi_{free}(t-\tau)] d\tau
$$

Since phase noise is a wide sense stationary random process, we can compute the phase noise PSD by calculating the Fourier transform of the autocorrelation

$$
R_{osc}(\alpha) = E[\phi_{osc}(t)\phi_{osc}(t+\alpha)]
$$

(5.7)

After a few fairly straightforward, yet lengthy, integrals (Appendix [B]) we arrive at the following expression for the ILO’s phase noise PSD in presence of correlation between injection source noise and free-running oscillator noise:

$$
\mathcal{L}_{ILO}(\Delta\omega) = \frac{\Delta\omega^2 \cdot \mathcal{L}_{osc}(\Delta\omega)}{\Delta\omega^2 + K_{ILO}(\Phi_{ss})^2} + \frac{K_{ILO}(\Phi_{ss})^2 \cdot \mathcal{L}_{inj}(\Delta\omega)}{\Delta\omega^2 + K_{ILO}(\Phi_{ss})^2} - \frac{2\Delta\omega K_{ILO}(\Phi_{ss})}{\Delta\omega^2 + K_{ILO}(\Phi_{ss})^2} \int_{-\infty}^{\infty} R_{inj \ free}(\tau) \sin(\Delta\omega \tau) d\tau
$$

(5.8)

where $R_{inj \ free}(\tau)$ is the cross-correlation function between $\phi_{inj}(t)$ and $\phi_{free}(t)$. If the cross-correlation term in (5.8) is zero we get back (5.5).

#### 5.2.2 ILO noise-filtering

Eqn (5.3) can be mapped on the equivalent block diagram shown in Fig. 5.2b. Note the similarity of this model to that of a type-I first order PLL. Here, the parameter, $K_{ILO}(\Phi_{ss})$, described in (5.4), is representative of the noise filtering 3-dB bandwidth of the ILO. It is similar to the $K_{VCO}$ in a PLL. Eqn (5.5) shows that, identical to a PLL, an ILO high pass filters the free-running oscillator’s noise while low pass filtering the injection source noise. However, here the $K_{ILO}(\Phi_{ss})$ changes from $\frac{\varepsilon}{1 + \varepsilon} \left(\frac{\omega_{\text{free}}}{2Q}\right)$ at the center of the lock to zero at the edges. The zero bandwidth at the edges can be explained intuitively as follows: at the edge of lock any perturbation causes the ILO to be temporarily outside the lock range and disconnected from the input. Fig. 5.3 shows the variation in the 3-dB noise filtering bandwidth normalized to its maximum value at the center, for two different injection levels. The $x$-axis has been normalized to the lock range. The figure shows a maximum noise filtering bandwidth at the center which drops down to zero at the edges as expected. However, the decrease in the filtering bandwidth across the lock range is not severe for the majority of the lock range. At an injection level of 0.25, the filtering bandwidth remains within 75% of its maximum value for about 75% of the lock range. This translates to a 3-dB degradation in the noise transfer function, as expressed in (5.5), at an offset of three quarters of the locking range from the center frequency. Additionally, the extent of the usable filtering bandwidth increases with increasing injection levels. Eqn (5.5) accurately predicts the shape of the noise spectrum at any offset from the carrier frequency. For a noiseless source, the shaping is completely determined by $K_{ILO}(\Phi_{ss})$ and the oscillator noise. 
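The variation of the filtering bandwidth across the lock range can be sketched numerically. The code below uses (5.4) for the bandwidth and assumes the standard steady state form of Paciorek's equation, in which the frequency offset from the free-running frequency is proportional to $\varepsilon \sin(\Phi_{ss}) / (1 + \varepsilon \cos(\Phi_{ss}))$, to map $\Phi_{ss}$ onto a position in the lock range. That mapping, the one-sided normalization of the lock range and the function names are assumptions of this sketch.

```python
import math

def normalized_offset(eps, phi_ss):
    """Frequency offset from the center of lock, normalized to the lock range edge."""
    edge = eps / math.sqrt(1 - eps ** 2)    # value reached at cos(phi_ss) = -eps
    return (eps * math.sin(phi_ss) / (1 + eps * math.cos(phi_ss))) / edge

def normalized_bandwidth(eps, phi_ss):
    """K_ILO(phi_ss) from (5.4) divided by its value at the center of lock."""
    c = eps * math.cos(phi_ss)
    return ((eps ** 2 + c) / (1 + c) ** 2) / (eps / (1 + eps))

def bandwidth_at_offset(eps, x, steps=60):
    """Bisect for the phi_ss giving normalized offset x, return the normalized bandwidth."""
    lo, hi = 0.0, math.acos(-eps)           # offset rises monotonically on this interval
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if normalized_offset(eps, mid) < x:
            lo = mid
        else:
            hi = mid
    return normalized_bandwidth(eps, 0.5 * (lo + hi))

for eps in (0.1, 0.25, 0.5):
    print(f"eps = {eps}: bandwidth at 75% of the lock range = "
          f"{bandwidth_at_offset(eps, 0.75):.2f} of its maximum")
```

The printed values can be compared with the figures quoted in the text for an injection level of 0.25, and they increase with the injection level.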
However, as the source noise approaches the oscillator noise, the shaping across the lock range tends to decrease. Fig. 5.4 shows the noise profile of an ILO for $\varepsilon = 0.1$, at 1MHz offset, for different source phase noise levels. The oscillator has a free-running frequency of 2GHz, a Q of 20 and a phase noise of -100dBc/Hz at 1MHz offset. From Fig. 5.4 one can infer that even when the source noise is well below that of the oscillator, the locked ILO phase noise does not degrade by more than 3 dB for more than 80% of the lock range. It also shows that as the source noise increases there is further flattening of the output phase noise. As a limiting case, if $L_{osc}$ and $L_{src}$ were identical, the noise spectrum would be flat across the lock range.

Figure 5.3: Filtering bandwidth vs. lock range

Figure 5.4: ILO noise shaping as a function of the source noise

Figure 5.5: Output phase noise profile for two different source noise profiles

Fig. 5.5 plots the normalized usable lock range (NULR) as a function of injection level and source phase noise. Here usability is defined in terms of $K_{ILO}(\Phi_{ss})$ and equates to the fraction of the lock range up to which the resultant output phase noise is within 3 dB of its value at the center of lock. The free-running frequency, oscillator phase noise (@1MHz offset) and tank Q are assumed to be 2GHz, -100dBc/Hz and 20 respectively. It can be inferred from the plot that around 75 to 90% of the lock range is always usable for the simulated scenario. Fig. 5.6 shows the predicted (MATLAB simulations) output phase noise profile for a given free-running oscillator phase noise profile and two different injection source phase noise profiles. It can be inferred from the plots that at close-in frequencies, where $K_{ILO}(\Phi_{ss})$ dominates, the output noise follows the input noise profile, whereas at frequencies further away an averaging of the free-running and injection source noise brings the two noise profiles closer. At far-off frequencies the free-running oscillator noise dominates, making the two noise profiles identical. The free running frequency, injection level and $\Phi_{ss}$ for
5.2.3 Amplitude perturbation

In our earlier phase noise computations we had conveniently ignored amplitude changes (i.e. assumed $\varepsilon$ to be a constant). We could do so because of the inherent amplitude limiting in such a system, which ensures minimal AM-to-PM noise conversion. However, if we look at a system of two or more coupled oscillators, the frequency of oscillation itself happens to be a function of $\varepsilon$. This is unlike a stand-alone ILO, where the frequency of oscillation is defined by the source. Unlike an ILO, in a system of coupled oscillators there is no external source. For a given oscillator, all other oscillators in the system that couple directly with it can be defined as its sources. As an example consider a QVCO (quadrature voltage controlled oscillator), which is a system of two coupled (identical) oscillators that are out of phase by 90 degrees. Analyzing a QVCO from an ILO standpoint, one can immediately see that since $\Phi_{ss}$ is a strong function of $\varepsilon$, as $\varepsilon$ changes so does the oscillation frequency, so as to maintain the phase quadrature relationship.

In a stand-alone ILO too, amplitude noise changes $K_{ILO}(\Phi_{ss})$, leading to a change in the output noise; however, being an indirect effect, it is less pronounced.

For a QVCO, using equation (5.1), the oscillation frequency, at steady-state, is given by

\[ \omega_{osc} = \omega_{free} - \frac{\omega_{free}}{2Q} \frac{\varepsilon \sin(\Phi_{ss})}{1 + \varepsilon \cos(\Phi_{ss})} \tag{5.9} \]

Differentiating (5.9) w.r.t. \( \varepsilon \) we get:

\[ \frac{d\omega_{osc}}{d\varepsilon} = -\frac{\omega_{free}}{2Q} \frac{\sin(\Phi_{ss})}{(1 + \varepsilon \cos(\Phi_{ss}))^2} \tag{5.10} \]

Equation (5.10) predicts zero AM-PM conversion when operating at the center of lock and maximum AM-PM conversion at \( \Phi_{ss} = \cos^{-1}\left(\frac{1 - \sqrt{1+8\varepsilon^2}}{2\varepsilon}\right) \). However, this maximum is never reached, as \( \cos^{-1}\left(\frac{1 - \sqrt{1+8\varepsilon^2}}{2\varepsilon}\right) \) is greater than the steady state phase shift at the edge of lock, \( \Phi_{max} = \cos^{-1}(-\varepsilon) \). To gain more insight into the dependence of AM-to-PM conversion on the steady state phase shift, we compare a complementary QVCO architecture with a regular simple QVCO [78]. A complementary QVCO introduces a phase shift between its gate voltage and the current it injects into the coupling VCO, which leads to a much smaller phase shift between the injection current and oscillator voltage, as opposed to the 90\(^\circ\) phase shift in a regular QVCO. Fig. 5.7 plots the AM-to-PM scaling factor (5.10) for a regular and a complementary QVCO. The complementary QVCO has a steady state phase shift of 30\(^\circ\). The injection level for both QVCOs is equal to 0.3 (estimated from schematic PSS simulations). It can be inferred from the plot that, owing to a much smaller steady state phase difference, the complementary architecture suppresses any potential AM-to-PM conversion via the coupling path by \( \sim 10\text{dB} \) (\( \approx 20\log_{10} 3 \)) when compared to a regular QVCO.
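As a quick numerical check of the comparison above, the following sketch evaluates the magnitude of (5.10), normalized to $\frac{\omega_{free}}{2Q}$, at the two steady state phase shifts quoted in the text. The function name `am_pm_factor` is illustrative and not part of the original work.

```python
import math

def am_pm_factor(eps, phi_ss_deg):
    """Magnitude of d(w_osc)/d(eps) from (5.10), normalised to w_free / (2Q)."""
    phi = math.radians(phi_ss_deg)
    return abs(math.sin(phi)) / (1.0 + eps * math.cos(phi)) ** 2

eps = 0.3                                  # injection level quoted in the text
regular = am_pm_factor(eps, 90.0)          # regular QVCO: 90 degree phase shift
complementary = am_pm_factor(eps, 30.0)    # complementary QVCO: 30 degree phase shift
suppression_db = 20.0 * math.log10(regular / complementary)
print(f"AM-to-PM suppression: {suppression_db:.1f} dB")
```

The ratio of the two scaling factors comes out slightly above 3, which is consistent with the roughly 10 dB suppression stated above.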

We can also refer to equation (5.8) to analyze phase-noise in a QVCO. Since the two oscillators in a QVCO have identical noise profile, \( \mathcal{L}_{ILO}(\Delta \omega) \) (output) and \( \mathcal{L}_{inj}(\Delta \omega) \) (input) must be the same leading to:

\[ \mathcal{L}_{ILO}(\Delta \omega) = \mathcal{L}_{osc}(\Delta \omega) - \frac{2K_{ILO}(\Phi_{ss})}{\Delta \omega} \int_{-\infty}^{\infty} \mathcal{R}_{ILO\ free}(\tau) \sin(\Delta \omega \tau) d\tau \tag{5.11} \]

The last term arises owing to the dependence of the injecting VCO's phase noise on the free-running oscillator's noise of the injected-upon VCO. However, in the absence of knowledge of the exact correlation nature and mechanism, equation (5.11) does not lend any new design insight and is not pursued any further.

5.2.4 Extension to frequency multipliers and dividers: Model unification

Injection locked multipliers (ILFM) and injection locked dividers (ILFD) are also referred to as sub-harmonic and super-harmonic injection locked oscillators. Essentially, device non-linearity (multipliers) or system symmetry (dividers) is utilized to lock the oscillator to either a sub-harmonic or a super-harmonic of the oscillation frequency. Therefore the treatment of injection locked oscillators in Section 5.2.1 can be easily extended to cover such injection scenarios. Since the reference phase (the phase of the injection harmonic the oscillator locks on to) is different from that of the actual injection signal, we replace $\phi_{inj}$ with $\eta \phi_{inj}$, where $\eta = \omega_{ilo}/\omega_{inj}$ is greater than 1, equal to 1 and less than 1 for an ILFM, an ILO and an ILFD respectively. Although not explicit in the equation, $\varepsilon$ is to be calculated w.r.t. the $\eta$th harmonic of the injection signal. With these changes equation (5.2) can be rewritten as:

$$
\frac{d(\eta \Delta \phi_{inj}(t) - \Delta \phi_{osc}(t))}{dt} \approx \eta \Delta \omega_{inj} - \Delta \omega_{free} - \frac{\omega_{free}}{2Q} \left[ \varepsilon^2 + \varepsilon \cos(\Phi_{ss}) \right] \left( \eta \Delta \phi_{inj}(t) - \Delta \phi_{osc}(t) \right) \tag{5.12}
$$
Transforming both sides of the equality to s-domain we arrive at the generalized version of equation (5.3):

$$\phi_{osc}(s) = \frac{\eta K_{ILO}(\Phi_{ss})}{(s + K_{ILO}(\Phi_{ss}))} \phi_{inj}(s) + \frac{s}{(s + K_{ILO}(\Phi_{ss}))} \phi_{free}(s) \tag{5.13}$$

which leads to

$$\mathcal{L}_{ILO}(\Delta \omega) = \frac{\eta^2 K_{ILO}(\Phi_{ss})^2 \mathcal{L}_{src}(\Delta \omega) + \Delta \omega^2 \mathcal{L}_{osc}(\Delta \omega)}{\Delta \omega^2 + K_{ILO}(\Phi_{ss})^2} \tag{5.14}$$
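A minimal sketch of how (5.14) might be evaluated at a single offset is given here. The function and parameter names are illustrative, and the numerical values in the example are arbitrary rather than taken from any measurement in this work. Since only the ratio of $\Delta\omega$ to $K_{ILO}$ enters the expression, both are passed in Hz.

```python
import math

def ilo_phase_noise_dbc(offset_hz, k_ilo_hz, src_dbc, osc_dbc, eta=1.0):
    """Output phase noise (dBc/Hz) at one offset, following (5.14).

    offset_hz : offset from the carrier
    k_ilo_hz  : noise-filtering bandwidth K_ILO(phi_ss), expressed in Hz
    src_dbc   : injection-source phase noise at this offset (dBc/Hz)
    osc_dbc   : free-running oscillator phase noise at this offset (dBc/Hz)
    eta       : w_ilo / w_inj (>1 for an ILFM, 1 for an ILO, <1 for an ILFD)
    """
    l_src = 10.0 ** (src_dbc / 10.0)
    l_osc = 10.0 ** (osc_dbc / 10.0)
    num = (eta * k_ilo_hz) ** 2 * l_src + offset_hz ** 2 * l_osc
    den = offset_hz ** 2 + k_ilo_hz ** 2
    return 10.0 * math.log10(num / den)

# Arbitrary illustrative numbers: a tripler well inside its filtering bandwidth
# tracks the source noise raised by 20*log10(eta).
close_in = ilo_phase_noise_dbc(1e3, 1e7, src_dbc=-130.0, osc_dbc=-80.0, eta=3.0)
# Well outside the filtering bandwidth the free-running oscillator noise dominates.
far_out = ilo_phase_noise_dbc(1e9, 1e6, src_dbc=-130.0, osc_dbc=-140.0, eta=3.0)
print(f"close-in: {close_in:.1f} dBc/Hz, far-out: {far_out:.1f} dBc/Hz")
```

The two limits illustrate the low-pass action on the (scaled) source noise and the high-pass action on the oscillator noise.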

### 5.2.5 ILOs vs. PLLs

As mentioned earlier, (5.5) and (5.14) are identical to the corresponding expressions for a type-I first-order PLL [92]. The equivalence to a PLL works well for modeling the ILO noise profile. Identical to a PLL, an ILO shapes its noise by selectively high-pass filtering the oscillator noise and low-pass filtering the injection source noise. The noise-filtering bandwidth is dictated by \(K_{ILO}\), which is similar to the \(K_{VCO}\) in a type-I first-order PLL, as shown in Fig. 5.2b. However, unlike a PLL, the ILO noise-filtering bandwidth is a function of the reference frequency and the reference signal level. This happens because, unlike the VCO in a PLL, the center frequency of the oscillator in an ILO is fixed. Another significant difference is the locking bandwidth in an ILO vs. that in a PLL. The former is generally much larger, though a fair comparison is difficult since the PLL generally locks onto a much higher harmonic of the reference than does an ILO. In order to exploit this large locking bandwidth while operating at the center of lock, one can compare the phase difference between the input and output of an ILO (or an ILFD / ILFM) and use it to generate a frequency tuning voltage. Much like in a PLL, this tuning voltage can then be used to tune the ILO's natural frequency to that of the injection signal. This forms the basis of dual-loop injection locked PLLs.

Having established mathematical expressions for the phase noise in ILOs, ILFDs and ILFMs we next validate these predictions with measurements from discrete bipolar and fabricated CMOS designs.
5.3 Measurement Results

We have used three test cases to validate the theoretical model derived in the previous section: an ILO, an ILFD, and an ILFM. The ILO was implemented using discrete components, while the divider and multiplier were implemented in IBM’s 0.13\(\mu\)m CMOS technology. We have measured the phase noise performance of the three circuits across the lock range and have compared it to our theoretical model. Phase noise of the injection source and free-running oscillator were also measured for theoretical predictions.

5.3.1 Injection-locked oscillator

The discrete ILO is a 57MHz Colpitts oscillator built using BJT Q2N3904 with base injection, as shown in Fig. 5.8. Fig. 5.9 shows the measured and predicted noise spectrum for this oscillator across a lock range of 245kHz, at an estimated injection level, \(\varepsilon\), of 0.175. The measurement results agree closely with the theoretical predictions as shown. The slight discrepancy in the curves is attributed to the inherent uncertainties associated with noise measurements and injection level estimation. The flat noise profile at the 1MHz offset is due to this offset falling well outside the filtering bandwidth, i.e., the injection signal has no effect.

![Graph showing phase noise (dBc/Hz) vs. normalized lock range](image)

**Figure 5.9: Colpitts based ILO noise at different offsets across the lock range**

### 5.3.2 Injection-locked frequency divider

The ILFD schematic and die photograph are shown in Fig. 5.10. Fig. 5.11 shows the measured and predicted phase noise results for the 6.5GHz ILFD at an estimated injection level, $\varepsilon$, of 0.074, for a lock range of 12.5MHz. A sinusoidal signal ($f_{inj}$) is converted into a square wave which is further pulse slimmed to reduce tones close to the fundamental. The 11th harmonic of this output is filtered by an LC bandpass filter and used as an injection signal at the tail current sources of a quadrature oscillator (Fig. 5.10). Consequently, the oscillator output frequency is $5.5f_{inj}$. The free-running VCO phase noise is estimated by observing the phase noise of the ILO (Fig. 5.12) locked at its center with a very small lock range [96].

As seen in Fig. 5.11, the measurements are in close agreement with the phase noise predicted by the divider model (5.14). The 3 dB discrepancy in the noise at 1MHz offset is attributed to the asymmetry in the 11th harmonic bandpass filter, shown in Fig. 5.10, which feeds the ILFD. The frequency division is accounted for by choosing $\eta$ to be 0.5 in the model. The phase noise of the 11th harmonic used for computation is derived from the measured source noise data (Fig. 5.12). Fig. 5.13 shows the noise profile of the unlocked and a locked ILFD. It clearly shows the free-running oscillator's noise getting high-pass filtered by the ILO. Since the 10MHz offset falls outside the filtering bandwidth for the case depicted, the ILO noise at this offset tends to approach that of the free-running oscillator noise. The phase noise of the free-running oscillator, for all the test cases, is estimated from the ILO noise profile by injection-locking it at the center to a very small bandwidth as suggested in [96].

Figure 5.10: A 0.13μm CMOS integrated ILFD operating at 6.5GHz

5.3.3 Injection-locked frequency multiplier

Fig. 5.14 shows the schematic and die photograph of an ILO used as a multiplier [4]. Fig. 5.15 shows the noise spectrum for the multiplier across a lock range of 140MHz, at an estimated injection level of 0.078. The output being observed is that of a down-conversion mixer being fed with a 24GHz RF signal and the ILO output, as shown in Fig. 5.14. The LO signal is generated by tripling the injection source frequency. An 8GHz input signal is hard limited and then fed to an ILO that locks onto the 3rd harmonic of the input. The output of the ILO is then down-converted to 80MHz. The RF signal phase noise was taken into account for our computations. The predicted noise profile is in close agreement with the measurements, as shown in Fig. 5.15. Again, the noise shaping as predicted by (5.14) is in agreement with the actual noise shaping observed over the lock range. The slight asymmetry in measurement can be attributed to the asymmetric lock range about the center frequency, arising due to the asymmetric series to parallel conversion of the parasitic resistance of the inductor.
Figure 5.13: Filtering action of the ILO (ILFD)

Figure 5.14: A subharmonic ILO (ILFM) operating at 24GHz
5.4 Summary

This chapter proposes, describes and validates a universal phase noise model for ILOs, ILFDs, and ILFMs, (5.14). The model is verified using measurement results from a 57MHz discrete Colpitts ILO, an integrated 6.5GHz CMOS ILFD, and an integrated 24GHz CMOS ILFM. Excellent matching between the theoretical predictions and measurements is demonstrated.

Additionally, it is shown that the phase noise behavior of an ILO is identical to that of a type-I first-order PLL (Fig 5.2b). In particular, we show that: An ILO, like a PLL, high-pass filters the oscillator’s noise while low-pass filtering the injection source noise. Though the extent of noise-filtering is a function of the operation frequency offset from the free-running frequency, a significant noise-filtering bandwidth is observed across much of the lock range. Consequently, good phase noise performance can be obtained from an ILO operating at significant frequency offsets (~75% or more of the lock range) from the free-running frequency. Also the simple frequency domain representation makes it possible to safely predict the noise-filtering bandwidth based on the VCO parameters as well as select an injection level to meet design targets.
Although identical in its frequency tracking action to a PLL, an ILO behaves differently in the presence of large perturbations. *Unlike a PLL* which undergoes only a frequency variation during the transient period, an ILO undergoes a change in both its oscillation amplitude and frequency. Additionally, the $K_{ILO}(\phi_{ss})$ which relates the ILO small signal model to that of a first-order PLL is dependent both on the amplitude ($\varepsilon$) and the operation frequency. This is *unlike a PLL*, where $K_{VCO}$ can be treated as a constant over much of its lock range, and for both small and large perturbations.
Chapter 6

A Differential Current Reuse Frequency Doubler

6.1 Introduction

Frequency multipliers are driven or autonomous circuits that take a single or multiple phase low frequency input and output one (single ended), two (differential) or four (differential quadrature) phases at a higher multiple of the input frequency [98]. They find extensive usage in frequency synthesizers, heterodyne transceivers and mm-wave circuit designs [99,100]. While they serve to increase the tuning range in low frequency synthesizers, they are almost the exclusive source of high frequencies [101,102]. Since the quality of the passives deteriorates rapidly as one moves upward in the mm-wave frequency range, generating low noise signals itself becomes a challenge, let alone attaining frequency tunability. While the quality of the passives does degrade the power levels at the output of frequency multipliers too, they are driven circuits, i.e. amplifiers, and hence do not run into the same design issues as autonomous oscillators. They do, however, need adequate input power levels to ensure minimal phase noise degradation. For the same reason they lend high frequency tunability to designs as well.
In homodyne duplex systems, where the TX and RX operate at the same time at slightly different frequencies, a strong TX LO signal can pull the LO signal of the RX and vice versa. Apart from avoiding inter RX-TX injection pulling, frequency multipliers also serve to avoid self pulling and self mixing in homodyne transmitters and receivers respectively [103]. A heterodyne architecture is a multi-step down conversion (up conversion) process that circumvents this pulling issue by using frequency multipliers / dividers to provide the actual LO signal while the autonomous oscillator works at a lower or higher frequency respectively [104,105]. As explained earlier, generating low noise, widely tunable signals is easier at lower frequencies and hence multiplication is preferred, even though dividers will necessarily exist in the feedback path of the PLL that incorporates the autonomous oscillator.

Frequency multipliers also serve to provide stable high frequency reference signals for high frequency oscillators which function outside a PLL loop. This avoids power hungry high frequency dividers, with the same tunability advantage as was mentioned before. Note that here we are discussing injection locking an autonomous high frequency oscillator to a higher harmonic of the output of a low frequency PLL, whereas the previous discussion was about using high frequency tuned amplifiers as frequency multipliers [1].

This chapter introduces a differential current reuse frequency doubler that reduces area and power consumption while generating a wideband frequency doubled output with a good conversion gain. The evolution of the differential architecture is described next followed by measurement results for the design prototype.

### 6.2 Differential Current-Reuse Architecture

Fig. 6.1 shows a generic multiply-by-N frequency multiplier. N equispaced phases of a low-frequency input signal feed N MOSFETs whose sources and drains are tied together.

---

1. High gain amplifiers consume more power than an oscillator at the same frequency, but have greater tunability.

Since all the transistors are connected in parallel and only one gating signal is enabled every $\frac{T_{in}}{N}$ seconds, where $T_{in}$ is the input clock period, the output node senses the same input every $\frac{T_{in}}{N}$. This in turn leads to an output signal that repeats every $\frac{T_{in}}{N}$. The tuned load at the output further filters the signal to provide a strong $N$th harmonic. Even though non-overlapping clock phases are shown in Fig. 6.1, overlapping clock phases work as well. In fact any set of phases that makes the input pattern repeat every $\frac{T_{in}}{N}$ would work, though the signal levels will be different. Non-overlapping phases would lead to a stronger $N$th harmonic by virtue of a lower duty cycle clock having a richer harmonic content than a higher duty cycle clock. The architecture in Fig. 6.1 has a PMOS counterpart too. To gain a simplistic insight into the functioning of the doubler, let us consider each of the MOSFETs to be an ideal switch. In the NMOS implementation, every time an input switch is turned on, the output gets shunted to ground, or pulled low. Similarly in a PMOS implementation, as shown in Fig. 6.2 for a multiply-by-2 circuit, every time a switch gets turned on, the output node gets shunted to Vdd, or pulled up. In other words, during every switching event, a higher current is made to exit the output node in the NMOS implementation and a higher current is made to enter the output node in the PMOS implementation. Therefore if we make the current in the PMOS implementation equal that in its NMOS counterpart, we can sandwich the LC tank between a PMOS and an NMOS push-push pair. This can easily be achieved by tying the sources of the NMOS (or the PMOS) to a tail current source, thus leading to a current reuse differential frequency doubler. Notice that this sandwiching of the LC tank allows us to get differential outputs with a single LC tank, whereas previously we would have to use two LC tanks (one for the PMOS version and one for the NMOS version). Additionally, the NMOS implementation can now reuse the current from the PMOS implementation, thus leading to significant power savings (2X). Sandwiching the LC tank does lead to loss of some conversion gain owing to loss of $|V_{ds}|$ headroom for the MOSFETs, though $|V_{gs}|$ strengths remain the same. Since the MOSFETs are in saturation during the differential pair switching, this only affects the current carrying capability of the MOSFET once the current has been steered entirely to one side, since now the MOSFET operates like a switch and hence is in the linear region with a $V_{ds}$ dependent current. Nonetheless the area savings outweigh any potential loss in conversion gain. Notice that the input signal strength must be above a certain level for the multiplied output signal to exist. This can be understood in two ways. Firstly, if we look at the system as an amplifier, it behaves linearly for low signal levels and hence the current change in one branch gets completely absorbed by the other, leading to zero ac current flow through the load. Secondly, when looked at as a differential pair with current steering, the signal levels need to be higher than the switching threshold for current steering to occur.

![Diagram of differential architecture](image)

**Figure 6.2: Evolution of the differential architecture**
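The ideal-switch picture of the multiply-by-N circuit described above can be illustrated with a small numerical sketch. It builds the combined switching waveform from N equispaced, non-overlapping gating phases and computes its Fourier coefficients with a direct DFT. The function name, sample count and pulse widths are assumptions made for illustration; the output is modeled simply as 1 while any switch conducts and 0 otherwise.

```python
import cmath

def harmonic_magnitudes(n_phases, on_samples, samples=240, max_harmonic=6):
    """Return |c_k| for k = 0..max_harmonic of the combined switching waveform.

    One input period is divided into `samples` points. Switch p conducts for
    `on_samples` points starting at p * samples / n_phases.
    """
    if samples % n_phases:
        raise ValueError("samples must be a multiple of n_phases")
    step = samples // n_phases
    wave = []
    for i in range(samples):
        conducting = any((i - p * step) % samples < on_samples
                         for p in range(n_phases))
        wave.append(1.0 if conducting else 0.0)
    mags = []
    for k in range(max_harmonic + 1):
        c_k = sum(w * cmath.exp(-2j * cmath.pi * k * i / samples)
                  for i, w in enumerate(wave)) / samples
        mags.append(abs(c_k))
    return mags

# Doubler (N = 2) with each switch on for a quarter of the input period.
for k, mag in enumerate(harmonic_magnitudes(n_phases=2, on_samples=60)):
    print(f"harmonic {k}: {mag:.4f}")
```

For N = 2 the odd harmonics of the input vanish and the lowest surviving ac component sits at twice the input frequency, which is the component the tuned load then selects.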

Fig. 6.3 shows the implemented prototype's circuit diagram along with biasing and sizing details. The die micrograph for the design is shown in Fig. 6.4. The design was implemented in an IBM 65nm RF CMOS process with 2 thick metals and occupies an active area of 0.07mm$^2$. The switching pairs are dc biased by connecting their drains and gates together to maximize their $g_m$ for a given current while allowing the outputs to be dc biased at $\sim$ mid-supply. The center-tap of the tank inductor is ac grounded for differential balance. Notice that the differential outputs sit at slightly different dc values owing to the finite series resistance of the inductor. If the doubler outputs are processed directly by a CML buffer, this dc offset can be rejected by having a tuned load at the buffer output. If ac coupling capacitors are used instead, then this dc offset does not cause any error since it cannot propagate.

A dc voltage at this node was not possible since that would disturb the dc biasing of the circuit.

This can be a low quality inductor and is only required to minimize the gain at dc.

An NMOS current source is used for biasing. The current source allows for lowering of the doubler power consumption at higher frequencies. A 3-bit capacitive tuning is provided to maximize the tuned load's impedance across the desired frequency range. The doubler was designed to process the output of a wideband VCO whose oscillation ranges from 1GHz to 7GHz [106]. Thus the doubler should cover an output tuning range of 7GHz to 14GHz. In order to have a good ($\sim$flat) impedance across the tuning range, an overall hit is taken in the Q of the inductor, since a high Q inductor, even though it would have a high conversion gain around its maximum Q, would severely limit the output tuning range. Further degradation in the effective Q of the tank is incurred owing to the finite on-resistance of the capacitor bank switches. A frequency aware switch sizing technique was used. Frequency agnostic switch sizing maintains a constant RC product for the switches, thus leading to progressively wider switches for higher capacitances. However, since some capacitors are never turned on in certain frequency bands, the switches can be sized to maintain a constant $\omega \text{RC} (= 1/Q)$ product instead. This in turn relaxes switch sizing by lowering the switch size required for larger capacitances, since they operate at a lower frequency. Frequency aware switch sizing helps reduce the off-state parasitics from the switches substantially [106].
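The difference between frequency agnostic and frequency aware switch sizing can be sketched numerically. The bank values, the branch Q target and the assumption that switch width scales as the inverse of its on-resistance are all hypothetical and chosen only for illustration; they are not the values used in the prototype.

```python
import math

def on_resistance(cap_f, freq_hz, q_branch):
    """On-resistance giving a switched-capacitor branch Q = 1 / (w * R * C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * q_branch * cap_f)

def width_ratios(bank, q_branch):
    """Width(frequency aware) / width(frequency agnostic) for each capacitor.

    bank is a list of (capacitance in F, highest frequency in Hz at which that
    capacitor is ever switched in). Width is assumed proportional to 1 / R_on.
    """
    f_max = max(f_on for _, f_on in bank)
    ratios = []
    for cap_f, f_on in bank:
        r_agnostic = on_resistance(cap_f, f_max, q_branch)  # constant RC
        r_aware = on_resistance(cap_f, f_on, q_branch)      # constant wRC
        ratios.append(r_agnostic / r_aware)
    return ratios

# Hypothetical 3-bit bank: larger capacitors are only used at lower frequencies.
bank = [(50e-15, 14e9), (100e-15, 12e9), (200e-15, 9e9)]
for (cap_f, f_on), ratio in zip(bank, width_ratios(bank, q_branch=20.0)):
    print(f"C = {cap_f * 1e15:.0f} fF, used up to {f_on / 1e9:.0f} GHz: "
          f"relative switch width = {ratio:.2f}")
```

In this sketch the saving in switch width for each capacitor equals the ratio of its highest operating frequency to the top of the band, so the largest capacitors benefit the most.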
Figure 6.5: Extracted conversion gain and LO suppression at band edge and center frequencies

6.3 Simulation and Measurement Results

The frequency doubler was part of a wide tuning frequency synthesizer and was fed directly from the output of the VCO. Therefore an independent conversion gain test of the doubler could not be conducted. Instead a post extraction simulation based conversion gain test was performed for the doubler. Fig. 6.5 shows the conversion gain and fundamental harmonic suppression of the doubler as a function of the input LO power. Since the output strength of the doubler is only a function of the tail current and impedance at the output frequency, we would expect the conversion gain to drop beyond a certain input voltage, as no further reduction in switching time would occur beyond this input voltage. This trend is clearly visible in the extracted conversion gain simulations. Further, owing to the double balanced nature of the topology, excellent LO suppression is achieved. The degradation in LO suppression at the highest frequency is due to the poorly defined tank impedance at this frequency\(^4\). The improvement in the conversion gain with frequency is due to the Q improvement of the inductor with frequency. Fig. 6.6 shows the extracted conversion gain across the output frequency range of the doubler for a fixed dc power consumption of 6mW from a 1.5V supply. Also plotted alongside is the power consumption for a fixed conversion gain of -12dBm. The conversion gain and power trends are consistent with the previous simulation.

\(^4\) All capacitor bank switches are turned off

Figure 6.6: Extracted conversion gain and power consumption across the band

Fig. 6.7 reports the measured doubler phase noise at three different VCO frequencies (2.9GHz, 4.4GHz and 5.9GHz). The inset shows the phase noise at fixed frequency offsets of 1 and 10MHz. The doubler's phase noise values are approximately 6 dB higher than the corresponding VCO phase noise, as expected for a frequency doubling. When integrated with the VCO, the VCO-doubler combination achieves (measured) FOM\(_T\) values between 191 and 209dBc/Hz. Quite interestingly, the doubler does not degrade the FOM\(_T\) of the stand-alone VCO; instead it is able to maintain the same value by canceling the increase in power consumption through the tuning range improvement in the FOM\(_T\) expression.

### 6.4 Summary

This chapter presents a current reuse frequency doubler. An LC tank is sandwiched between a PMOS and an NMOS push-push pair, which allows differential signals to be tapped across the tank. A variable tail current source allows for an optimal trade-off between conversion efficiency and power. The design is scalable and can easily be incorporated in mm-wave frequency synthesizers. The doubler achieves FOM\(_T\) values between 191 and 209dBc/Hz, while its phase noise varies between -114dBc/Hz and -112dBc/Hz when driven by a 2.9-5.8GHz VCO (the VCO's actual tuning range is from 0.7GHz to 5.8GHz). The extracted gain of the doubler varies between -15dB and -11dB over the same bandwidth for a fixed dc power consumption of 6mW and an LO power level of 4dBm.

\[
FOM_T = 20 \times \log_{10} \left( \frac{f_0}{\Delta f} \right) - 10 \times \log_{10} \left( P_{dc}(mW) \right) - \mathcal{L}(\Delta f) + 20 \times \log_{10} \left( \frac{FTR(\%)}{10} \right),
\]

where \( \mathcal{L} \) is the VCO phase noise at frequency \( f_0 \) measured at a frequency offset of \( \Delta f \), \( P_{dc} \) is the dc power consumption in mW and \( FTR \) is the frequency tuning range (\( \frac{\text{bandwidth}}{\text{center frequency}} \)).

Figure 6.7: Measured phase noise at band center and edges
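For reference, the FOM\(_T\) expression used in this chapter can be evaluated as in the following sketch. The function name and the operating point are hypothetical and only illustrate the arithmetic; they are not measured values from this work.

```python
import math

def fom_t(f0_hz, offset_hz, pdc_mw, phase_noise_dbc, ftr_percent):
    """FOM_T as defined in this chapter (a positive number for a good design).

    f0_hz           : oscillation frequency
    offset_hz       : offset at which the phase noise is quoted
    pdc_mw          : dc power consumption in mW
    phase_noise_dbc : phase noise at that offset, in dBc/Hz (negative)
    ftr_percent     : tuning range as a percentage of the center frequency
    """
    return (20.0 * math.log10(f0_hz / offset_hz)
            - 10.0 * math.log10(pdc_mw)
            - phase_noise_dbc
            + 20.0 * math.log10(ftr_percent / 10.0))

# Hypothetical operating point, for illustration only.
example = fom_t(f0_hz=10e9, offset_hz=1e6, pdc_mw=6.0,
                phase_noise_dbc=-113.0, ftr_percent=50.0)
print(f"FOM_T = {example:.1f} dBc/Hz")
```

The last term shows why a wider tuning range can offset the additional power drawn by the doubler: doubling the tuning range contributes about 6 dB, the same amount a fourfold increase in dc power would cost.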
Chapter 7

Contributions & Future Work

7.1 Contributions

**Wideband Phased Arrays**: Two different solutions for wideband phased array implementations have been proposed. The 1st solution tackles the issue of wide instantaneous bandwidths in phased arrays through frequency channelization thus allowing for leveraging of diversity in space and frequency simultaneously. By dividing a wide bandwidth into smaller bandwidths, each narrow bandwidth can be processed independently by a dedicated phase shifter thus minimizing beam squinting. This further allows for independent and concurrent processing of multiple narrowband signals thus allowing for beamsteering of OFDM signals as well. The phase shifter and the filter are all passive and hence scalable with excellent matching thus leading to smaller phase errors. Further, the analog filter is frequency programmable thus lending additional tunability and filtering to the design.

A second design is proposed in the discrete time domain that samples incoming signals in order of their arrival times. Sampling is accomplished at baseband to avoid sub-sampling attenuation at RF as well as unreasonable power consumption in RF Nyquist sampling (or oversampling) ADCs. To ensure perfect phase alignment post sampling, phase shifted LOs are used to cancel phase shifts at the carrier frequency prior to sampling. Further, by deriving sampling clocks from the same phase shifted LO, much finer time delay resolutions can be obtained compared to DLLs. The proposed scheme provides accurate time delays since no approximations are used in realizing the delay. Although implemented for a two channel receiver, the scheme can easily be extended to multiple channels through the use of a DLL that uses two consecutive channel sampling clocks as inputs. The scheme can easily be integrated with a standard receiver that demodulates signals in the digital domain.

**Injection Locking:** A universal, simplified phase noise analysis and model for injection locked oscillators has been proposed. The analysis allows for closed form computation of noise filtering parameters in an ILO, ILFD and ILFM through easily simulatable (or measurable) circuit design variables, leading to a phase noise model identical to that of a type-I PLL. The model accurately predicts the low pass filtering of the injection source noise and the high pass filtering of the oscillator noise.

**Differential Current Re-use Frequency Doubler:** A differential current reuse frequency doubler design has been proposed. The design uses a single LC tank sandwiched between a PMOS and an NMOS push-push pair to obtain differential signals. Additionally, the current in the PMOS pair is reused in the NMOS pair, allowing for power savings.

### 7.2 Future Work

The frequency channelizer based wideband phased array uses a passive vector combiner through charge distribution at the baseband post sampling. An FFT based phase shifter has been proposed in chapter 3. The phase shifter can, independently, be used for beam steering in narrow-band phased arrays.

The discrete time domain architecture presented in this work can be extended to 4 or 8 channels using the delay extension scheme described in section 4.3.9 and shown in Fig. 4.17. Since the delay resolution of a DLL based approach cannot match that obtained through direct LO division, if the required delay step is smaller than the DLL's minimum step, we can generate sampling clocks from the LO by dividing it down, just as has been done in this architecture. Note that we will not run into the phase wrapping issue here owing to the progressive delay being small. The dividers in all but two channels can be shut down when the required delay can be generated by the DLL. The two active dividers will provide the reference clock and the delayed clock to the DLL.

Figure 7.1: A conceptual diagram of two channel timed array receiver with baseband rotation and delay sampling

Additionally, as was alluded to in chapter 4 and as is shown in Fig. 7.1, vector interpolation can be integrated with delay sampling at baseband. This avoids power (or area) hungry LO Cartesian combiners, since the same LO is used to downconvert signals on all the channels. The signals from each channel are sampled onto sine / cosine weighted capacitors and added with appropriate polarity to accomplish phase rotation. A delay can be introduced in the sampling clocks of successive channels to accomplish delay sampling at baseband as well. Since both the final signal combination and the phase rotation are accomplished through charge sharing, the two steps can be carried out at the same time. The example case shown in Fig. 7.1 is for a two channel implementation. The delayed sampling clocks can be generated from a DLL. In order to generate progressive delays smaller than the resolution of the DLL, we can instead introduce an additional delay in the signal path at baseband using an active-RC delay stage, as was mentioned in section 4.3.9. Note that since this extra delay would be introduced post downconversion, it will not affect LO phase shifts and hence the baseband vector rotation. Since any charge sharing scheme suffers from attenuation, a small baseband gain may be required before sampling.
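
As a rough behavioural sketch of the sine / cosine weighted combination described above, the following models each channel's baseband sample as an (I, Q) pair, applies cosine and sine weights with the appropriate polarity, and averages the results. The averaging is only a crude stand-in for charge sharing attenuation; the function names, the ideal weights and the 40 degree inter-channel phase are assumptions made for illustration and do not describe the circuit implementation.

```python
import math

def rotate_iq(i_samp, q_samp, theta):
    """Rotate one (I, Q) sample by theta using cosine / sine weights."""
    w_cos = math.cos(theta)   # models the cosine-weighted capacitor
    w_sin = math.sin(theta)   # models the sine-weighted capacitor
    i_out = w_cos * i_samp - w_sin * q_samp
    q_out = w_sin * i_samp + w_cos * q_samp
    return i_out, q_out

def combine_channels(samples, rotations):
    """Rotate every channel, then average (hypothetical charge-sharing model)."""
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for (i_s, q_s), theta in zip(samples, rotations):
        i_r, q_r = rotate_iq(i_s, q_s, theta)
        i_sum += i_r
        q_sum += q_r
    return i_sum / n, q_sum / n

# Two channels carrying the same unit-amplitude signal, the second one
# arriving with an assumed 40 degree phase offset.
psi = math.radians(40.0)
samples = [(1.0, 0.0), (math.cos(psi), math.sin(psi))]
i_c, q_c = combine_channels(samples, [0.0, -psi])
print(i_c, q_c)
```

With the second channel rotated by the negative of its phase offset, the two channels add coherently, and the averaged output shows why some baseband gain may be needed ahead of sampling.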

In appendix A a second order linear time domain equation (A.14) for injection locked oscillators has been introduced (the nonlinear term can be ignored under slowly changing phase difference). The time domain model can be transformed into an s-domain model using which a more accurate phase noise analysis of injection locked oscillators can be performed.
References


Appendix A

Injection Locking Time Domain Transient Equation: An Alternate Derivation

Fig. A.1 shows the diagram of an LC tank with a parallel loss resistance, a $g_m$ cell to compensate for the losses in the same and an external injection source. The values of the components are assumed to be $L$, $C$ and $R$ respectively while the value of other components (or voltages or currents) are as shown in the diagram. The derivations that follow consider KCL and KVL equations for this LCR circuit.

The following steps are followed in this derivation:

(a) Derive KCL / KVL equations for the circuit

(b) Express result in the form of $A + B \frac{d\phi}{dt} + C \frac{d^2\phi}{dt^2} + D \left( \frac{d\phi}{dt} \right)^2 = 0$

(c) Simplify $A$, $B$, $C$ & $D$

(d) Ignore the nonlinear term, $\left( \frac{d\phi}{dt} \right)^2$, to derive an expression for $\frac{d\phi}{dt}$. Verify the result against derivations of Adler [82] and Paciorek [83]

(e) Post result verification, re-solve for $\frac{d\phi}{dt}$ without ignoring the nonlinear term $\left( \frac{d\phi}{dt} \right)^2$
Figure A.1: A lossy LC tank with a $g_m$ cell and an injection source: conceptual diagram

$$I(t) = \frac{V \cos(\omega t + \phi(t))}{R} + C \frac{d}{dt} \left[ V \cos(\omega t + \phi(t)) \right] + \frac{1}{L} \int V \cos(\omega t + \phi(t)) \, dt$$

$$= \alpha \cos(\omega t) + g_m V \cos(\omega t + \phi(t))$$

Differentiating the above expression w.r.t time we get

$$I'(t) = -\frac{V}{R} \left( \omega + \frac{d\phi}{dt} \right) \sin(\omega t + \phi(t)) - CV \left( \omega + \frac{d\phi}{dt} \right)^2 \cos(\omega t + \phi(t)) - CV \frac{d^2\phi}{dt^2} \sin(\omega t + \phi(t))$$

$$+ \frac{V}{L} \cos(\omega t + \phi(t)) = -\alpha \omega \sin(\omega t) - g_m V \left( \omega + \frac{d\phi}{dt} \right) \sin(\omega t + \phi(t))$$

Rearranging the terms in the previous expression we arrive at the following effective expression

$$A + B \frac{d\phi}{dt} + C \frac{d^2\phi}{dt^2} + D \left( \frac{d\phi}{dt} \right)^2 = 0 \quad (A.1)$$

Where,

$$A = -\alpha \omega \sin(\omega t) + \omega V \frac{(1 - g_m R)}{R} \sin(\omega t + \phi(t)) - \frac{V}{L} \left( 1 - \omega^2 LC \right) \cos(\omega t + \phi(t))$$

$$B = V \frac{(1 - g_m R)}{R} \sin(\omega t + \phi(t)) + 2 \omega CV \cos(\omega t + \phi(t))$$

$$C = CV \sin(\omega t + \phi(t))$$

$$D = CV \cos(\omega t + \phi(t))$$

Let us define some parameters that will help us simplify the above expressions

$$\varepsilon = \frac{\alpha}{g_m V}; \quad 1 - \omega^2 LC = -2 \omega_0 \Delta \omega LC; \quad 2RC = \frac{2Q}{\omega_0}$$
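
The second and third definitions follow from the tank's resonance frequency and quality factor. As a brief sketch, assuming $\omega_0 = 1/\sqrt{LC}$, $Q = \omega_0 RC$ for the parallel RLC tank, and $\Delta\omega = \omega - \omega_0$ with $|\Delta\omega| \ll \omega_0$ (this sign convention is inferred from the definitions above rather than stated explicitly):

$$1 - \omega^2 LC = \frac{\omega_0^2 - \omega^2}{\omega_0^2} = -\frac{(\omega - \omega_0)(\omega + \omega_0)}{\omega_0^2} \approx -\frac{2 \Delta\omega}{\omega_0} = -2 \omega_0 \Delta\omega LC; \quad RC = \frac{Q}{\omega_0}$$
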
Substituting the above in the expressions for \(A, B, C \& D\) we get

\[
A = g_m V \omega \left( -\varepsilon \sin(\omega t) + \frac{(1 - g_m R)}{g_m R} \sin (\omega t + \phi(t)) + \frac{2Q\Delta \omega}{\omega_0 g_m R} \cos (\omega t + \phi(t)) \right)
\]

\[
B = g_m V \omega \left( \frac{(1 - g_m R)}{\omega g_m R} \sin (\omega t + \phi(t)) + \frac{2Q}{\omega_0 g_m R} \cos (\omega t + \phi(t)) \right)
\]

\[
C = g_m V \omega \left( \frac{Q}{\omega_0^2 g_m R} \sin (\omega t + \phi(t)) \right)
\]

\[
D = g_m V \omega \left( \frac{Q}{\omega_0^2 g_m R} \cos (\omega t + \phi(t)) \right)
\]

Since we are interested in \(\phi(t)\) we need to further expand \(\cos (\omega t + \phi(t))\) and \(\sin (\omega t + \phi(t))\). Doing so, and dropping the common factor \(g_m V \omega\), leads to the following expressions for \(A, B, C\) \& \(D\).

\[
A = \left( -\varepsilon + \frac{(1 - g_m R)}{g_m R} \cos (\phi(t)) - \frac{2Q\Delta \omega}{\omega_0 g_m R} \sin (\phi(t)) \right) \sin(\omega t)
\]

\[
+ \left( \frac{(1 - g_m R)}{g_m R} \sin (\phi(t)) + \frac{2Q\Delta \omega}{\omega_0 g_m R} \cos (\phi(t)) \right) \cos(\omega t)
\]

\[
B = \left( \frac{(1 - g_m R)}{\omega g_m R} \cos (\phi(t)) - \frac{2Q}{\omega_0 g_m R} \sin (\phi(t)) \right) \sin(\omega t)
\]

\[
+ \left( \frac{(1 - g_m R)}{\omega g_m R} \sin (\phi(t)) + \frac{2Q}{\omega_0 g_m R} \cos (\phi(t)) \right) \cos(\omega t)
\]

\[
C = \left( \frac{Q}{\omega_0^2 g_m R} \cos (\phi(t)) \right) \sin(\omega t) + \left( \frac{Q}{\omega_0^2 g_m R} \sin (\phi(t)) \right) \cos(\omega t)
\]

\[
D = \left( -\frac{Q}{\omega_0^2 g_m R} \sin (\phi(t)) \right) \sin(\omega t) + \left( \frac{Q}{\omega_0^2 g_m R} \cos (\phi(t)) \right) \cos(\omega t)
\]

We can further rewrite \(A + B \frac{d\phi}{dt} + C \frac{d^2 \phi}{dt^2} + D \left( \frac{d\phi}{dt} \right)^2 = 0\) as \(x(t) \sin(\omega t) + y(t) \cos(\omega t) = 0\).

Since the expression must hold for all time \(t\), both \(x(t)\) and \(y(t)\) must individually equal zero for all \(t\). In order to analyze the expression in an orderly manner, we first neglect \(D\). We can then write \(x(t)\) and \(y(t)\) as

\[
x(t) = \left( -\varepsilon + \frac{(1 - g_m R)}{g_m R} \cos (\phi(t)) - \frac{2Q\Delta \omega}{\omega_0 g_m R} \sin (\phi(t)) \right)
\]

\[
+ \left( \frac{(1 - g_m R)}{\omega g_m R} \cos (\phi(t)) - \frac{2Q}{\omega_0 g_m R} \sin (\phi(t)) \right) \frac{d\phi}{dt} + \left( \frac{Q}{\omega_0^2 g_m R} \cos (\phi(t)) \right) \frac{d^2 \phi}{dt^2} \quad (A.2)
\]
\[ y(t) = \left( \frac{1 - g_m R}{g_m R} \sin (\phi(t)) + \frac{2 Q \Delta \omega}{\omega_0 g_m R} \cos (\phi(t)) \right) \]
\[ + \left( \frac{1 - g_m R}{\omega g_m R} \sin (\phi(t)) + \frac{2 Q}{\omega_0 g_m R} \cos (\phi(t)) \right) \frac{d\phi}{dt} + \left( \frac{Q}{\omega_0^2 g_m R} \sin (\phi(t)) \right) \frac{d^2 \phi}{dt^2} \tag{A.3} \]

Setting \( y(t) = 0 \) allows us to solve for \( \frac{d^2 \phi}{dt^2} \) to get
\[ \frac{Q}{\omega_0^2 g_m R} \frac{d^2 \phi}{dt^2} = -\frac{1}{\sin (\phi(t))} \left( \frac{1 - g_m R}{g_m R} \sin (\phi(t)) + \frac{2 Q \Delta \omega}{\omega_0 g_m R} \cos (\phi(t)) \right) \]
\[ - \frac{1}{\sin (\phi(t))} \left( \frac{1 - g_m R}{\omega g_m R} \sin (\phi(t)) + \frac{2 Q}{\omega_0 g_m R} \cos (\phi(t)) \right) \frac{d\phi}{dt} \]

This implies
\[ x(t) = \left( -\varepsilon + \frac{(1 - g_m R)}{g_m R} \cos (\phi(t)) - \frac{2 Q \Delta \omega}{\omega_0 g_m R} \sin (\phi(t)) \right) \]
\[ + \left( \frac{(1 - g_m R)}{\omega g_m R} \cos (\phi(t)) - \frac{2 Q}{\omega_0 g_m R} \sin (\phi(t)) \right) \frac{d\phi}{dt} - \frac{\cos (\phi(t))}{\sin (\phi(t))} \left[ \left( \frac{1 - g_m R}{g_m R} \sin (\phi(t)) + \frac{2 Q \Delta \omega}{\omega_0 g_m R} \cos (\phi(t)) \right) \right] \]
\[ - \frac{\cos (\phi(t))}{\sin (\phi(t))} \left[ \left( \frac{1 - g_m R}{\omega g_m R} \sin (\phi(t)) + \frac{2 Q}{\omega_0 g_m R} \cos (\phi(t)) \right) \right] \frac{d\phi}{dt} \]

On further simplification of the above we get
\[ x(t) = -\varepsilon - \frac{2 Q \Delta \omega}{\omega_0 g_m R} \sin (\phi(t)) - \frac{2 Q}{\omega_0 g_m R} \sin (\phi(t)) \frac{d\phi}{dt} - \frac{2 Q \Delta \omega \cos^2 (\phi(t))}{\omega_0 g_m R \sin (\phi(t))} \]
\[ - \frac{2 Q \cos^2 (\phi(t))}{\omega_0 g_m R \sin (\phi(t))} \frac{d\phi}{dt} \]
Using \( \sin^2 (\phi(t)) + \cos^2 (\phi(t)) = 1 \), this collapses to \( x(t) = -\varepsilon - \frac{2 Q}{\omega_0 g_m R \sin (\phi(t))} \left( \Delta \omega + \frac{d\phi}{dt} \right) \). Setting \( x(t) = 0 \), it can be rewritten as
\[ \frac{d\phi}{dt} = -\Delta \omega - g_m R \varepsilon \frac{\omega_0}{2 Q} \sin (\phi(t)) \tag{A.4} \]

Although (A.4) gives an accurate first order expression for $\frac{d\phi}{dt}$, we can further simplify it by finding an alternate expression for $g_m R$. Notice that this product's average value is 1 for an autonomous oscillator in steady state. We revisit (A.2) and (A.3); however, this time we neglect $C$ as well. We now solve for $\frac{d\phi}{dt}$ by setting $x(t) = 0$, which gives (A.5). We obtain another expression for $\frac{d\phi}{dt}$, (A.6), by setting $y(t) = 0$. Since these two expressions must match, equating them leads to (A.7).

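$$
\frac{d\phi}{dt} = -\frac{-\varepsilon + \frac{(1-g_m R)}{g_m R} \cos (\phi(t)) - \frac{2Q\Delta\omega}{\omega_0 g_m R} \sin (\phi(t))}{\frac{(1-g_m R)}{\omega g_m R} \cos (\phi(t)) - \frac{2Q}{\omega_0 g_m R} \sin (\phi(t))} \quad (A.5)
$$

$$
\frac{d\phi}{dt} = -\frac{\frac{(1-g_m R)}{g_m R} \sin (\phi(t)) + \frac{2Q\Delta\omega}{\omega_0 g_m R} \cos (\phi(t))}{\frac{(1-g_m R)}{\omega g_m R} \sin (\phi(t)) + \frac{2Q}{\omega_0 g_m R} \cos (\phi(t))} \quad (A.6)
$$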

$$
\varepsilon \sin (\phi(t)) (g_m R)^2 - \left(\varepsilon \sin (\phi(t)) + 2Q [1 + \varepsilon \cos (\phi(t))]\right) g_m R + 2Q = 0 \quad (A.7)
$$

Equation (A.7) is a quadratic in $g_m R$ and is therefore solvable, but its solution leads to a complicated alternate expression for (A.4). Instead, we ignore the quadratic term to obtain

$$
g_m R \approx \frac{2Q}{\varepsilon \sin (\phi(t)) + 2Q [1 + \varepsilon \cos (\phi(t))]} \quad (A.8)
$$
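
To get a feel for how much is lost by dropping the quadratic term, the sketch below compares the root of (A.7) that lies near unity with the approximation (A.8). The values of $Q$, $\varepsilon$ and $\phi$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def gm_r_exact(phi, q, eps):
    """Root of the quadratic (A.7) in g_m*R that lies near 1.

    Assumes a real root exists (true when Q is much larger than eps).
    """
    a = eps * math.sin(phi)
    b = eps * math.sin(phi) + 2.0 * q * (1.0 + eps * math.cos(phi))
    c = 2.0 * q
    disc = b * b - 4.0 * a * c
    # Numerically stable form of the small root; it also works when a == 0.
    return 2.0 * c / (b + math.sqrt(disc))

def gm_r_approx(phi, q, eps):
    """Approximation (A.8), obtained by ignoring the quadratic term."""
    b = eps * math.sin(phi) + 2.0 * q * (1.0 + eps * math.cos(phi))
    return 2.0 * q / b

# Illustrative values only.
phi, q, eps = 0.5, 10.0, 0.1
exact = gm_r_exact(phi, q, eps)
approx = gm_r_approx(phi, q, eps)
rel_err = abs(exact - approx) / exact
print(exact, approx, rel_err)
```

For these values the two results differ by well under one percent, which is consistent with treating the quadratic term as negligible for moderate $Q$ and weak injection.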

Substituting the expression from (A.8) back into (A.4) we arrive at the following expression for $\frac{d\phi}{dt}$

$$
\frac{d\phi}{dt} = -\left(\Delta\omega + \frac{\omega_0 \varepsilon \sin (\phi(t))}{2Q [1 + \varepsilon \cos (\phi(t))]}\right) \quad (A.9)
$$

Equation (A.9) is identical to Paciorek’s equation [83]. Further simplification of the same, by ignoring $\varepsilon \cos (\phi(t))$ in the denominator, leads to Adler’s equation [82]. Here it must be emphasized that quite a few approximations have been made to arrive at the above equation. Since the result is in agreement with the observations of Adler and Paciorek, it serves as an indirect verification of our approach. We can now derive a more thorough expression by including the previously neglected term, D, in our solution.
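
The locking behaviour implied by these first order equations can be explored numerically. The sketch below is a minimal forward-Euler integration, assuming the equation is written as $\frac{d\phi}{dt} = -\Delta\omega - \frac{\omega_0}{2Q} \frac{\varepsilon \sin\phi}{1 + \varepsilon \cos\phi}$ (the Paciorek form), with the Adler form obtained by dropping $\varepsilon \cos\phi$ from the denominator. The oscillator frequency, $Q$, $\varepsilon$, detuning and step size are hypothetical values chosen for illustration.

```python
import math

def dphi_dt(phi, delta_omega, omega0, q, eps, paciorek=True):
    """First-order locking equation for the phase difference phi (rad)."""
    # Paciorek keeps eps*cos(phi) in the denominator; Adler drops it.
    denom = 1.0 + eps * math.cos(phi) if paciorek else 1.0
    return -delta_omega - (omega0 / (2.0 * q)) * eps * math.sin(phi) / denom

def settle_phase(phi0, delta_omega, omega0, q, eps, dt, steps, paciorek=True):
    """Forward-Euler integration of dphi/dt, returning the final phase."""
    phi = phi0
    for _ in range(steps):
        phi += dt * dphi_dt(phi, delta_omega, omega0, q, eps, paciorek)
    return phi

# Illustrative values only (not taken from the measured designs).
omega0 = 2.0 * math.pi * 1e9
q = 10.0
eps = 0.1
lock_range = omega0 * eps / (2.0 * q)   # one-sided lock range in the Adler form
delta_omega = 0.5 * lock_range          # detuning at half the lock range

phi_adler = settle_phase(0.0, delta_omega, omega0, q, eps, 1e-10, 20000, paciorek=False)
phi_paciorek = settle_phase(0.0, delta_omega, omega0, q, eps, 1e-10, 20000, paciorek=True)
print(phi_adler, phi_paciorek)
```

Inside the lock range both forms settle to a constant phase difference. In the Adler form this is $\phi = -\sin^{-1}(2Q\Delta\omega / \varepsilon\omega_0)$, while the Paciorek form settles to a slightly different value because of the $\varepsilon \cos\phi$ term.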

Without neglecting $D$, we set $y(t) = 0$ and solve for the $\left(\frac{d\phi}{dt}\right)^2$ term to get

$$
\frac{Q}{\omega_0^2 g_m R} \left(\frac{d\phi}{dt}\right)^2 = -\frac{1}{\cos (\phi(t))} \left(\frac{(1-g_m R)}{g_m R} \sin (\phi(t)) + \frac{2Q\Delta\omega}{\omega_0 g_m R} \cos (\phi(t))\right)
$$

$$
- \frac{1}{\cos (\phi(t))} \left(\left(\frac{(1-g_m R)}{\omega g_m R} \sin (\phi(t)) + \frac{2Q}{\omega_0 g_m R} \cos (\phi(t))\right) \frac{d\phi}{dt} + \frac{Q}{\omega_0^2 g_m R} \sin (\phi(t)) \frac{d^2\phi}{dt^2}\right) \quad (A.10)
$$
Substituting the above expression into $x(t) = 0$, multiplying through by $g_m R \cos(\phi(t))$ and simplifying, we arrive at the following

$$g_m R \cos(\phi(t)) \, x(t) = -\varepsilon g_m R \cos(\phi(t)) + 1 - g_m R + \frac{(1 - g_m R)}{\omega} \frac{d\phi}{dt} + \frac{Q}{\omega_0^2} \frac{d^2\phi}{dt^2} = 0 \quad (A.11)$$

Equation (A.11) can be rewritten as

$$\frac{d^2\phi}{dt^2} + \frac{\omega_0^2}{\omega Q} (1 - g_m R) \frac{d\phi}{dt} = \frac{\omega_0^2}{Q} \left( g_m R [1 + \varepsilon \cos(\phi(t))] - 1 \right) \quad (A.12)$$

Equation (A.12) is a more complete time domain transient equation, when compared with Adler or Paciorek, for injection locked oscillators under the assumption of a steady oscillation envelope (Adler and Paciorek make the same assumption since they treat $\varepsilon$ as a constant). By expressing $g_m R$ in terms of $\Delta\omega$, $\varepsilon$ and other factors we can effectively obtain a second order differential equation with noise terms from the injection source, the free running oscillator and that of the effective output thus allowing us to model noise in ILOs even more accurately. This expression for $g_m R$ may be derivable from power or phase balance at steady state. However, the same has not been derived here.

We can also express $g_m R$ in terms of $\frac{d\phi}{dt}$ using equation (A.4) as

$$g_m R = \frac{-2Q}{\varepsilon \omega_0} \left(\frac{d\phi}{dt} + \Delta\omega\right) \frac{1}{\sin(\phi(t))} \quad (A.13)$$

Substituting the expression for $g_m R$ from (A.13) into equation (A.12) we get, after an initial simplification, the following non-linear second order equation

$$\frac{d^2\phi}{dt^2} + \frac{d\phi}{dt} \left[\frac{\omega_0^2}{\omega Q} + \frac{2\Delta\omega \omega_0}{\omega \varepsilon \sin(\phi(t))} + 2\omega_0 \frac{1 + \varepsilon \cos(\phi(t))}{\varepsilon \sin(\phi(t))}\right] + \frac{2\omega_0}{\omega \varepsilon \sin(\phi(t))} \left(\frac{d\phi}{dt}\right)^2 + \frac{\omega_0^2}{Q} + 2\omega_0 \Delta\omega \frac{1 + \varepsilon \cos(\phi(t))}{\varepsilon \sin(\phi(t))} = 0 \quad (A.14)$$

As a sanity check, if we were to ignore the second order term and the nonlinear term, then with $\omega \approx \omega_0$ and $\Delta\omega \ll \omega$ we would get the following expression for $\frac{d\phi}{dt}$

$$\frac{d\phi}{dt} = -\frac{\Delta\omega + \frac{\omega_0}{2Q} \frac{\varepsilon \sin(\phi(t))}{1 + \varepsilon \cos(\phi(t))}}{1 + \frac{1}{2Q} \frac{\varepsilon \sin(\phi(t))}{1 + \varepsilon \cos(\phi(t))}}$$
This expression is identical to (A.9) if we ignore the small extra term, $\frac{1}{2Q} \left( \frac{\varepsilon \sin(\phi(t))}{1 + \varepsilon \cos(\phi(t))} \right)$, in the denominator, thus verifying the above derivation.

If we ignore the nonlinear term in (A.14), $\left( \frac{d\phi}{dt} \right)^2$, we get a second order linear differential equation which can easily be transformed into an s-domain equation much like what was done for our initial ILO phase noise model development in chapter 5. The model development has not been done in this work and is deferred to future work (chapter 7).
Appendix B

ILO Phase Noise for Correlated Input and Output Noise

The PSD of a wide sense stationary random process is the Fourier transform of its auto-correlation function. We begin with the following time domain equation describing the output phase perturbation in terms of the free-running oscillator's phase perturbation and the injection signal's phase perturbation.

\[ \phi_{osc}(t) = \phi_{free}(t) + k \int_{0}^{\infty} e^{-k\tau} [\phi_{inj}(t - \tau) - \phi_{free}(t - \tau)] d\tau \quad (B.1) \]

where \( k = K_{ILO}(\Phi_{ss}) \).
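
Equation (B.1) describes a first order tracking response. As an illustrative check, the sketch below evaluates the convolution integral numerically for the special case $\phi_{free} = 0$ and a unit step in $\phi_{inj}$ at $t = 0$, for which the closed form result is $1 - e^{-kt}$. The value of $k$, the evaluation time and the integration step are hypothetical.

```python
import math

def phi_osc_step(t, k, dt):
    """Evaluate (B.1) at time t for phi_free = 0 and a unit step in phi_inj."""
    n = int(round(t / dt))
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * dt   # midpoint rule
        # phi_inj(t - tau) equals 1 for every tau in [0, t] and 0 beyond it.
        total += math.exp(-k * tau) * dt
    return k * total

# Illustrative values only: 5 MHz lock bandwidth, evaluated 100 ns after the step.
k = 2.0 * math.pi * 5e6
t_eval = 100e-9
numeric = phi_osc_step(t_eval, k, dt=0.1e-9)
closed_form = 1.0 - math.exp(-k * t_eval)
print(numeric, closed_form)
```

The output phase tracks the injected phase step with time constant $1/k$, which corresponds to low pass filtering of the injection source phase with corner frequency $k$.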

From (B.1) we have

\[ \phi_{osc}(t) \phi_{osc}(t + \alpha) = \phi_{free}(t) \phi_{free}(t + \alpha) + k^2 \int_{0}^{\infty} \int_{0}^{\infty} e^{-k(\tau_1 + \tau_2)} [\phi_{inj}(t - \tau_1) - \phi_{free}(t - \tau_1)] \times \]

\[ \times [\phi_{inj}(t + \alpha - \tau_2) - \phi_{free}(t + \alpha - \tau_2)] d\tau_1 d\tau_2 \]

\[ + k \int_{0}^{\infty} e^{-k\tau} [\phi_{inj}(t + \alpha - \tau) \phi_{free}(t) - \phi_{free}(t + \alpha - \tau) \phi_{free}(t)] d\tau \]

\[ + k \int_{0}^{\infty} e^{-k\tau} [\phi_{inj}(t - \tau) \phi_{free}(t + \alpha) - \phi_{free}(t - \tau) \phi_{free}(t + \alpha)] d\tau \quad (B.2) \]
Taking the expectation of both the sides of the equality in (B.2) we get

\[ R_{osc}(\alpha) = E[\phi_{osc}(t)\phi_{osc}(t + \alpha)] = R_{free}(\alpha) + \]
\[ k^2 \int_0^\infty \int_0^\infty e^{-k(\tau_1 + \tau_2)} [R_{inj}(x) + R_{free}(x) - R_{inj \ free}(-x) - R_{inj \ free}(x)] d\tau_1 d\tau_2 \]
\[ + k \int_0^\infty e^{-k\tau} [R_{inj \ free}(\alpha - \tau) - R_{free}(\alpha - \tau) + R_{inj \ free}(-(\alpha + \tau)) - R_{free}(\alpha + \tau)] d\tau \]

(B.3)

where \( x = \alpha + \tau_1 - \tau_2 \). Taking the Fourier transform of both sides of the equality in (B.3) we get

\[ S_{osc}(f) = S_{free}(f) - \frac{k^2}{k^2 + \omega^2} S_{free}(f) + \frac{k^2}{k^2 + \omega^2} S_{inj}(f) \]
\[ - k^2 \int_0^\infty \int_0^\infty e^{-k(\tau_1 + \tau_2)} \int_{-\infty}^\infty [R_{inj \ free}(x) + R_{inj \ free}(-x)] e^{-j2\pi f \alpha} d\alpha d\tau_1 d\tau_2 \]
\[ + k \int_0^\infty e^{-k\tau} \int_{-\infty}^\infty [R_{inj \ free}(\alpha - \tau) + R_{inj \ free}(-(\alpha + \tau))] e^{-j2\pi f \alpha} d\alpha d\tau \]

(B.4)

Evaluating the integrals w.r.t. \( \alpha \) in (B.4) first we get

\[ S_{osc}(f) = \frac{\omega^2}{k^2 + \omega^2} S_{free}(f) + \frac{k^2}{k^2 + \omega^2} S_{inj}(f) \]
\[ - k^2 \int_0^\infty \int_0^\infty e^{-k(\tau_1 + \tau_2)} [S_{inj \ free}(-f) + S_{inj \ free}(f)] e^{j2\pi f (\tau_1 - \tau_2)} d\tau_1 d\tau_2 \]
\[ + k \int_0^\infty e^{-k\tau} [S_{inj \ free}(f) e^{-j2\pi f \tau} + S_{inj \ free}(-f) e^{j2\pi f \tau}] d\tau \]

(B.5)

Solving the outer integrals in (B.5) we get

\[ S_{osc}(f) = \frac{\omega^2}{k^2 + \omega^2} S_{free}(f) + \frac{k^2}{k^2 + \omega^2} S_{inj}(f) + \frac{-k^2}{k^2 + \omega^2} [S_{inj \ free}(-f) + S_{inj \ free}(f)] \]
\[ + \frac{k}{k + j2\pi f} S_{inj \ free}(f) + \frac{k}{k - j2\pi f} S_{inj \ free}(-f) \]

(B.6)
Simplifying (B.6) further we get

\[ S_{osc}(f) = \frac{\omega^2}{k^2 + \omega^2} S_{free}(f) + \frac{k^2}{k^2 + \omega^2} S_{inj}(f) + \frac{j\omega k}{k^2 + \omega^2} \left[ S_{inj\ free}(-f) - S_{inj\ free}(f) \right] \]

(B.7)

\[ = \frac{\omega^2}{k^2 + \omega^2} S_{free}(f) + \frac{k^2}{k^2 + \omega^2} S_{inj}(f) - \frac{2\omega k}{k^2 + \omega^2} \int_{-\infty}^{\infty} R_{inj\ free}(\tau) \sin(\omega \tau) d\tau \]

(B.8)
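
The algebra leading from (B.6) to (B.7) can be spot-checked numerically. The sketch below evaluates both expressions with complex arithmetic, taking $\omega = 2\pi f$ and assuming a real-valued cross-correlation so that $S_{inj \ free}(-f)$ is the complex conjugate of $S_{inj \ free}(f)$. All PSD values and the value of $k$ are hypothetical and serve only to exercise the formulas.

```python
import math

def s_osc_b6(f, k, s_free, s_inj, sx_pos, sx_neg):
    """Right-hand side of (B.6); sx_pos = S_injfree(f), sx_neg = S_injfree(-f)."""
    w = 2.0 * math.pi * f
    d = k * k + w * w
    return ((w * w / d) * s_free + (k * k / d) * s_inj
            - (k * k / d) * (sx_neg + sx_pos)
            + k / (k + 1j * w) * sx_pos
            + k / (k - 1j * w) * sx_neg)

def s_osc_b7(f, k, s_free, s_inj, sx_pos, sx_neg):
    """Right-hand side of (B.7)."""
    w = 2.0 * math.pi * f
    d = k * k + w * w
    return ((w * w / d) * s_free + (k * k / d) * s_inj
            + (1j * w * k / d) * (sx_neg - sx_pos))

# Illustrative values only.
f = 1e6
k = 2.0 * math.pi * 5e6
s_free, s_inj = 1e-10, 1e-13
sx_pos = (2.0 + 1.0j) * 1e-12
sx_neg = sx_pos.conjugate()   # holds for a real-valued cross-correlation

b6 = s_osc_b6(f, k, s_free, s_inj, sx_pos, sx_neg)
b7 = s_osc_b7(f, k, s_free, s_inj, sx_pos, sx_neg)
print(b6, b7)
```

The two expressions agree to rounding error and the result is real, as expected for a PSD. Only the imaginary part of the cross spectrum contributes, which matches the $\sin(\omega\tau)$ weighting in (B.8).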