## High-Performance Multi-Antenna Wireless for 5G and Beyond

## Mahmood Baraani Dastjerdi

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

### COLUMBIA UNIVERSITY

2020

©2020 Mahmood Baraani Dastjerdi All Rights Reserved

## ABSTRACT

### High-Performance Multi-Antenna Wireless for 5G and Beyond

### Mahmood Baraani Dastjerdi

Over the next decade, multi-antenna radios, including phased-array and multiple-input multiple-output (MIMO) radios, are expected to play an essential role in the next generation of wireless networks. Phased arrays can reject spatial interference and provide coherent beamforming gain. MIMO technology promises to significantly enhance coverage, capacity, and user data rate through beamforming or diversity/capacity gain, which can substantially increase the range of wireless links that are limited by transmitter (TX) power handling, receiver (RX) noise, and multi-path environments. Furthermore, multi-user MIMO (MU-MIMO) can serve multiple users simultaneously, which is vital for femtocell base stations and access points (APs).

Full-duplex (FD) wireless, namely simultaneous transmission and reception at the same frequency, is an emerging technology that has gained attention due to its potential to double data throughput, as well as to provide other benefits in the higher layers such as better spectral efficiency, reduced network and feedback signaling delays, and resolution of hidden-node problems to avoid collisions. However, several challenges remain in the quest for high-performance integrated FD radios. First, transmitter power handling remains an open problem, particularly in FD radios that integrate a shared antenna interface. Second, FD operation must be achieved across antenna VSWR variations and a changing EM environment. Finally, FD must be extended to multi-antenna radios, including phased-array and MIMO radios, as they are expected to play an essential role in the next generation of wireless networks over the next decade. Multi-antenna FD operation, however, is challenged not only by the self-interference (SI) from each TX to its own RX but also by cross-talk SI (CT-SI) between antennas. In this dissertation, first, a full-duplex phased-array circulator-RX (circ.-RX) is proposed that achieves self-interference cancellation (SIC) by repurposing beamforming degrees of freedom (DoF) on the TX and RX. Then, an FD MIMO circ.-RX is proposed that achieves SI and CT-SI cancellation (CT-SIC) through passive RF and shared-delay baseband (BB) cancellers that address the challenges associated with FD MIMO operation.

Wireless radios at millimeter-wave (mm-wave) frequencies enable high-speed links for portable devices thanks to the wide spectrum available. Large-scale arrays are required to compensate for the high path loss and form an mm-wave link. Mm-wave MIMO systems with per-element digitization enable virtual arrays for radar, digital beamforming (DBF) for high-mobility scenarios, and spatial multiplexing. To preserve MIMO information, the received signal from each element in a MIMO RX must be transported to the ADC/DSP IC for DBF, and vice versa on the TX side. A large-scale array can be formed by tiling multiple mm-wave IC front-ends, and thus a single-wire interface between the DSP IC and the mm-wave ICs is desirable to reduce board routing complexity. Per-element digitization poses the challenge of handling high-data-rate I/O in large-scale tiled MIMO mm-wave arrays. Serializer/deserializer (SERDES) links are traditionally used as high-speed links in computing systems and networks. However, SERDES incurs a large area and power consumption. In this dissertation, a 60 GHz 4-element MIMO TX with a single-wire interface is presented that de-multiplexes the baseband signals of all elements and the LO reference, which are frequency-domain multiplexed on a single-wire coax cable.

# **Table of Contents**

| Section | Title |
|---------|-------|
|  | List of Figures |
|  | List of Tables |
|  | Acknowledgements |
|  | Dedication |
| 1 | Introduction |
| 1.1 | Full-Duplex Multi-Antenna Wireless |
| 1.1.1 | Full-Duplex Wireless |
| 1.1.2 | Why Multi-Antenna Full-Duplex? |
| 1.1.3 | Challenges |
| 1.1.4 | Prior Works |
| 1.2 | A 60 GHz 4-Element MIMO TX With A Single-Wire Interface |
| 1.2.1 | Motivation |
| 1.2.2 | Prior Works |
| 1.2.3 | Single-Wire Interface Using Frequency-Division Multiplexing |
| 1.3 | Organization |
| 2 | Full-Duplex Phased Array Wireless |
| 2.1 | FD Phased Array System Requirements |
| 2.1.1 | Array SIC |
| 2.1.2 | Link Budget Calculations and FD operation |
| 2.2 | SIC via Beamforming |
| 2.3 | Circuit Implementation |
| 2.3.1 | Integrated CircRX Phased Array RX |
| 2.3.2 | Phased Array TX |
| 2.3.3 | Slot Loop Antenna |
| 2.4 | Experimental Results |
| 2.4.1 | Single-Element CircRX Measurements |
| 2.4.2 | 8-Element FD Phased-array TRX Measurements |
| 2.4.3 | FD Demonstration |
| 2.5 | Summary |
| 3 | Full-Duplex MIMO Wireless |
| 3.1 | FD MIMO Cancellation: Complexity, Trade-offs and Architecture |
| 3.1.1 | Dual Injection MIMO Canceller |
| 3.1.2 | Shared-Delay Baseband Canceller Architecture |
| 3.1.3 | Baseband Canceller Noise Penalty |
| 3.2 | Circuit Implementation |
| 3.2.1 | Circulator-Receiver as a Shared MIMO FD Antenna Interface |
| 3.2.2 | MIMO Passive RF Cancellers |
| 3.2.3 | MIMO Baseband Canceller with Shared Delay-Cells |
| 3.2.4 | Output Buffer |
| 3.2.5 | Clock Generation and Bootstrapping circuitry |
| 3.3 | Experimental Results |
| 3.3.1 | Circulator-Receiver Measurements |
| 3.3.2 | Linearity Measurements |
| 3.3.3 | RF MIMO Passive Canceller Measurements |
| 3.3.4 | Wireless Full-Duplex MIMO SIC and CT-SIC Measurements |
| 3.4 | Summary |
| 4 | A 60 GHz 4-Element MIMO TX with a Single-Wire Interface |
| 4.1 | Single-Wire Interface |
| 4.1.1 | Challenge |
| 4.1.2 | Signal Flow |
| 4.1.3 | Harmonic Rejection Mixer |
| 4.2 | Circuit Implementation |
| 4.2.1 | Duplexer |
| 4.2.2 | Two-Stage Harmonic Rejection Mixer |
| 4.2.3 | Low-Pass Filter |
| 4.2.4 | Upconversion Gilbert-Cell Mixer |
| 4.2.5 | Pre-Driver and Power Amplifier |
| 4.3 | Measurement Results |
| 4.4 | Summary |
| 5 | Conclusion |
| I | Bibliography |
|  | Bibliography |
| II | Appendix 1 |
| A | A Circuit Simulation Technique for Inter-modulation Simulations and Linearity Analysis of N-path Filters and Passive-Mixer-Like Circuits |
| A.1 | Circuit simulation technique using SP equations and short channel effects |
| A.1.1 | Long-Channel Transistor Surface Potential Equations |
| A.1.2 | Effective Mobility |
| A.1.3 | Charge Sharing |
| A.1.4 | Velocity Saturation |
| A.1.5 | Channel Length Modulation |
| A.1.6 | Parameter Extraction |
| A.2 | Mixer-First Receiver/N-Path-Filter Linearity Analysis and Design Trade-Offs |
| A.3 | Simulation and Measurement Results |
| A.4 | Summary |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Evolving speed of wireless networks [1]. |
| 1.2 | (a) Beamforming antenna array scenario at BS and UE [2] and (b) the antenna array configuration for 5G cellular mobile [3]. |
| 1.3 | Different multiplexing access methods: (a) time-division, (b) frequency-division, and (c) full-duplex [4]. |
| 1.4 | Integrating FD operation with MIMO technology can result in higher data rate and spectrum efficiency for the next generation of wireless communication. |
| 1.5 | SI and CT-SI in full-duplex (a) single-antenna and (b) multi-antenna wireless. |
| 1.6 | In an $N$-element FD MIMO radio, a cancellation path for each TX-RX pair is required, and so the canceller complexity grows as $\mathcal{O}(N^2)$. |
| 1.7 | Prior system-level FD MIMO transceivers using discrete components: (a) FD $3{\times}3$ MIMO with shared delay structure to reduce the canceller complexity, (b) FD MIMO using DBF only on the TX side, and (c) FD 2-element MIMO radio for LTE applications using 4 bulky RF multi-tap cancellers to suppress the SI and CT-SI. |
| 1.8 | Prior transceivers with single-wire interface: (a) a 60 GHz TX/RX with single coaxial interface for low-cost integration in PC platform [5], and (b) a 4-element 28 GHz mm-wave MIMO array with single-wire interface using code-domain multiplexing [6]. |
| 1.9 | Single-wire interface between DSP unit and mm-wave unit using frequency-division multiplexing. |
| 2.1 | FD N-path-filter-based circulator-receiver conceptual architecture and block diagram. |
| 2.2 | FD phased array link budget. |
| 2.3 | (a) A 2×4 8-element antenna array at 730 MHz with $\lambda/2$ spacing, (b) the measured SI channel magnitudes at 730 MHz, and (c)–(d) examples of measured SI channel magnitudes across frequency from antenna elements $(1,1)$ and $(1,2)$ to the adjacent elements, respectively. |
| 2.4 | (a)–(b) Simulated TX/RX array patterns in the $x$-$z$ and $y$-$z$ planes while achieving 60 dB array SIC across 20 MHz with 3 dB array gain degradation in the TX/RX broadside beam-pointing directions, (c) simulated array SIC where at least 60 dB SIC is guaranteed to be achieved across 720–740 MHz, (d) simulated TX/RX array gain for a desired array SIC based on solving an optimization problem using the measured SI channels depicted in Fig. 2.3. |
| 2.5 | (a)–(b) Spatial distribution of the simulated sum TX and RX beamforming gain loss for achieving the desired RF SIC across 20 MHz and 40 MHz bandwidth. |
| 2.6 | Combined non-magnetic non-reciprocal circulator-receiver deployed as integrated shared antenna interface. |
| 2.7 | Block/circuit diagram of scalable 4-element FD circRX phased array. |
| 2.8 | Tiling two circRX phased arrays to form an 8-element FD circRX phased array. |
| 2.9 | Beamforming using 3-bit vector modulator: (a) 4-phase, (b) 8-phase. |
| 2.10 | Inverter-based output buffer with common-mode feedback. |
| 2.11 | CircRX phased array clock distribution and synchronization and reset circuitry. |
| 2.12 | Circuit details of programmable $g_m$-cells deployed for beamforming. |
| 2.13 | A custom-designed 8-element transmitter phased array: (a) block diagram, and (b) PCB implementation. |
| 2.14 | Slot loop antenna simulated radiation pattern. |
| 2.15 | FD N-path-filter-based circulator-receiver conceptual architecture and block diagram. |
| 2.16 | Custom-designed antenna tuner: (a) circuit diagram, (b) PCB. |
| 2.17 | Single-element circRX measurements: (a) TX-ANT S-parameters demonstrating non-reciprocity, (b) TX-ANT IIP3. |
| 2.18 | Single-element circRX measurements: (a) TX-ANT S-parameters demonstrating non-reciprocity, (b) TX-ANT IIP3, and (c) NF. |
| 2.19 | Measured full-duplex phased-array performance across 8 elements (tiling of 2 ICs): (a) array SIC, (b) impact of optimized weights to achieve SIC on the TX/RX array gain. |
| 2.20 | Gain compression of a small received signal under the influence of TX power with optimized weights with and without the antenna tuner. |
| 2.21 | Two-tone TX test tracking the TX total SI and its IM3 products at the receiver output with additional digital SIC. |
| 2.22 | Wireless FD demonstration setup. |
| 2.23 | Demo results: a -31 dBm desired signal radiated from 20 ft away from a single antenna is recovered while transmitting a 5 MHz OFDM-like signal with +8.7 dBm TX array power. |
| 3.1 | FD MIMO links benefit from MIMO diversity/capacity gain as well as FD spectrum efficiency. However, CT-SI must be addressed in addition to SI. |
| 3.2 | Proposed dual injection MIMO RF and BB canceller to address power handling and complexity challenges of FD MIMO implementation. |
| 3.3 | MIMO baseband canceller structure and power consumption distribution: (a) typical MIMO canceller, (b) proposed shared-delay BB canceller architecture. |
| 3.4 | Model for the analysis of noise degradation due to active delay-based baseband cancellation. |
| 3.5 | Impact of the baseband active delay cells on NF as a function of RF SIC and CT-SIC. |
| 3.6 | Block and circuit diagram of the 65 nm CMOS 2.2 GHz full-duplex 2-element MIMO circRX with high TX power handling exploiting MIMO RF and shared-delay baseband self-interference cancellation. |
| 3.7 | Two thick-oxide devices stacked to implement the feed capacitor and TX/ANT capacitor banks. |
| 3.8 | Circulator model for feed-capacitor-based RF SIC and CT-SIC analysis. |
| 3.9 | Calculated antenna (a) VSWR and (b) cross-talk coupling coverage, which shows that a VSWR of 1.5 (red circle) or up to -14.8 dB of antenna coupling (red circle) can be covered. |
| 3.10 | Passive mixer along with programmable capacitor bank used to tap from the TX for BB SIC and CT-SIC. |
| 3.11 | Complementary delay cell architecture based on [7]. |
| 3.12 | Measured delays across the 4 taps of the delay element. |
| 3.13 | BB canceller vector modulator. |
| 3.14 | Baseband canceller IIP3 simulations: (a) TX to the output of each stage of the delay line, and (b) TX to output current of the baseband canceller when the cancellation weight is programmed. |
| 3.15 | Inverter-based output buffer with common-mode feedback. |
| 3.16 | Clock bootstrapping in N-path filters can improve the linearity by keeping the gate-source voltage constant. N-path filter and FET terminal voltages (a) without and (b) with clock bootstrapping. |
| 3.17 | Clock generation and bootstrapping circuitry. |
| 3.18 | Simulation results of the circRX performance across 10% variation in wirebond inductance, where L1, L2 and L3 are the wirebond inductances in the CLC sections between the ANT-RX, TX-RX and TX-ANT ports, respectively: (a) TX-ANT response, and (b) TX-RX isolation. |
| 3.19 | Chip microphotograph of the 65 nm CMOS FD 2-element MIMO circRX in an 80-pin QFN package. |
| 3.20 | Single-element circRX measurements: (a) TX-ANT response demonstrating non-reciprocity, (b) ANT-BB conversion gain, and (c) NF. |
| 3.21 | Single-element circRX ANT-BB linearity measurements: (a) in-band 1 dB compression point, (b) IIP3 versus offset frequency. |
| 3.22 | Linearity measurements with and without bootstrapping: (a) TX-ANT IIP3 and (b) TX-induced RX gain compression. |
| 3.23 | Measured RF SIC/CT-SIC performance: (a) SIC antenna VSWR coverage and (b) CT-SIC antenna coupling coverage. |
| 3.24 | FD MIMO wireless SIC and CT-SIC measurement setup. |
| 3.25 | Wireless FD measurements: (a) single-element FD SIC measurement results, and (b) two-element FD CT-SIC measurement results. |
| 3.26 | (a) NF degradation due to SIC and CT-SIC, and (b) NF degradation in the presence of TX signal. |
| 4.1 | A scalable MIMO TX architecture with a single-wire interface based on CDMA and FDMA. |
| 4.2 | Single-wire interface signal flow breakdown. |
| 4.3 | HRM using $1/N$ duty cycle clocks and baseband gain coefficients [8]. |
| 4.4 | HR is limited due to gain quantization and mismatch. Multi-stage HRM reduces the degradation in HR due to gain and phase mismatch. |
| 4.5 | Block diagram of a 45 nm RFSOI 60 GHz 4-element MIMO TX with a frequency-division-multiplexed 10 GHz single-wire interface. |
| 4.6 | Circuit implementation of the duplexer that includes high-pass and low-pass filters to divide 0-10 GHz IF data and 30 GHz LO signal from the single-wire input. |
| 4.7 | S-parameter simulation of the duplexer, which divides IF signals and 30 GHz reference with around 40 dB isolation. |
| 4.8 | Circuit implementation of the 2-stage harmonic rejection mixer with minimum area and power consumption. |
| 4.9 | Post-layout inter-channel HR simulation shows HR better than 40 dB between all channels. |
| 4.10 | The 5th-order differential elliptic low-pass filter using passive components. |
| 4.11 | Doubly-balanced Gilbert cell mixer used as upconversion mixer with BB amplifier: (a) circuit diagram and (b) simulation results. |
| 4.12 | Two-stage stacked class-E PA: (a) circuit diagram, and (b) simulation results. |
| 4.13 | Chip microphotograph of the 45 nm CMOS RF-SOI 60 GHz 4-element MIMO TX. |
| 4.14 | Measured gain of each channel and channel-to-channel isolation for three different chips (numbers show the worst case among the sample chips). |
| 4.15 | Measured gain, output power and drain efficiency as a function of input power for all channels for three different chips. The inset in each figure shows the Psat, OP1dB and drain efficiency at OP1dB for the 3rd sample. |
| 4.16 | Measurement setup to demonstrate simultaneous formation of multiple beams carrying independent signals: antenna pattern measured for two simultaneously transmitted frequencies, 59.9 GHz and 60.05 GHz. |
| A.1 | Proposed circuit simulation technique based on surface potential equations. |
| A.2 | Simulation of total capacitance at the transistor terminals of a $24 \times \frac{4\,\mu\mathrm{m}}{60\,\mathrm{nm}}$ device in 65 nm CMOS using foundry BSIM4 models and the proposed circuit simulation technique. |
| A.3 | Cross section of the MOS transistor. |
| A.4 | (a) Single-ended 4-path mixer-first receiver (MFRx), and (b) MFRx/N-path filter equivalent circuit for the analysis of OOB linearity. |
| A.5 | Simplified model to analyze the MFRx noise performance [9]. |
| A.6 | Trade-off between OOB-IIP3 and power consumption at 1 GHz ($R_s = 50\,\Omega$). |
| A.7 | Trade-off between OOB-IIP3 and NF vs. $R_t$ for different values of the amplifier gain ($R_{SW} = 3\,\Omega$, $R_s = 50\,\Omega$). |
| A.8 | Simulated $I_{DS}$ vs. $V_{DS}$ of a $24 \times \frac{4\,\mu\mathrm{m}}{60\,\mathrm{nm}}$ device for different values of $V_{GS}$ using factory-provided BSIM4 models and our circuit simulation technique. |
| A.9 | Gummel test on a $24 \times \frac{4\,\mu\mathrm{m}}{60\,\mathrm{nm}}$ device using BSIM4 models, the proposed circuit simulation technique, 3rd-order polynomial curve-fit and measured data. |
| A.10 | Simulation and measurement results from a harmonic test on a single $24 \times \frac{4\,\mu\mathrm{m}}{60\,\mathrm{nm}}$ transistor switch: (a) test setup, and (b) results. |
| A.11 | IIP3 simulations of a single $24 \times \frac{4\,\mu\mathrm{m}}{60\,\mathrm{nm}}$ switch to ground: BSIM4, the modeling approach in [10], and our proposed circuit technique. |
| A.12 | Block and circuit diagram and chip micrograph of the 65 nm CMOS 0.15-2 GHz mixer-first receiver. |
| A.13 | Comparison of measured MFRx out-of-band (OOB) IIP3 with the simulated OOB IIP3 using the proposed circuit simulation technique. |
| A.14 | Comparison of transmitter-to-antenna IIP3 measurements of our 750 MHz non-magnetic non-reciprocal integrated N-path-filter-based circulators fabricated in 65 nm CMOS with new simulations based on our circuit simulation technique: (a) [11], and (b) [12]. |

# List of Tables

| 2.1 | FD phased array link budget.                                                                                        | 16 |
|-----|---------------------------------------------------------------------------------------------------------------------|----|
| 2.2 | Comparison of the proposed FD phased array circRX with the state-of-the-art FD RXs with an integrated shared antenna interface | 39 |
| 3.1 | Comparison of the proposed FD MIMO circRX with state-of-the-art FD RXs with an integrated shared antenna interface. | 71 |
| 4.1 | Comparison with state-of-the-art mm-wave transceivers with and without single-wire interface. | 88 |

To my parents, grandparents and dear friends...

## Chapter 1

## Introduction

By 2020, according to Cisco, more people will have mobile phones (5.4 B) than electricity (5.3 B), running water (3.5 B), or cars (2.8 B). With 75% of mobile data being bandwidth-hungry video, users expect higher data rates and more reliable wireless communication [1]. The use of a large number of antennas in the base station (BS), access point (AP), and user equipment (UE), in the form of phased arrays or MIMO, is a key technology that promises to increase network capacity [1,3,13–16]. Wireless network speed has improved over the years, progressing from single-input single-output (SISO) systems to single-user and multi-user MIMO networks. MU-MIMO systems already provide a significant advantage over earlier systems, and massive MIMO (mMIMO) aims to further enhance data throughput to >10 Gbps (Fig. 1.1).



Figure 1.1: Evolving speed of wireless networks [1].

Mm-wave communication is considered for fifth-generation (5G) wireless systems as it promises tremendous available bandwidth for high-data-rate communication. However, because mm-wave signals are highly susceptible to blockage and have a limited communication range, it is important to understand the channel dynamics with respect to time and space to form a robust communication system. High-gain directional antennas can be used at both the transmitting and receiving ends, resulting in a significantly enhanced signal-to-noise ratio (SNR) and improved data security for long-range mm-wave point-to-point (P2P) communications with a line-of-sight link [2]. However, directional antennas with a narrow beam are not viable for multi-user communication as they provide only very limited spatial coverage. Phased array transceivers promise a robust, reliable wireless link at mm-wave. However, user mobility and environmental variations are essential factors that need to be considered. Although many works deploy analog beamforming, digital beamforming (DBF) is desired for user discovery and tracking, which poses a significant input/output (I/O) challenge.

Multi-antenna radios are emerging in BSs and APs as well as in space-constrained femtocells and UE (Fig. 1.2), and thus designing a compact, power-efficient multi-antenna radio is a high-impact research topic. This dissertation proposes high-performance multi-antenna wireless by adding an FD feature to phased array/MIMO technologies and by introducing an mm-wave TX array with a single-wire interface that addresses the I/O challenge for DBF.



Figure 1.2: (a) Beamforming antenna array scenario at BS and UE [2] and (b) the antenna array configuration for 5G cellular mobile [3].

### 1.1 Full-Duplex Multi-Antenna Wireless

#### 1.1.1 Full-Duplex Wireless

Current wireless systems rely on duplexing to avoid SI. Bluetooth and WiFi are examples of transceivers that use time-division duplexing (TDD) by sending and receiving in non-overlapping time slots (Fig. 1.3(a)). The majority of today's cellular bands use frequency-division duplexing (FDD) to separate transmission and reception in the frequency domain [17] (Fig. 1.3(b)). Full-duplex wireless, namely simultaneous transmission and reception at the same frequency, is an emerging technology that has gained attention due to its potential to double data throughput [18, 19], as well as provide other benefits in the higher layers such as better spectral efficiency, reduced network and feedback signaling delays, and resolution of hidden-node problems to avoid collisions [19-22]. However, several challenges remain in the quest for high-performance integrated FD radios. First, TX power handling remains an open problem, particularly in FD radios that integrate a shared antenna interface. Recent integrated shared antenna interfaces that exhibit high power handling exploit SOI CMOS technologies and occupy substantial area [23]. Secondly, FD operation must be achieved across antenna VSWR variations and a changing EM environment. Finally, FD must be extended to multi-antenna radios, including phased array and MIMO radios, as over the next decade they are expected to play an essential role in the next generation of wireless networks.



Figure 1.3: Different multiplexing access methods: (a) time-division (b) frequency-division (c) full-duplex [4].

The main challenge that single-input-single-output (SISO) FD radios face is the tremendous amount of SI from the TX to its own RX. The SI can be mitigated through shared antenna interfaces such as magnetic-free integrated circulators at RF [11, 12, 23–25] and mm-wave frequencies [26, 27], reciprocal electrical-balance duplexers [28–31], polarization-based antenna interfaces [32, 33], and active antenna duplexers [34], as well as through analog RF/baseband cancellers [35–41]. Finally, to form an FD link, digital cancellers are also required in addition to analog ones to meet stringent SIC requirements [20, 35, 42, 43].

#### 1.1.2 Why Multi-Antenna Full-Duplex?

Integrating FD operation with multi-antenna technologies is an important research challenge. Fig. 1.4 depicts the concept of FD multi-antenna radios, where FD wireless is key to spectrum-efficient wireless communication and phased array/MIMO wireless is vital to form a high-data-rate link using beamforming and diversity/capacity gain. The combination of these two technologies can result in higher data rate and spectrum efficiency, while substantially enhancing the link range.

#### 1.1.3 Challenges

#### 1.1.3.1 Power Handling

Integrating FD operation is challenged not only by the SI from each TX to its own RX, but also by the CT-SI between antennas (Fig. 1.5). In an N-element FD phased array transceiver, in the worst case in each RX channel, the SI and CT-SI can add up constructively to increase by an amount equal to the TX array gain  $(N^2)$ , and then add up constructively after RX beamforming to increase by an amount equal to the RX array gain  $(N^2)$ , resulting in a total increase of  $N^4$  relative to a single-element transceiver. Similarly, in an N-element FD MIMO transceiver, assuming orthogonal coding, the total SI and CT-SI power can be N times larger than in the single-element case (assuming a similar power level for the SI and CT-SI across all elements). This can substantially limit the FD phased array/MIMO transceiver power



Figure 1.4: Integrating FD operation with MIMO technology can result in higher data rate and spectrum efficiency for the next generation of wireless communication.

handling compared to the single-element case and diminish the benefits gained from deploying a multi-antenna system.
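The worst-case scaling described in this subsection can be put into numbers with a short sketch. The function names are illustrative only, and the figures are upper bounds under the stated worst-case assumptions (fully constructive addition for the phased array, orthogonal coding and equal SI/CT-SI power for MIMO).

```python
import math


def phased_array_si_growth_db(n: int) -> float:
    """Worst-case SI growth of an N-element FD phased array vs. one element.

    TX array gain (N^2) times RX array gain (N^2) gives N^4 in power.
    """
    return 10.0 * math.log10(n ** 4)


def mimo_si_growth_db(n: int) -> float:
    """SI + CT-SI growth of an N-element FD MIMO radio vs. one element.

    With orthogonal coding and equal SI/CT-SI levels the power grows as N.
    """
    return 10.0 * math.log10(n)


for n in (2, 4, 8):
    print(f"N={n}: phased array +{phased_array_si_growth_db(n):.1f} dB, "
          f"MIMO +{mimo_si_growth_db(n):.1f} dB")
```

For an 8-element array the phased array bound is roughly 36 dB above the single-element SI level, which illustrates why power handling is treated as a first-order constraint here.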



Figure 1.5: SI and CT-SI in full-duplex (a) single-antenna and (b) multi-antenna wireless.

#### 1.1.3.2 Canceller Complexity

In an FD MIMO transceiver, CT-SI between each TX-RX pair needs to be suppressed. As depicted in Fig. 1.6, a cancellation path,  $H_{ij}$ , is required from each TX to all other RXs. Therefore, an N-element FD MIMO transceiver requires  $N^2$  cancellation paths, which means that the complexity of the canceller in terms of associated area, power dissipation, and noise penalties grows as  $O(N^2)$ . This is both area- and power-hungry, and not feasible for radios with a large number of elements. Furthermore, the noise from the SI and CT-SI cancellation paths can accumulate and further degrade the radio sensitivity level.



Figure 1.6: In an N-element FD MIMO radio, a cancellation path for each TX-RX pair is required, and so the canceller complexity grows as  $O(N^2)$ .

#### 1.1.4 Prior Works

To the best of our knowledge, there is no prior integrated FD phased array/MIMO transceiver. However, there have been a few system-level works using off-the-shelf discrete components, shown in Fig. 1.7. In [44], the correlation between SI and CT-SI is exploited to share cancellation delay taps between the SI and CT-SI cancellation paths (Fig. 1.7(a)). This is based on the simple idea that the cross-talk signal experiences a slightly higher delay than the SI. This enables the FD MIMO canceller complexity to increase linearly with the number of elements. However, this radio uses bulky delay lines and there is no phase control on the delay taps. SoftNull [45] is another technique that divides the available antennas between TX and RX and sacrifices MIMO degrees of freedom (DoF) in the TX-side DBF to achieve SIC (Fig. 1.7(b)). However, this work does not employ a shared antenna interface for TX/RX, which relaxes the SI and CT-SI levels that must be cancelled, and does not feature analog TX or RX cancellers, with SI and CT-SI cancellation achieved only through DBF. In [46], an FD 2-element MIMO radio is presented for LTE applications using 4 multi-tap cancellers to suppress the SI and CT-SI in the RF domain (Fig. 1.7(c)). However, this radio does not address the  $N^2$  challenge, requires bulky delay cells, and also does not use a shared antenna interface between TX and RX.

### 1.2 A 60 GHz 4-Element MIMO TX With A Single-Wire Interface

#### 1.2.1 Motivation

Wireless radios at mm-wave frequencies enable high-speed links for portable devices due to the wide spectrum available. However, high path loss is the main challenge that hinders forming an mm-wave link. Large-scale mm-wave phased arrays with hundreds of elements have been demonstrated with analog beamforming [47, 48] resulting in one IF signal, but a MIMO array enables:



Figure 1.7: Prior system level FD MIMO transceivers using discrete components: (a) FD  $3\times3$  MIMO with shared delay structure to reduce the canceller complexity (b) FD MIMO using DBF only on the TX side, and (c) FD 2-element MIMO radio for LTE applications using 4 RF bulky multi-tap cancellers to suppress the SI and CT-SI.

- Multiple user or data stream using spatial multiplexing [49];
- Enhanced radar resolution through the virtual array concept [50];
- DBF for user discovery and tracking in highly mobile scenarios.

However, DBF needs per-element digitization which results in a significant I/O challenge at mm-wave with large signal bandwidth.

Large-scale mm-wave arrays can be formed by tiling integrated phased array ICs with single or multi-beam I/O. Since practical implementation challenges lead to separate ICs for the mm-wave RF front end and the ADC/DSP, a single coaxial interface for a scalable phased array/MIMO mm-wave unit is desired. To preserve MIMO information, the signal of each element in a MIMO RX should be transported to the ADC/DSP IC for DBF, and vice versa for a MIMO TX. Thus, the signals from all elements should co-exist on a single-wire interface, either by serializing the data or by multiplexing in the time, frequency, or code domain.

SERDES circuits are traditionally used in computing systems and networks to serialize multiple streams of data on a single connection. However, these circuits, along with the required clock and data recovery, are power- and area-hungry. In addition to data, the clock signal must also be routed to the mm-wave RF front-end ICs, as the mixers in all the elements of a MIMO radio need to be synchronized.

#### 1.2.2 Prior Works

In [5], a 16-element TX/RX 60 GHz transceiver is presented that achieves high throughput and enables simple integration into laptops and other consumer electronic devices. The transceiver is a dual-chip split-IF architecture and uses a single coaxial cable interface (Fig. 1.8). A fixed-IF architecture is employed, and the 8.64 GHz IF signal is passed over the coaxial cable along with a 270 MHz reference signal for a front-end phase-locked loop (PLL), a 2.64 GHz control signal, and DC power. Although this work simplifies the interface for phased arrays, analog beamforming prior to digitization makes multi-beam operation impossible, and spatial information is lost.

A 28 GHz 4-element MIMO RX in 65 nm CMOS with a single-wire interface that multiplexes the BB signals of all elements and the LO reference through code-domain multiplexing is demonstrated in [6]. Walsh functions are used to spread each element's received signal with low cross-code leakage. The approach is validated through DBF after de-multiplexing the BB signals from the single wire. However, the high spreading ratio of 16 used in this work results in a large occupied spectrum and requires power-consuming digitization, and thus this solution is not suitable for transceivers with multi-GHz IF data bandwidth.
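To make the code-domain multiplexing idea concrete, the following is a minimal baseband sketch, not the circuit implementation of [6]. It assumes Walsh codes generated by the Sylvester (Hadamard) construction, real-valued placeholder samples, and an ideal noiseless wire; the choice of which code rows to use is arbitrary.

```python
import numpy as np


def walsh_codes(length: int) -> np.ndarray:
    """Sylvester-construction Hadamard matrix; rows are orthogonal +/-1 codes."""
    h = np.array([[1]])
    while h.shape[0] < length:
        h = np.block([[h, h], [h, -h]])
    return h


SPREADING_RATIO = 16  # spreading ratio quoted in the text for [6]
NUM_ELEMENTS = 4

# One code per element (skipping the all-ones row is an illustrative choice).
codes = walsh_codes(SPREADING_RATIO)[1:NUM_ELEMENTS + 1]

rng = np.random.default_rng(0)
symbols = rng.standard_normal((NUM_ELEMENTS, 8))  # hypothetical BB samples

# Spread: every sample is multiplied by its element's 16-chip code, and all
# elements are summed onto the single wire.
chips = symbols[:, :, None] * codes[:, None, :]  # (element, sample, chip)
wire = chips.sum(axis=0)                         # (sample, chip)

# Despread: correlate the wire signal with each code and normalize.
recovered = (wire @ codes.T).T / SPREADING_RATIO  # (element, sample)

print(np.allclose(recovered, symbols))
```

Each sample occupies 16 chips on the wire, so the chip rate (and hence the occupied bandwidth) is 16 times the per-element sample rate. This is the bandwidth expansion that makes the approach costly for wideband channels.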



Figure 1.8: Prior transceivers with single-wire interface: (a) A 60 GHz TX/RX with single coaxial interface for low-cost integration in PC platform [5] (b) A 4-element 28 GHz mm-wave MIMO array with single-wire interface using code-domain multiplexing [6].

#### 1.2.3 Single-Wire Interface Using Frequency-Division Multiplexing

In this dissertation, a 4-element MIMO TX mm-wave IC with a single-wire interface is proposed. It receives IF signals that are frequency-division multiplexed on harmonics of each other, as shown in Fig. 1.9, along with a 30 GHz reference. A harmonic rejection mixer downconverts and separates each element's IF signal from the single-wire interface to zero-IF, and each signal is finally upconverted to 60 GHz and amplified in its element. A potential application is 60 GHz WiGig, in which each channel bandwidth is around 2 GHz. This makes code-domain multiplexing as in [6] not viable, since a spreading ratio of 16 would result in a 32 GHz spread data bandwidth, which requires high-speed, power-hungry ADCs and digital processing.



Figure 1.9: Single-wire interface between DSP unit and mm-wave unit using frequency-division multiplexing.

### 1.3 Organization

This thesis is organized as follows. To demonstrate high-performance multi-antenna radios, first, FD phased-array is discussed. Then, FD MIMO is presented and finally, an mm-wave MIMO TX with a single-wire interface is discussed and the dissertation is concluded.

Chapter 2 presents how phased array beamforming can be combined with FD operation to achieve wideband SI suppression with minimal link budget penalty in terms of TX and RX array gains. A detailed analysis of system-level requirements for FD phased-array wireless links and the circuit implementation is presented. Furthermore, an optimization problem is formulated to jointly maximize the TX and RX beamforming gains subject to a constraint on the amount of achieved wideband RF SIC. A 65 nm CMOS scalable 4-element FD phased array circ.-RX, which utilizes the multiple phases naturally available in the N-path-filter circulators to perform beamforming and SI cancellation, is implemented. An 8-element FD phased array based on the described ICs achieves (i) 50 dB overall RF array SIC over a 16.25 MHz, WiFi-like, bandwidth with less than 3.5/3 dB penalty in TX/RX array gains, and (ii) 100 dB overall array SIC including digital SIC, supporting +16.5 dBm TX array power handling.

Chapter 3 presents an integrated FD 2-element MIMO circ.-RX array exploiting MIMO RF and BB SIC, which addresses the complexity associated with FD MIMO operation by sharing delay cells and enhances power handling using a bootstrapping technique. In this chapter, first, the challenges and proposed solutions associated with implementing FD MIMO radios are described. Then, implementation details are described, and finally, measurement results are discussed. The 65 nm CMOS prototype exhibits (i) up to 35/45 dB average SIC across 40/20 MHz BW, (ii) more than 42/53 dB average CT-SIC across 40/20 MHz BW with <2.1 dB degradation in RX NF, and (iii) overall TX power handling of +14 dBm enabled by clock bootstrapping.

Chapter 4 presents a 60 GHz 4-element MIMO TX with a single-wire interface that receives the BB signals multiplexed on a single wire, together with the LO reference, through frequency-division multiplexing. A two-stage 16-phase harmonic rejection mixer is deployed to downconvert the IF signal of each element to zero-IF. Then, a fifth-order low-pass filter is used to reject the higher harmonics and clean the spectrum mask. Finally, each IF signal is upconverted to 60 GHz using a Gilbert-cell-based mixer and amplified using a two-stage stacked PA structure. The proposed structure is designed and taped out in 45 nm RFSOI CMOS. Measurement results demonstrate (i) >20 dB gain in each channel, (ii) >30 dB inter-channel isolation, and (iii) >8.8 dBm output -1 dB compression point and >9.1 dBm output saturation power.

Chapter 5, finally, concludes the dissertation with a summary of the key technical contributions and suggestions for future research directions.

## Chapter 2

## Full-Duplex Phased Array Wireless

Phased arrays can reject spatial interference and provide coherent beamforming gain, which can substantially increase the range of silicon-based radios that are limited in TX power handling and RX noise performance. Furthermore, FD wireless can potentially double the data throughput by simultaneously transmitting and receiving at the same frequency, and can provide benefits in the higher layers such as better spectral efficiency, reduced network and feedback signaling delays, and resolution of hidden-node problems to avoid collisions.

In this chapter, we present how phased array beamforming can be combined with FD operation to achieve wideband SI suppression with minimal link budget penalty in terms of TX and RX array gains [51]. FD phased array operation is extremely challenging as not only must the SI from each TX to its own RX be cancelled, but the CT-SI between each TX-RX pair must also be suppressed, as shown in Fig. 2.1. Beamforming DoF are repurposed to achieve SIC without any explicit cancellation circuitry. We present a detailed analysis of system-level requirements for the FD phased array wireless links and circuit implementation. We then formulate an optimization problem whose objective is to jointly maximize the TX and RX beamforming gains subject to a constraint on the amount of RF SIC achieved across a wide bandwidth. A 65 nm CMOS scalable 4-element FD phased array circ.-RX is proposed which utilizes the multiple phases naturally available in the N-path-filter circulators to perform beamforming and SI cancellation. Finally, the concept is validated through measurements



Figure 2.1: FD N-path-filter-based circulator-receiver conceptual architecture and block diagram.

performed on an 8-element FD phased array transceiver by tiling two of the described circ.-RX ICs and a TX beamformer implemented using off-the-shelf discrete components.

### 2.1 FD Phased Array System Requirements

#### 2.1.1 Array SIC

Two N-element FD phased array circ.-RXs forming an FD link are depicted in Fig. 2.2. Although the SIC after beamforming depends heavily on the antenna response, in the worst case in each RX channel, the SI and CT-SI can add up constructively to increase by an amount equal to the TX array gain  $(N^2)$ , and then add up constructively after RX beamforming to increase by an amount equal to the RX array gain  $(N^2)$ , resulting in a total increase of  $N^4$  relative to a single-element transceiver. Hence, while phased array beamforming provides an  $N^2$  increase in the array gain on both the TX and RX sides, it can also increase the total SI at the RX output after beamforming by the same amount. Therefore, we define the array SIC similarly to its single-element counterpart, with the addition of the TX/RX array gains:

$$SIC_{\text{array}} = P_{\text{TX}} \cdot AG_{\text{TX}} \cdot AG_{\text{RX}}/SI, \qquad (2.1)$$

where  $P_{\text{TX}}$  is the TX power level at each element, SI is the residual SI power level, and  $AG_{\text{TX/RX}}$  are the TX/RX array gains.

Figure 2.2: FD phased array link budget.

In single-input-single-output (SISO) transceivers, the SIC required to suppress the SI down to the noise level is equal to  $SIC_{SISO} = P_{TX}/N_{floor,SISO}$ , where  $N_{floor,SISO} = kT \cdot BW \cdot NF$ . In an N-element phased array, the required SIC is given by  $SIC_{array} = P_{TX} \cdot AG_{TX} \cdot AG_{RX}/N_{floor,array}$ , where  $N_{floor,array} = kT \cdot BW \cdot NF \cdot N$ . Hence, the required SIC in the array scenario is  $AG_{TX} \cdot AG_{RX}/N$  (as high as  $N^3$ ) times larger than in the SISO case, which makes it considerably more challenging.
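As a quick numeric check of this ratio, assume the same  $P_{TX}$ , BW, and NF in both cases and ideal array gains  $AG_{TX} = AG_{RX} = N^2$  (an upper bound; the gains used in practice are lower). For N = 8:

$$\frac{SIC_{array}}{SIC_{SISO}} = \frac{AG_{TX} \cdot AG_{RX}}{N} \le \frac{N^2 \cdot N^2}{N} = N^3, \qquad 10\log_{10}(8^3) \approx 27.1\ \text{dB}.$$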

#### 2.1.2 Link Budget Calculations and FD Operation

The desired received signal in an N-element FD phased array TRX link after beamforming is:

$$P_{\rm RX,array} = \frac{P_{\rm TX} \cdot AG_{\rm TX} \cdot G_{\rm TX} \cdot AG_{\rm RX} \cdot G_{\rm RX}}{FSPL \cdot IL}, \qquad (2.2)$$

where  $G_{\text{TX/RX}}$  are the TX/RX antenna gains, FSPL is the free-space propagation loss, equal to  $(4\pi \cdot R \cdot f/c)^2$ , and IL is a margin considered for practical implementation losses ( $R$ ,  $f$ , and  $c$  are the range, the operation frequency, and the speed of light, respectively).

| Metric | Calculation | Value |
|--------|-------------|-------|
| Frequency ( $f$ ) | | 730 MHz |
| Number of antenna elements ( $N$ ) | | 8 |
| TX power per element ( $P_{TX}$ ) | | 1 dBm |
| TX/RX array gain ( $AG_{TX/RX}$ ) | $20\log_{10}(N) - 3$ dB | 15 dB |
| TX/RX antenna gain ( $G_{TX/RX}$ ) | | 6 dBi |
| Bandwidth (BW) | | 16.25 MHz |
| RX noise figure (NF) | | 5 dB |
| RX array noise floor referred to antenna input ( $N_{floor,array}$ ) | $kT \cdot BW \cdot NF \cdot N$ | -88 dBm |
| Required SNR | | 20 dB |
| RX array sensitivity referred to antenna input ( $P_{sense}$ ) | $2 \cdot SNR \cdot N_{floor,array}$ | -65 dBm |
| Implementation losses (IL) | | 10 dB |
| Supported range ( $R_{max}$ ) | $\frac{\lambda}{4\pi}\sqrt{P_{TX} AG_{TX} G_{TX} AG_{RX} G_{RX}/(IL \cdot P_{sense})}$ | 2.6 km |
| Required array SIC ( $SIC_{array}$ ) | $P_{TX} AG_{TX} AG_{RX}/N_{floor,array}$ | 119 dB |

Table 2.1: FD phased array link budget.

The minimum required signal-to-noise ratio (SNR) determines the radio sensitivity level  $(P_{\text{sense}} = 2 \cdot SNR \cdot N_{\text{floor}, \text{array}})$ , where the factor of 2 covers the SNR degradation caused by the residual SI. Setting  $P_{\rm RX,array} = P_{\text{sense}}$  in (2.2) and solving for the range, the maximum range that the link can support  $(R_{\text{max}})$  is:

$$R_{\rm max} = \frac{\lambda}{4\pi} \sqrt{\frac{P_{\rm TX} \cdot AG_{\rm TX} G_{\rm TX} \cdot AG_{\rm RX} G_{\rm RX}}{IL \cdot P_{\rm sense}}}, \qquad (2.3)$$

This clearly shows the benefit of array gain in increasing the link range. Table 2.1 summarizes the link budget calculation. Based on our calculations, for an 8-element 730 MHz array with +1 dBm TX power per element ( $P_{\text{TX}}$ ), 6 dBi antenna gain, 15 dB TX and RX array gains ( $AG_{\text{TX}}$  and  $AG_{\text{RX}}$ , 3 dB degraded from the ideal 18 dB array gain due to the need to achieve SIC), 16.25 MHz bandwidth (BW), 5 dB RX noise figure (NF), 20 dB required SNR, and 10 dB implementation losses, one can establish an FD link over a distance of 2.6 km. This shows that phased arrays can substantially enhance the range of silicon-based FD transceivers, which are limited in power handling, making them suitable for both space-constrained WiFi access points and small-cell base stations.
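The link budget can be reproduced with a few lines of arithmetic in the dB domain. The sketch below is only a numerical cross-check: the function and parameter names are illustrative, the thermal noise density of -174 dBm/Hz is an assumed room-temperature value, and the implementation loss is treated as a loss (subtracted in dB), consistent with (2.2).

```python
import math

C = 299_792_458.0        # speed of light in m/s
KT_DBM_PER_HZ = -174.0   # thermal noise density; assumed room-temperature value


def db10(x: float) -> float:
    """Power ratio to decibels."""
    return 10.0 * math.log10(x)


def fd_array_link_budget(f_hz=730e6, n=8, p_tx_dbm=1.0, ant_gain_dbi=6.0,
                         bw_hz=16.25e6, nf_db=5.0, snr_db=20.0,
                         il_db=10.0, sic_gain_penalty_db=3.0):
    # Array gain: ideal 20*log10(N) minus the gain sacrificed for SIC.
    ag_db = 20.0 * math.log10(n) - sic_gain_penalty_db
    # Array noise floor referred to the antenna input: kT * BW * NF * N.
    n_floor_dbm = KT_DBM_PER_HZ + db10(bw_hz) + nf_db + db10(n)
    # Sensitivity: factor of 2 (3 dB) for residual-SI degradation.
    p_sense_dbm = n_floor_dbm + snr_db + db10(2.0)
    # Largest free-space path loss the link tolerates.
    max_fspl_db = (p_tx_dbm + 2 * ag_db + 2 * ant_gain_dbi
                   - il_db - p_sense_dbm)
    wavelength_m = C / f_hz
    r_max_m = wavelength_m / (4.0 * math.pi) * 10.0 ** (max_fspl_db / 20.0)
    # Required array SIC: P_TX * AG_TX * AG_RX / N_floor,array.
    sic_array_db = p_tx_dbm + 2 * ag_db - n_floor_dbm
    return {
        "ag_db": ag_db,
        "n_floor_dbm": n_floor_dbm,
        "p_sense_dbm": p_sense_dbm,
        "r_max_m": r_max_m,
        "sic_array_db": sic_array_db,
    }


budget = fd_array_link_budget()
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

With the default arguments, the values round to the entries of Table 2.1. Setting `sic_gain_penalty_db=0` shows how much range is given up in exchange for SIC.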

### 2.2 SIC via Beamforming

An N-element phased array transceiver with a shared antenna interface, and with amplitude and phase control on each TX and RX element, features an overall 2(N-1) complex-valued DoF on the TX and RX sides ((N-1) DoF on each side). These DoF represent the complex-valued weights (amplitudes and phases) of each element relative to that of the first element. Typically, these DoF are employed to form the beams toward desired signal directions, and to minimize interference to/from nearby radios by pointing nulls towards them or suppressing the side-lobes of the radiation pattern. Alternatively, a few beamforming DoF at the TX and RX can be repurposed so that the total SI is suppressed after RX beamforming at the expense of some TX and RX beam characteristics, such as a few nulls and/or some gain loss in the beam-pointing direction(s).

Fig. 2.3 depicts our implementation of a 2×4 rectangular array of slot loop antennas at 730 MHz with  $\lambda/2$  spacing, whose SI channel matrix in the frequency domain is denoted by  $\mathbf{H}^{\mathrm{SI}}(f) = [H^{\mathrm{SI}}_{(m',n'),(m,n)}(f)] \in \mathbb{C}^{8\times8}$ , where  $H^{\mathrm{SI}}_{(m',n'),(m,n)}(f)$  is the frequency response of the (CT-)SI channel from the  $(m, n)^{\mathrm{th}}$ -element to the  $(m', n')^{\mathrm{th}}$ -element. As Fig. 2.3 shows, the antenna matching is around -20 dB at the center frequency of 730 MHz (which would be the SI channel for each element if the circulators were ideal), while the magnitude of the SI channel from the closest element can be as high as -10 dB (e.g.,  $|H_{(2,1),(1,1)}|$ ). Moreover, a vertical pair of elements have higher SI channel magnitude than a horizontal pair of elements (e.g.,  $|H_{(2,1),(1,1)}| > |H_{(1,2),(1,1)}|$  at 730 MHz).

Let x(f) be the transmit signal in the frequency domain, and  $\mathbf{w}^{\text{TX}} = [w_{(m,n)}^{\text{TX}}]$  and  $\mathbf{w}^{\text{RX}} = [w_{(m,n)}^{\text{RX}}]$  be the complex-valued TX and RX beamforming weight vectors, respectively. Then, the phased array SI after TX and RX beamforming, denoted by  $x^{\text{SI}}(f)$ , is given by

$$x^{\mathrm{SI}}(f) = (\mathbf{w}^{\mathrm{RX}})^{\top} \mathbf{H}^{\mathrm{SI}}(f) \mathbf{w}^{\mathrm{TX}} \cdot x(f), \qquad (2.4)$$

where  $(\cdot)^{\top}$  denotes the transpose of a vector.

Figure 2.3: (a) A 2×4 8-element antenna array at 730 MHz with  $\lambda/2$  spacing, (b) the measured SI channel magnitudes at 730 MHz, and (c)–(d) examples of measured SI channel magnitudes across frequency from antenna elements (1,1) and (1,2) to the adjacent elements, respectively.

Figure 2.4: (a)–(b) Simulated TX/RX array patterns in the x-z and y-z planes while achieving 60 dB array SIC across 20 MHz with 3 dB array gain degradation in the TX/RX broadside beam-pointing directions, (c) simulated array SIC where at least 60 dB SIC is guaranteed to be achieved across 720–740 MHz, (d) simulated TX/RX array gain for a desired array SIC based on solving an optimization problem using the measured SI channels depicted in Fig. 2.3.

Consider a 3D coordinate system where the  $2 \times 4$  rectangular array is located on the x-y plane. We denote the TX and RX beamforming directions by the azimuth and elevation angles  $(\phi, \theta)$  in a horizontal coordinate system. Then, the *far-field* array TX/RX beamforming pattern is given by

$$\begin{aligned}
E^{\mathrm{TX/RX}}(\phi,\theta) &= (\mathbf{s}^{\mathrm{TX/RX}}(\phi,\theta))^{\top} \cdot \mathbf{w}^{\mathrm{TX/RX}} \\
&= \sum_{m} \sum_{n} w_{(m,n)}^{\mathrm{TX/RX}} \cdot e^{j\pi[(m-1)\cos\phi\cos\theta + (n-1)\sin\phi\cos\theta]},
\end{aligned} \qquad (2.5)$$

where the TX/RX beam steering vector in the spatial direction of  $(\phi, \theta)$  is given by

$$\mathbf{s}^{\mathrm{TX/RX}}(\phi,\theta) = \left[s_{(m,n)}^{\mathrm{TX/RX}}(\phi,\theta)\right] = \left[e^{j\pi\left[(m-1)\cos\phi\cos\theta + (n-1)\sin\phi\cos\theta\right]}\right].$$
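The steering vector and far-field pattern can be evaluated numerically as follows. This is a small sketch assuming NumPy, a 2×4 array with  $\lambda/2$  spacing, and row-major ordering of the elements  $(m, n)$ ; the function names and the example direction are illustrative.

```python
import numpy as np


def steering_vector(phi: float, theta: float, rows: int = 2, cols: int = 4) -> np.ndarray:
    """Steering vector of a rows x cols array with half-wavelength spacing."""
    m = np.arange(rows)[:, None]  # (m - 1) for m = 1..rows
    n = np.arange(cols)[None, :]  # (n - 1) for n = 1..cols
    phase = np.pi * (m * np.cos(phi) * np.cos(theta)
                     + n * np.sin(phi) * np.cos(theta))
    return np.exp(1j * phase).ravel()


def array_gain_db(w: np.ndarray, phi: float, theta: float) -> float:
    """|E(phi, theta)|^2 in dB, with E = s^T w (no conjugation)."""
    e = steering_vector(phi, theta) @ w
    return float(10.0 * np.log10(np.abs(e) ** 2))


phi, theta = 0.3, 0.5  # arbitrary example direction in radians
w = np.conj(steering_vector(phi, theta))  # conjugate weights, |w| = 1 per element
gain_db = array_gain_db(w, phi, theta)
print(f"{gain_db:.2f} dB")
```

With unit-magnitude conjugate weights, all eight terms add in phase, so the gain equals  $N^2 = 64$  (about 18 dB). This is the maximal array gain against which the degradation due to SIC is measured.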



Figure 2.5: (a)–(b) Spatial distribution of the simulated sum TX and RX beamforming gain loss for achieving the desired RF SIC across 20 MHz and 40 MHz bandwidth.

The goal is to achieve wideband SIC in the *near-field* (2.4) with minimal penalty in the TX/RX beamforming gains in the *far-field* (2.5).

Denote by  $(\phi^{\text{TX/RX}}, \theta^{\text{TX/RX}})$  the main TX/RX beam-pointing direction in which the TX/RX beamforming gain needs to be maximized. We formulate an optimization problem where the objective is to maximize the TX and RX array gains, subject to the constraint that a desired amount of array SIC is achieved after TX and RX beamforming, i.e.,

$$\begin{aligned}
\max_{\mathbf{w}^{\mathrm{TX}},\mathbf{w}^{\mathrm{RX}}} \quad & AG_{\mathrm{TX/RX}} \\
\text{subject to} \quad & |(\mathbf{s}^{\mathrm{TX/RX}}(\phi^{\mathrm{TX/RX}}, \theta^{\mathrm{TX/RX}}))^{\top} \cdot \mathbf{w}^{\mathrm{TX/RX}}|^{2} \ge AG_{\mathrm{TX/RX}}, \\
& |(\mathbf{w}^{\mathrm{RX}})^{\top} \mathbf{H}^{\mathrm{SI}}(f) \mathbf{w}^{\mathrm{TX}}|^{2} \le \chi, \ \forall f, \\
& |w_{(m,n)}^{\mathrm{TX/RX}}|^{2} \le 1, \ \forall m, n.
\end{aligned} \qquad (2.6)$$

Specifically, the first constraint ensures minimal degradation in both the TX and RX array gains in the desired TX/RX beam-pointing direction (compared to the maximal TX/RX array gains without sacrificing any DoF). The second constraint guarantees at least  $\chi_{dB} = -10 \log_{10}(\chi)$  dB SIC across the desired bandwidth so that the total RF SIC is ( $\chi_{dB} + AG_{TX} + AG_{RX}$ ) dB. The last inequality sets the normalization constraint on each TX/RX beamforming weight.
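To illustrate how a beamforming DoF can be traded for SI suppression, the sketch below uses a much simpler technique than the optimization in (2.6): single-frequency zero-forcing on the RX weights only. It is not the wideband nonlinear solver used in this work. The SI channel matrix is a random placeholder rather than the measured data, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# Hypothetical SI channel matrix at one frequency (random placeholder).
H = 0.1 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

s = np.ones(N, dtype=complex)  # broadside steering vector (all ones)

w_tx = np.conj(s)              # full-gain TX weights
h_eff = H @ w_tx               # SI at each RX element after TX beamforming

# Start from the full-gain RX weights and remove the component that couples
# to h_eff, so that w_rx^T h_eff = 0 (one complex DoF is spent on the null).
w0 = np.conj(s)
alpha = (h_eff @ w0) / np.vdot(h_eff, h_eff).real
w_rx = w0 - alpha * np.conj(h_eff)

# Enforce |w| <= 1 per element; scaling does not disturb the null.
w_rx = w_rx / np.max(np.abs(w_rx))

residual_si = np.abs(w_rx @ H @ w_tx) ** 2
rx_gain = np.abs(s @ w_rx) ** 2
rx_gain_loss_db = 10.0 * np.log10(N ** 2 / rx_gain)

print(f"residual SI power: {residual_si:.3e}")
print(f"RX array gain loss: {rx_gain_loss_db:.2f} dB")
```

The null is exact only at the frequency for which `H` is given. Holding the residual below  $\chi$  across a band, while also sharing the gain loss between TX and RX, is what requires the constrained optimization formulated above.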

We evaluated this idea through simulations using the measured array SI channel (see Fig. 2.3), where the optimization problem is solved using the MATLAB nonlinear optimization solver. We set  $\chi = 10^{-3}$  so that the total array SIC is given by  $(30 + AG_{TX} + AG_{RX})$  dB. The results are summarized in Figs. 2.4 and 2.5. In the simulation, the TX/RX array gains are maximized for broadside beamforming (i.e.,  $\theta = 90^{\circ}$ ) subject to the constraint that at least 60 dB array SIC is achieved between 720–740 MHz. Figs. 2.4(a) and 2.4(b) show that an array gain degradation of only 3 dB compared with the maximal array gain of  $N^2$  (18 dB) can be maintained while achieving 60 dB array SIC across 20 MHz. Moreover, Fig. 2.4(d) shows the trade-off between the maximum achievable TX/RX array gain and different amounts of desired array SIC across 20 MHz and 40 MHz, respectively.

To further investigate the performance of our proposed approach for different TX/RX beam-pointing directions, Figs. 2.5(a) and 2.5(b) show the distribution of the sum of the TX and RX beamforming gain losses with varying TX/RX beamforming in the spatial direction of  $(\phi, \theta)$  (i.e., both the TX and RX beam-pointing directions are  $(\phi, \theta)$ ). The results show that the sum beamforming gain loss never exceeds 6.0 dB and 6.6 dB for achieving an array SIC of 60.0 dB and 59.4 dB over 20 MHz and 40 MHz bandwidth, respectively. Moreover, when beamforming in the directions with stronger SI (e.g., in the array broadside due to higher circulator TX-RX leakage, or in the direction of adjacent antenna elements due to strong coupling), a larger portion of the TX and RX beamforming gains needs to be sacrificed to achieve the desired performance. Similar approaches can also be applied to explore scenarios with different TX and RX beam-pointing directions, which is a subject of our future work.

In general, four important features must be highlighted: (i) SI suppression is essentially achieved in the *spatial domain* through a trade-off between near-field SI nulling and far-field beamforming, without any explicit cancellers and their associated power consumption, since the RX/TX beamformers are repurposed, (ii) the SI suppression is *wideband* since different antenna coupling paths cancel each other, as opposed to having an IC canceller duplicate the frequency characteristics of an antenna coupling path, (iii) the beamforming-FD trade-off can be *dynamically adapted* in the field, with the number of DoF sacrificed depending on the required SI cancellation, the bandwidth, the external interferers that need to be nulled, etc., and (iv) the trade-off between FD and beamforming becomes more favorable for larger arrays. In [52], we further investigate this problem and develop efficient algorithms for cases with large-scale arrays (e.g., N = 36/72).

# 2.3 Circuit Implementation

## 2.3.1 Integrated Circ.-RX Phased Array RX

#### 2.3.1.1 Shared Antenna Interface

This work deploys the N-path-filter-based combined circ.-RX described in [53] as a shared-antenna interface that merges a commutation-based linear periodically time-varying (LPTV) non-magnetic circulator with a down-converting mixer and directly provides the BB signals (Fig. 2.6). The availability of 8 phases at the BB nodes of the N-path filter further simplifies the beamforming. The 50  $\Omega$  quarter-wavelength transmission lines are implemented using one capacitor-inductor-capacitor section with off-chip inductors. Each circ.-RX requires two sets of 8-phase non-overlapping clocks with a 90° phase shift between them to drive the switches on each side of the N-path filter. The N-path filter uses 8 paths to increase the ANT-RX BB recombination gain and to achieve harmonic cancellation for the 3rd and 5th harmonics. The on-resistance of each switch transistor is around 3.5  $\Omega$, and the sources/drains of the FETs are biased at 0.6 V and DC-coupled to the BB  $g_m$-cells. The gates of the switches are AC-coupled to the buffers and biased at 0.75 V (the DC level of a 12.5% duty-cycle pulse swinging from 0.6 V to 1.8 V). In each circulator, a Johnson-counter-based divide-by-4 generates the eight output phases from an input clock at 4 times the operating frequency ( $f_{clk}$ ).
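As a quick consistency check on the gate bias quoted above, the DC level of a rectangular pulse that is high for 12.5% of the period (one of eight non-overlapping phases) and swings between 0.6 V and 1.8 V can be worked out as follows. Ideal pulse edges are assumed here, and the symbol $V_{\text{gate,DC}}$ is introduced only for this illustration.

$$V_{\text{gate,DC}} = 0.6\,\text{V} + 0.125 \times (1.8\,\text{V} - 0.6\,\text{V}) = 0.75\,\text{V}$$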



Figure 2.6: Combined non-magnetic non-reciprocal circulator-receiver deployed as integrated shared antenna interface.

### 2.3.1.2 Phased Array RX

A scalable 65 nm CMOS 730 MHz 4-element circ.-RX phased array is implemented (Fig. 2.7). As described before, the circ.-RX provides 8-phase BB nodes, which simplifies the RX beamforming. Four 8-bit (1-bit sign and 7-bit amplitude) programmable differential  $g_m$-cells are connected to the eight BB nodes of each element circ.-RX, and their outputs are combined in the current domain across all 4 elements into low-input-impedance I/Q TIAs, enabling Cartesian beamforming. The TIAs are implemented using two-stage op-amps. Therefore, the complex-valued weight (phase and gain) applied to each element and the summation across all elements are performed simultaneously while maintaining low noise and high linearity. In addition, scalability across multiple chips is feasible using the low-input-impedance combining point at the TIA input. By connecting a second chip's low-impedance node to that of the first chip and turning off the second chip's TIA, the current from the  $g_m$-cells of the second chip can be combined into the TIAs of the first chip in the current domain (Fig. 2.8). Thanks to the low-impedance node provided by the TIA of the main chip, board trace capacitance does not degrade the BW performance.

Figure 2.7: Block/Circuit Diagram of scalable 4-element FD circ.-RX phased array.



Figure 2.8: Tiling two circ.-RX phased array to form 8-element FD circ.-RX phased array.

Although a typical Cartesian vector modulator covers the full 360° using the 0°, 90°, 180°, and 270° phases, this work deploys an oversampling vector modulator using the 8 phases (0°, 45°, ..., 315°) provided by the BB nodes of the N-path filter. Fig. 2.9 shows vector modulation using 3-bit resolution (1-bit sign and 2-bit amplitude). Oversampled vector modulation (8 phases rather than 4) is beneficial in terms of complex-gain accuracy [54], noise, and power, but it increases the circuit complexity and the associated area. If the desired weight on an element is the vector  $w \angle \theta$, its amplitude and phase can be generated using the two adjacent phases (lag and lead),  $a_{\text{lag}} \angle \theta_{\text{lag}}$  and  $a_{\text{lead}} \angle \theta_{\text{lead}}$, shown in Fig. 2.9. Solving the real and imaginary parts of  $a_{\text{lag}} \angle \theta_{\text{lag}} + a_{\text{lead}} \angle \theta_{\text{lead}} = w \angle \theta$  for the lag and lead amplitudes gives:

$$a_{\text{lag}} = w \left| \frac{\cos(\theta) \cdot \tan(\theta_{\text{lead}}) - \sin(\theta)}{\cos(\theta_{\text{lag}}) \cdot \tan(\theta_{\text{lead}}) - \sin(\theta_{\text{lag}})} \right|,$$
(2.7)

$$a_{\text{lead}} = w \left| \frac{\cos(\theta) \cdot \tan(\theta_{\text{lag}}) - \sin(\theta)}{\cos(\theta_{\text{lead}}) \cdot \tan(\theta_{\text{lag}}) - \sin(\theta_{\text{lead}})} \right|.$$
(2.8)

The calculated amplitudes are rounded to the nearest integer to program the  $g_m$-cells. If the weight is not restricted to the two adjacent vectors, the available vectors are not orthogonal, so the mapping from the desired vector to the amplitude of each phase is not straightforward; it can be found using exhaustive search or a  $\Sigma\Delta$  algorithm [54].
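The lag/lead decomposition in (2.7) and (2.8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the normalization of $w$ to a full scale of 1, and the mapping of full scale to the 7-bit code 127 are hypothetical choices, not taken from the chip's programming interface. The code uses the algebraically equivalent form $a_{\text{lag}} = w \sin(\theta_{\text{lead}} - \theta) / \sin(\theta_{\text{lead}} - \theta_{\text{lag}})$, obtained by multiplying the numerator and denominator of (2.7) by $\cos(\theta_{\text{lead}})$, which avoids the tangent singularity at 90° and 270°.

```python
import cmath
import math

PHASES_DEG = [45 * k for k in range(8)]  # 0, 45, ..., 315 degrees


def lag_lead_amplitudes(w, theta_deg, full_scale=127):
    """Split w at angle theta_deg onto the two adjacent 45-degree-spaced phases.

    full_scale=127 (7-bit amplitude) is an assumed mapping of w = 1 to codes.
    """
    theta = theta_deg % 360.0
    k = int(theta // 45) % 8
    lag_deg = PHASES_DEG[k]
    lead_deg = PHASES_DEG[(k + 1) % 8]
    t, tl, th = (math.radians(v) for v in (theta, lag_deg, lead_deg))
    denom = math.sin(th - tl)
    a_lag = w * math.sin(th - t) / denom    # equivalent to (2.7)
    a_lead = w * math.sin(t - tl) / denom   # equivalent to (2.8)
    codes = (round(a_lag * full_scale), round(a_lead * full_scale))
    return lag_deg, lead_deg, a_lag, a_lead, codes


def reconstruct(lag_deg, lead_deg, a_lag, a_lead):
    """Vector sum of the two weighted phases, as a complex number."""
    return (a_lag * cmath.exp(1j * math.radians(lag_deg))
            + a_lead * cmath.exp(1j * math.radians(lead_deg)))


lag, lead, a_lag, a_lead, codes = lag_lead_amplitudes(1.0, 30.0)
target = cmath.exp(1j * math.radians(30.0))
error = abs(reconstruct(lag, lead, a_lag, a_lead) - target)
```

For a unit-amplitude weight at 30°, the two adjacent phases are 0° and 45°, and the unquantized amplitudes reproduce the target vector to within floating-point error; the rounded codes are what would be written to the amplitude bits under the assumed scaling.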



Figure 2.9: Beamforming using 3-bit vector modulator: (a) 4-phase (b) 8-phase.

An inverter-based amplifier is used to buffer the TIA output from the off-chip IF balun (Fig. 2.10), since inverter-based amplifiers offer good linearity at low supply voltages. Common-mode feedback circuitry similar to [55] is deployed to ensure a proper output DC value. The buffer FETs are sized to provide unity gain with a 100  $\Omega$  differential load. Because of its high intrinsic gain, the buffer must be loaded with 100  $\Omega$  at all times to avoid instability. The 2.5 pF compensation capacitors ( $C_c$ ) ensure common-mode stability, and large 7.8 k $\Omega$ resistors ( $R_L$ ) are used to extract the output DC voltage.



Figure 2.10: Inverter-based output buffer with common-mode feedback.

#### 2.3.1.3 Clock Distribution

Clock distribution is challenging in any multi-antenna system because of the phase uncertainty that it can introduce between the elements. Fig. 2.11 presents the clock distribution among the 4 elements, together with the divider circuit details and its node voltages in the reset state. To ease the clock distribution, two differential clock signals at  $4 \cdot f_{clk}$  are distributed symmetrically on the chip among the 4 elements. These two clocks are phase-shifted through multiplexed digital delay cells with analog varactor-based fine-tuning to cover a range of  $\pm$  70° around the nominal phase setting at 730 MHz, based on schematic simulations. To synchronize the elements and remove the phase ambiguity upon restart, the divide-by-4 circuits in all elements must be reset by the same signal and then triggered on the same rising edge of the clock. The dividers are implemented using TSPC flip-flops [17], and a pull-down NMOS is used at their outputs for reset. Note that the flip-flops should not have any inter-stage node with an unknown voltage in the reset state. When the reset signal is high, it asynchronously resets the divide-by-4 in every element and also holds the clock signals low. When the reset signal toggles low, the reset circuitry keeps the clock signals low until the first rising edge of the clock. This prevents a collision between a clock edge and the reset edge, which would otherwise pass skewed clock pulses to the dividers. It thereby ensures that the dividers of all elements are triggered by exactly the same clock edge and remain synchronized.



Figure 2.11: Circ.-RX phased array clock distribution and synchronize and reset circuitry.

### 2.3.1.4 Programmable $g_m$ -cell

The implementation details of the programmable  $g_m$-cell are shown in Fig. 2.12. The eight BB phases of each element are connected to four 8-bit programmable differential  $g_m$-cells, each with a 1-bit sign and a 7-bit amplitude control. Each programmable  $g_m$-cell consists of a cross-switch circuit, which uses a complementary scheme to flip the differential phases, and 127 identical switchable inverter-based  $g_m$-cells in parallel to vary the amplitude. An inverter with shorted input and output is used to self-bias the input side of the  $g_m$-cells of each phase. The inverter-based  $g_m$-cell proposed in [56] is deployed, which provides common-mode rejection and output self-bias using the  $M_{PB}$  transistors. High/low threshold voltage (HVT/LVT) FETs are used for proper circuit bias. The  $M_{PB}$  and  $M_P$  devices are PMOS HVT and LVT devices, respectively, with W/L = 800 nm/250 nm, and  $M_N$  is an HVT NMOS sized to W/L = 400 nm/250 nm. All  $g_m$-cell outputs are connected to the summation point (TIA input), which is routed to pads and can see different impedances depending on the package and PCB parasitics, making it hard to implement common-mode feedback around the  $g_m$-cells. Hence, using  $g_m$-cells with output self-bias is critical.



Figure 2.12: Circuit details of programmable g<sub>m</sub>-cells deployed for beamforming.

### 2.3.2 Phased Array TX

A custom-designed TX phased array is implemented using off-the-shelf discrete components (Fig. 2.13). The signal is divided into 8 channels after one stage of amplification. Each channel contains a cascade of two 180° phase shifters (Mini-circuits JSPHS-1000+) to cover the full 360° range, in series with two stages of a high-linearity low-noise amplifier (Mini-circuits HXG-122+) and a programmable attenuator. This structure provides amplitude control while maintaining good noise performance; the 7-bit attenuator (Skyworks SKY2343-364LF) provides a maximum attenuation of 31.75 dB with 0.25 dB resolution. Based on system calculations, the design can achieve up to 30 dB gain with an output-referred IP<sub>3</sub> as high as +31 dBm and a noise figure (NF) below 3.3 dB.

### 2.3.3 Slot Loop Antenna

The slot loop antenna structure is used because of its inherently wideband operation. The antenna is fabricated on an FR-4 PCB and radiates in both the front-side and back-side directions. A metal sheet placed a quarter wavelength behind the antenna array is used as a reflector to redirect the radiation to the front side. EM simulations using Mentor Graphics IE3D show an antenna gain of 6.6 dB (see Fig. 2.14).

## 2.4 Experimental Results

The chip microphotograph of the 65 nm CMOS 4-element FD circ.-RX phased array is shown in Fig. 2.15. The chip has a total active area of 3.6 mm<sup>2</sup> and is mounted in an 88-pin QFN package. Two chips are mounted on an FR-4 PCB to realize an 8-element array for all the measurements.

Figure 2.13: A custom-designed 8-element transmitter phased array: (a) Block diagram, and (b) PCB implementation.

A custom programmable antenna tuner board is implemented with discrete components using the capacitor-(inductor/capacitor)-capacitor section shown in Fig. 2.16. Peregrine PE64906 digitally tunable capacitors, which provide 0.9 pF to 4.6 pF in discrete 119 fF steps, are used together with a fixed Coilcraft 0603HP 5.1 nH inductor. Simulation in ADS including PCB parasitics shows that the tuner can translate any antenna impedance inside the VSWR circle of 2 (return loss of roughly 10 dB or better) to an impedance inside the VSWR circle of 1.2 (return loss >20 dB).
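The correspondence between the VSWR circles and the return-loss figures quoted for the tuner follows from the standard relations $|\Gamma| = (\mathrm{VSWR} - 1)/(\mathrm{VSWR} + 1)$ and $\mathrm{RL} = -20\log_{10}|\Gamma|$. The short Python sketch below simply evaluates them; the function name is chosen for this illustration only.

```python
import math


def vswr_to_return_loss_db(vswr):
    """Return loss in dB for a given VSWR (> 1), via |Gamma| = (VSWR-1)/(VSWR+1)."""
    gamma = (vswr - 1.0) / (vswr + 1.0)
    return -20.0 * math.log10(gamma)


rl_before = vswr_to_return_loss_db(2.0)   # boundary of the untuned VSWR circle
rl_after = vswr_to_return_loss_db(1.2)    # boundary of the tuned VSWR circle
```

Evaluated this way, a VSWR of 2 corresponds to a return loss of about 9.5 dB (commonly rounded to 10 dB), and a VSWR of 1.2 corresponds to about 20.8 dB.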

## 2.4.1 Single-Element Circ.-RX Measurements

Figure 2.14: Slot loop antenna simulated radiation pattern.

Figure 2.15: FD N-path-filter-based circulator-receiver conceptual architecture and block diagram.

Figure 2.16: Custom designed antenna tuner: (a) circuit diagram (b) PCB

The measured two-port TX-to-ANT S-parameters of the circulator for a clock frequency of 730 MHz are shown in Fig. 2.17. Note that the RX is not available as a separate RF port, and hence the circulator's ANT-to-RX and TX-to-RX performance cannot be measured directly using S-parameters. The circ.-RXs exhibit 1.7 dB TX-ANT loss and +28 dBm TX-ANT IIP3. Since the circulator is based on an 8-path filter, the signals have to be recombined to provide differential I/Q outputs. For ANT-BB measurements, the BB g<sub>m</sub>-cells are programmed to reject the 3rd and 5th, 11th and 13th, and so on, harmonics [57]. The I+/- outputs are created with +1,  $+\frac{1}{\sqrt{2}}$, and  $-\frac{1}{\sqrt{2}}$  weights for the 0°/180°, 45°/225°, and 135°/315° phases, and the Q+/- outputs are created with  $+\frac{1}{\sqrt{2}}$, +1, and  $+\frac{1}{\sqrt{2}}$  weights for the 45°/225°, 90°/270°, and 135°/315° phases. The single-element ANT-BB path shows up to 50 dB (41 dB nominal) conversion gain and supports more than 20 MHz bandwidth for all settings (Fig. 2.18(a)). The in-band/out-of-band ANT-BB IIP3 is -31 dBm/+22.5 dBm, and the single-element NF is 5 dB (Figs. 2.18(b) and 2.18(c)). The single-element ANT-BB OOB IIP3 and NF are comparable to those of the FD mixer-first receivers presented in [34, 57] (22 dBm vs. 22.5 dBm/25 dBm and 5 dB vs. 5–8 dB/4 dB), while this work encompasses both the receiver and an on-chip shared-ANT interface. The good out-of-band ANT-BB IIP3 is thanks to the good linearity of the BB g<sub>m</sub>-cells and TIAs.
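The harmonic-rejection property of the recombination weights listed above can be checked numerically. The sketch below assumes an idealized model in which the $h$-th harmonic appears on the $k$-th BB phase with equal amplitude and a phase rotation of $h \cdot k \cdot 45°$; the function names and the optional 7-bit quantization (full scale 127) are assumptions made for illustration, not measured behavior of the chip.

```python
import cmath
import math

PHASES_DEG = [45 * k for k in range(8)]  # 0, 45, ..., 315 degrees


def recombination_weights(quadrature=False, full_scale=None):
    """Ideal I (cosine) or Q (sine) weights; optionally rounded to integer codes."""
    f = math.sin if quadrature else math.cos
    w = [f(math.radians(p)) for p in PHASES_DEG]
    if full_scale is not None:
        w = [round(v * full_scale) / full_scale for v in w]
    return w


def harmonic_response(weights, h):
    """Magnitude of the weighted sum seen by the h-th harmonic (idealized model)."""
    return abs(sum(w * cmath.exp(1j * h * math.radians(p))
                   for w, p in zip(weights, PHASES_DEG)))


w_i = recombination_weights()
fundamental = harmonic_response(w_i, 1)
rejected = [harmonic_response(w_i, h) for h in (3, 5, 11, 13)]

# With quantized weights the cancellation is no longer exact.
w_q7 = recombination_weights(full_scale=127)
rejection_3rd_db = 20 * math.log10(harmonic_response(w_q7, 1)
                                   / harmonic_response(w_q7, 3))
```

With ideal weights, the responses at the 3rd, 5th, 11th, and 13th harmonics vanish while the 7th and 9th pass, consistent with the text. With weights rounded to 7-bit codes, the 3rd-harmonic rejection in this idealized model becomes finite, roughly 59 dB; in hardware, phase and gain mismatch would be expected to limit it further.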

### 2.4.2 8-Element FD Phased-array TRX Measurements

Array FD measurements are performed with the 2×4 rectangular array of slot loop antennas described earlier (Fig. 2.19). Two ICs are tiled on a PCB to realize an 8-element FD circ.-RX phased array, and a custom 8-element phased-array TX PCB described earlier is built using discrete components.

Figure 2.17: Single-element circ.-RX measurements: (a) TX-ANT S-parameters demonstrating non-reciprocity, (b) TX-ANT IIP3.

Array FD SIC measurements are shown in Fig. 2.19. The TX-BB isolation of each circ.-RX is limited to only  $\sim 15$  dB. This poor isolation is due to layout, QFN package, and PCB parasitics. The average array SIC over 16.25 MHz is only 23 dB when the TX and RX arrays are configured for nominal broadside beamforming (all array weights equal to 1). Note that the nominal SIC for broadside beamforming is a strong function of the antenna near-field response. Using the concept described earlier in this chapter, the beamforming DoFs on the TX and RX arrays are optimized for SIC while allowing 3 dB of TX and RX array gain loss, and 40.7 dB array SIC is achieved over 16.25 MHz. Although the measured profile is very wideband and very similar to the simulated profile in Fig. 2.4(d), it sits at somewhat lower SIC levels, since those simulations neglected second-order effects such as the circulator's internal isolation and the quantization of the beamforming weights. Finally, the custom-designed tuners described earlier are integrated with the antennas, and the same tuner configuration is applied across all elements so that the beam pattern does not change. Co-optimizing the tuners yields 50 dB array SIC over 16.25 MHz. The measured RX and TX array gains across frequency for broadside excitation are shown in Fig. 2.19. The measurements verify the 3 dB array gain loss with the beamforming weights optimized for SIC. The synthesized TX and RX array patterns for these weights are depicted in Fig. 2.4(a) and (b).

Figure 2.18: Single-element circ.-RX measurements: (a) TX-ANT S-parameters demonstrating nonreciprocity, (b) TX-ANT IIP3 and (c) NF.

Figure 2.19: Measured full-duplex phased-array performance across 8-elements (tiling of 2 ICs): (a) array SIC, (b) impact of optimized weights to achieve SIC on the TX/RX array gain.

The TX array power handling is evaluated, with beamforming and SIC configured, in terms of the TX-induced RX compression point. Note that the SIC is performed in the BB after the beamforming. Thus, each per-element integrated circulator has to tolerate the CT-SI from the other elements in addition to its own SI. The gain imparted to a weak in-band signal radiated towards the array is monitored while the TX array power  $(P_{TX}AG_{TX})$  is swept. The 1 dB compression of the weak in-band signal, which can be interpreted as the TX array power handling, occurs at +10.5 dBm without the antenna tuner and improves to +16.5 dBm with the antenna tuner (Fig. 2.20). Although the per-element power handling is smaller than that of a single-element counterpart [12] (+1 dBm vs. +8 dBm), the total array power is much larger thanks to the TX array gain.

Figure 2.20: Gain compression of a small received signal under the influence of TX power with optimized weights with and without the antenna tuner.

Figure 2.21: Two-tone TX test tracking the TX total SI and its IM3 products at the receiver output with additional digital SIC.

Two-tone TX tests are performed by tracking the TX main SI and its IM3 at the RX BB output with beamforming and SIC configured, as shown in Fig. 2.21. We have also implemented digital SIC in MATLAB after capturing the BB signals using an oscilloscope (a 12-bit quantizer) [12]. The nonlinear Volterra-series-based digital SIC cancels not only the main SI but also the IM3 distortion generated by the SI. The effective IIP3 referred to the TX array power is +17.5 dBm, which marks the power level beyond which additional BB cancellation does not help because the IM3 products are as strong as the main tones. At +16.7 dBm average TX array power, digital SIC suppresses the residual total SI and its associated IM3 to below -84 dBm, indicating 100 dB total array SIC. Another 19 dB of SIC is required to suppress the SI to the array noise floor (on the effective IIP3 graph, the noise floor would be at  $N_{floor}/AG_{RX}$ = -103 dBm), which can potentially be achieved with additional analog SIC. This enables a link range of 2.6 km at the operating frequency.
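The SIC budget in this paragraph is simple dB arithmetic, sketched below in Python. The function and variable names are chosen for illustration; the three input values are the ones quoted in the text.

```python
def sic_budget(p_tx_dbm, residual_dbm, noise_floor_dbm):
    """Return (total SIC achieved, additional SIC needed to reach the noise floor) in dB."""
    total_sic_db = p_tx_dbm - residual_dbm
    remaining_db = residual_dbm - noise_floor_dbm
    return total_sic_db, remaining_db


# TX array power, residual SI after digital SIC, and array-referred noise floor.
total_sic_db, remaining_db = sic_budget(16.7, -84.0, -103.0)
```

The first value comes out slightly above 100 dB, matching the quoted total array SIC, and the second is the 19 dB gap between the residual SI and the array-referred noise floor.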



Figure 2.22: Wireless FD demonstration setup.

## 2.4.3 FD Demonstration

An OFDM-like BB signal is generated at a sampling rate of 160 MSa/s, and it consists of 10 sub-carriers each with a bandwidth of 0.4 MHz occupying a total bandwidth of 5 MHz (DC to 1 MHz has been omitted due to implementation limitations related to the high-pass cut-off frequency of the off-chip BB baluns). The OFDM-like signal is pulse-shaped with a square-root raised cosine (SRRC) filter with a roll-off factor of  $\beta = 0.22$ . The total length of the OFDM-like signal is chosen to be 50000 samples with an extra 2000 samples to sync the received sequence to the transmitted signal.

Fig. 2.22 shows the demonstration setup. Unlike the RX output, the TX board input is at RF. Hence, an I/Q quadrature modulator (Texas Instruments TRF370417 EVM module) is used to up-convert the BB signal for the TX board. The BB OFDM-like signal is generated in MATLAB and fed to an Agilent 33500B arbitrary waveform generator (AWG), which is connected through a balun to the quadrature modulator. The clocks of the RFIC circulator-receiver and the transmitter are derived from a shared signal source running at 4 times the frequency of operation (2.920 GHz) to reduce the effects of uncorrelated phase noise [58–60]. A separate frequency-division module divides the clock down to 730 MHz for the transmitter. A CW tone is radiated from 20 ft away using a single antenna toward the implemented 8-element FD phased-array transceiver while the array is transmitting the OFDM-like signal described above.

Figure 2.23: Demo results: A -31 dBm desired signal radiated from 20 ft away from a single antenna is recovered while transmitting a 5 MHz OFDM-like signal with +8.7 dBm TX array power.

Fig. 2.23 shows the power spectral density at the RX BB output before and after the digital SIC of the TX leakage. First, to verify successful digital SIC of modulated signals, a signal with +8.7 dBm average TX array power is applied and the residual BB leakage is captured using an oscilloscope. The implemented digital SIC considers nonlinear terms up to the 7<sup>th</sup> order with a delay spread length of 45 samples, resulting in 315 unknown digital canceller coefficients in total. An initial portion of the captured data (about 80  $\mu$s) is used to train the canceller coefficients. A digital SIC of 30 dB and an overall SIC of about 80 dB are achieved. Then, a -31 dBm signal radiated from a single antenna located 20 ft away is recovered, although it is buried under the TX leakage at the RX BB output before digital SIC.
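A minimal sketch of a trained nonlinear digital canceller of this kind is given below in Python with NumPy. The text does not specify the exact basis functions, so a memory-polynomial basis $x[n-m]\,|x[n-m]|^{p-1}$ (a pruned Volterra series) is assumed here because it yields $7 \times 45 = 315$ coefficients for orders 1 to 7 and 45 delays; the least-squares training, the function names, and the synthetic leakage channel are likewise assumptions for illustration and are not the author's MATLAB implementation.

```python
import numpy as np


def basis_matrix(x, max_order=7, memory=45):
    """Columns x[n-m] * |x[n-m]|**(p-1) for p = 1..max_order, m = 0..memory-1."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    cols = []
    for m in range(memory):
        # Delay the TX samples by m, zero-padding the start.
        delayed = np.concatenate([np.zeros(m, dtype=complex), x[:n - m]])
        for p in range(1, max_order + 1):
            cols.append(delayed * np.abs(delayed) ** (p - 1))
    return np.stack(cols, axis=1)  # shape (n, max_order * memory)


def train_and_cancel(tx, rx, n_train, max_order=7, memory=45):
    """Fit coefficients on the first n_train samples, then subtract the SI estimate."""
    rx = np.asarray(rx, dtype=complex)
    A = basis_matrix(tx, max_order, memory)
    coeffs, *_ = np.linalg.lstsq(A[:n_train], rx[:n_train], rcond=None)
    return rx - A @ coeffs, coeffs


def suppression_db(before, after):
    """Power ratio before/after cancellation, in dB."""
    return 10 * np.log10(np.mean(np.abs(before) ** 2) / np.mean(np.abs(after) ** 2))


# Synthetic demo with a hypothetical leakage channel (small model for speed).
rng = np.random.default_rng(0)
n = 4000
tx = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
leak = np.convolve(tx, [0.5, 0.2j, -0.1])[:n] + 0.05 * tx * np.abs(tx) ** 2
noise = 1e-4 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
rx = leak + noise
residual, coeffs = train_and_cancel(tx, rx, n_train=1000, max_order=3, memory=4)
demo_suppression = suppression_db(rx[1000:], residual[1000:])
```

In this synthetic case the canceller is trained on the first 1000 samples and evaluated on the rest, mirroring the use of an initial portion of the capture for training. The suppression obtained here reflects only the synthetic noise level and says nothing about the measured 30 dB.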

Table 2.2: Comparison of the proposed FD phased array circ.-RX with the state-of-the-art FD RXs with an integrated shared antenna interface.

|              |                                            | Cornell Uni.<br>JSSC 2015 [34]                       | Columbia Uni.<br>JSSC 2017 [35]                                                      | Columbia Uni.<br>JSSC 2018 [12]                                                               | Uni of Washington<br>ISSCC 2018 [37]                                                             | Columbia Uni.<br>ISSCC 2019 [61]                                                                                  | This Work                                                                                                 |
|--------------|--------------------------------------------|------------------------------------------------------|--------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| Architecture |                                            | Mixer-first TRX with<br>Active Baseband<br>Duplexing | RX with integrated<br>magnetic-free N-path-<br>filter-based circulator<br>and BB SIC | Magnetic-free N-path-<br>filter-based circulator-<br>receiver with on-chip<br>balance network | Full-duplex transceiver<br>employing an electrical<br>balance duplexer and<br>dual RF cancellers | Full-duplex MIMO<br>circulator-RX with MIMO<br>RF and Shared-Delay<br>Baseband Self-<br>Interference Cancellation | Full-duplex phased-array<br>with integrated magnetic-<br>free N-path-filter-based<br>circulator-receivers |
| RX metrics   | RX Frequency Range                         | 0.1-1.5GHz                                           | 0.6-0.8GHz                                                                           | 0.61-0.975GHz                                                                                 | 1.6-1.9GHz                                                                                       | 2.2GHz                                                                                                            | 0.61-0.975GHz                                                                                             |
|              | Number of Antenna<br>Paths                 | 1                                                    | 1                                                                                    | 1                                                                                             | 1                                                                                                | 2 MIMO elements per IC<br>(scalable)                                                                              | 4 per IC, 2 ICs tiled for FD<br>measurements                                                              |
|              | Gain                                       | 53dB                                                 | 42dB                                                                                 | Max: 43dB, Nominal:<br>28dB                                                                   | 42dB                                                                                             | 30dB                                                                                                              | Max: 51dB, Nominal: 41dB<br>(single-element)                                                              |
|              | Noise Figure                               | 5-8dB                                                | 8.4dB                                                                                | 6.3dB                                                                                         | 8.1dB                                                                                            | 9.5dB (single-element)                                                                                            | 5dB (single-element)                                                                                      |
|              | OOB IIP3                                   | +22.5dBm                                             | +19dBm                                                                               | +15.4dBm                                                                                      | N/R                                                                                              | N/R                                                                                                               | +22.5dBm (single-element,<br>100MHz offset)                                                               |
| FD Metrics   | Integrated SI<br>Suppression Domains       | Analog BB                                            | RF + Analog BB                                                                       | RF                                                                                            | RF                                                                                               | RF + Analog BB                                                                                                    | RF + Spatial                                                                                              |
|              | Amount of Integrated<br>SI Suppression     | 33dB across<br>300kHz TX BB BW                       | 42dB SIC across<br>12MHz BW                                                          | 40dB SIC across<br>20MHz BW                                                                   | 72.8/70.1/65.2dB SIC<br>across 20/40/80MHz<br>BW                                                 | 45dB SIC across 20MHz<br>BW                                                                                       | 50dB array SIC across<br>16.25MHz BW<br>(Across 8 elements)                                               |
|              | Effective IIP3 with<br>respect to TX Power | ~0dBm at 43/53dB<br>gain <sup>1</sup>                | +1dBm at 42dB gain                                                                   | +9dBm at 26dB gain                                                                            | N/R                                                                                              | N/R                                                                                                               | +17.5dBm<br>(TX Array Power)                                                                              |
|              | Overall TX Port Power<br>Handling          | -17.3dBm <sup>2</sup>                                | -7dBm                                                                                | +7dBm <sup>2</sup>                                                                            | +5dBm <sup>4</sup>                                                                               | +14dBm <sup>2</sup>                                                                                               | +16.5dBm <sup>2</sup><br>(TX Array Power)                                                                 |
|              | RX Degradation in<br>Full-Duplex Mode      | ~2-5dB<sup>3</sup>                                   | 2.5dB                                                                                | 1.7dB                                                                                         | 1.6dB                                                                                            | 1.7 (SIC) / 2.1dB<br>(SIC+Cross-Talk SIC)                                                                         | 3dB<br>(Array Gain Degradation)                                                                           |
|              | Overall SI<br>Suppression                  | 33dB                                                 | 85dB<br>(incl. digital SIC)                                                          | 80dB<br>(incl. digital SIC)                                                                   | N/A                                                                                              | N/A                                                                                                               | 100dB array SIC<br>(incl. digital SIC)                                                                    |
| Resources    | RX Power                                   | 43-56mW<br>(incl. TX)                                | 100mW signal path<br>(10mW LO + 30mW<br>BB canceller)                                | 72mW                                                                                          | 106mW                                                                                            | 46mW / element<br>(92mW for 2-element on<br>an IC)                                                                | 8mW / element<br>(32mW for 4-element on an<br>IC)                                                         |
|              | Antenna Interface<br>Power                 | Incl. in RX power                                    | 59mW (at 0.7GHz)                                                                     | 36mW (at 0.7GHz)                                                                              | N/A                                                                                              | 156mW per element                                                                                                 | 26.25mW per element                                                                                       |
|              | Technology                                 | 65nm CMOS                                            | 65nm CMOS                                                                            | 65nm CMOS                                                                                     | 40nm CMOS                                                                                        | 65nm CMOS                                                                                                         | 65nm CMOS                                                                                                 |
|              | Active Area                                | 1.5mm <sup>2</sup>                                   | 1.4mm <sup>2</sup>                                                                   | 0.94mm <sup>2</sup>                                                                           | 4mm <sup>2</sup>                                                                                 | 5.6mm <sup>2</sup> (2.8mm <sup>2</sup> / element)                                                                 | 3.6mm <sup>2</sup> (0.9mm <sup>2</sup> / element)                                                         |

1. From Fig. 31(a) in the paper. 2. Limited by ~1dB gain compression induced in the receive signal. 3. at -17.3dBm TX power. 4. SIC under TX Power

N/A: Not Applicable, N/R: Not Reported

## 2.5 Summary

This work is compared to FD single-element and MIMO RX radios in Table 2.2. In addition to having the highest power handling and overall array SIC, it presents the lowest NF and a far superior potential FD link range among FD radios with a shared antenna interface. Despite integrating a shared antenna interface, this work consumes less area and power per element, thanks to using beamforming DoF for SIC instead of explicit area- and power-hungry cancellers, as well as using a circ.-RX, which combines the integrated circulator with a mixer-first receiver.

In this chapter, we presented a scalable 4-element FD phased array RX which combines the functionality of a phased array with FD radio to achieve high throughput and spectrum efficiency. Joint optimization of beamforming degrees of freedom on both the TX and RX sides is deployed for beamforming plus SIC with a minimum penalty on array gains. The beamforming is implemented with minimum overhead by exploiting the available baseband outputs of the N-path-filter-based integrated circ.-RX. A 730 MHz scalable 4-element FD phased array circ.-RX prototype is presented in 65 nm CMOS which achieves up to 50 dB of array SIC over 16.25 MHz with less than 3.5/3 dB degradation in TX and RX array gain across 8 elements while handling +16.5 dBm TX array power. In conjunction with digital SIC, 100 dB of array SIC is demonstrated.

# Chapter 3

# **Full-Duplex MIMO Wireless**

MIMO technology promises to significantly enhance system performance in coverage, capacity and user data rate through diversity/capacity gain. In addition, multi-user MIMO can simultaneously serve multiple users, which is vital for femtocell base stations and APs. FD operation enables simultaneous transmission and reception at the same frequency, which can potentially double the data throughput. Besides, full-duplex can provide benefits on higher layers such as better spectral efficiency, reduced network and feedback signaling delays, and resolution of hidden-node problems to avoid collisions. Although FD MIMO wireless benefits from both MIMO diversity/capacity gain and FD spectrum efficiency, it suffers from CT-SI between each TX-RX pair in addition to SI from each TX to its own RX (Fig. 3.1). Thus, a cancellation path is required from each TX to all other RXs. Therefore, an *N*-element FD MIMO transceiver requires  $N^2$  cancellation paths, which means the complexity of the canceller in terms of associated area, power dissipation and noise penalties grows as  $O(N^2)$. This is both area- and power-hungry, and not feasible for radios with a large number of elements. Furthermore, the noise from SI and CT-SI cancellation paths can accumulate and further degrade the radio sensitivity level.

In this chapter, we present how FD operation can be integrated with MIMO operation using RF and shared-delay BB SIC and CT-SIC [61]. MIMO operation increases the SIC requirements by a factor of  $N^2$ , resulting in severe area and power consumption and extra noise



Figure 3.1: FD MIMO links benefit from MIMO diversity/capacity gain as well as FD spectrum efficiency. However, CT-SI must be addressed in addition to SI.

penalty. We present the detailed analysis, design, and implementation of RF/BB cancellers that address this challenge, as well as a bootstrapping technique to enhance the power handling of integrated circ.-RX. A 65 nm CMOS scalable 2-element MIMO FD circ.-RX array with integrated N-path-filter-based non-magnetic circulators is described that exploits bootstrapping in the circulator N-path filters to enhance TX power handling, and features area- and power-efficient passive RF and active BB wideband MIMO cancellation with shared-delay elements to address the  $O(N^2)$  cancellation challenge. Finally, the proposed solution is validated by performing extensive measurements.

# 3.1 FD MIMO Cancellation: Complexity, Trade-offs and Architecture

Depending on the antenna interface design, the cumulative CT-SI can even be larger than the SI. Generally, in an N-element FD MIMO transceiver, assuming orthogonal coding, the total SI and CT-SI power can be N times larger than in the single-element case (assuming similar power levels for the SI and CT-SI across all elements). This can substantially limit the FD MIMO transceiver power handling compared to the single-element case and diminish the benefits gained from deploying a multi-antenna system. Hence, initial SI and CT-SI cancellation are critical in the early stages of the RX.
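The N-fold scaling described above can be illustrated with a short power-summation sketch. It assumes, as in the text, uncorrelated leakage paths of equal power (one SI path plus N-1 CT-SI paths); the function name and the -10 dBm example level are hypothetical and chosen only for illustration.

```python
import math

def total_si_dbm(per_path_dbm: float, n_elements: int) -> float:
    """Total SI + CT-SI power at one RX, assuming n_elements uncorrelated
    leakage paths of equal power (one SI path plus N-1 CT-SI paths)."""
    per_path_mw = 10 ** (per_path_dbm / 10)
    return 10 * math.log10(n_elements * per_path_mw)

# Hypothetical example: -10 dBm of leakage per path.
for n in (1, 2, 4, 8):
    print(f"N={n}: total leakage = {total_si_dbm(-10.0, n):.2f} dBm")
```

Under these assumptions each doubling of the MIMO order adds about 3 dB of leakage power that the RX front end must tolerate before any cancellation.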

Several leakage paths contribute to the SI and CT-SI such as direct coupling from the integrated circulator, on-chip routing as well as PCB board traces, reflection at the circulator antenna port due to mismatch caused by the wirebond, package parasitics and off-chip balun, reflection due to antenna impedance variation, direct coupling between adjacent antennas, and environmental reflections from nearby objects [36]. The overall channel response is the combination of all these leakages, and can consist of a large group delay of the order of tens of nanoseconds. Thus, delay cells with large group delays are necessary to achieve wideband cancellation in FD radios.

### 3.1.1 Dual Injection MIMO Canceller

Ideally, cancellers should be low-noise and highly linear, with minimum area and power overhead. The cancellation signal can be injected at different stages of the RX chain. Injecting before the low-noise amplifier in the RX chain using passive noise-less components can substantially enhance the power handling with minimum NF penalty. Achieving wideband cancellation through the implementation of true-time delay at RF, however, is very challenging, and wideband cancellation is typically achieved at baseband through the use of noisy active delay cells [36]. In this work, we pursue a dual injection MIMO cancellation technique, depicted in Fig. 3.2, where passive noise-less cancellers provide initial SI and CT-SI cancellation at RF before amplification and relax RX dynamic range requirements. Then, FIR-based active BB cancellers further suppress the SI after the BB amplifier with minimum NF degradation.

### 3.1.2 Shared-Delay Baseband Canceller Architecture

A traditional baseband canceller architecture employs parallel programmable delay cells with gain control to emulate the SI channel through the implementation of an FIR



Figure 3.2: Proposed dual injection MIMO RF and BB canceller to address power handling and complexity challenges of FD MIMO implementation.

filter [20]. Fig.3.3(a) presents an extension of such an approach to MIMO. Parallel delay cells are necessitated in [20] by the use of transmission lines on board to realize delay, making it challenging to tap the signal from intermediary points. Active delay elements in an integrated setting enable cascaded delay lines with intermediary taps [36]. In both [20] and [36], however, the ability to control phase shift independent of delay is absent. In our implementation, we employ FIR filters operating on both I and Q components of the TX BB signal (obtained through IQ downconversion of the TX output, which has the advantage of capturing PA nonlinearities as well as enabling mitigation of LO phase-noise-induced residual SI assuming a common LO [58]) and injecting into both I and Q paths of the RX, thus enabling independent control of phase and delay through IQ vector modulation.



Figure 3.3: MIMO baseband canceller structure and power consumption distribution: (a) typical MIMO canceller (b) proposed shared-delay BB canceller architecture.

Furthermore, MIMO operation additionally complicates matters, as the number of delay cells and gain controls scales quadratically with the number of elements, and these blocks are area- and power-hungry (Fig. 3.3(a)). In this work, a shared-delay-cell architecture, shown in Fig. 3.3(b), is proposed where  $g_m$ -cell based vector modulators tap from the discrete delay steps of each TX element and inject into all RX BB paths to approximate the SI channel. Since each TX path has a single delay line whose taps are used in common for all RX paths, the number of required delay lines increases only linearly with the number of MIMO elements. Although the number of required  $g_m$ -cell based vector modulators still scales quadratically with the number of elements, the canceller power and area consumption are dominated by the delay elements. Therefore, shared-delay BB cancellation in conjunction with passive MIMO RF cancellation directly addresses the  $O(N^2)$  power consumption challenge.
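The scaling argument can be made concrete with a simple counting sketch. It assumes one delay line per TX-RX pair in the conventional structure, one delay line per TX in the shared-delay structure, and one vector modulator per tap per TX-RX pair in both; the default of four taps and the function name are assumptions made for illustration, and the block-level details of Fig. 3.3 may differ.

```python
def canceller_counts(n_elements: int, n_taps: int = 4) -> dict:
    """Count delay lines and vector modulators for an N-element FD MIMO
    baseband canceller (illustrative bookkeeping only)."""
    pairs = n_elements ** 2  # every TX leaks into every RX
    return {
        "conventional_delay_lines": pairs,      # one line per TX-RX pair
        "shared_delay_lines": n_elements,       # one line per TX, taps shared
        "vector_modulators": pairs * n_taps,    # same in both architectures
    }

for n in (2, 4, 8):
    print(n, canceller_counts(n))
```

Since the delay elements dominate the canceller area and power, the reduction from N² to N delay lines is where the saving comes from, even though the vector-modulator count is unchanged.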

### 3.1.3 Baseband Canceller Noise Penalty

The noise penalty associated with the proposed baseband cancellation must be analyzed in the context of active delay cells. Fig. 3.4 shows the framework used for the following analysis of noise degradation. Let  $ISO_{ii}$  represent TX-RX isolation in the  $i^{th}$  circ.-RX. Note that the circ.-RX downconverts the TX signal to baseband. Furthermore,  $ISO_{ii}$  includes not only the RF isolation of the circulator, but also the RF-SIC provided by the feed-forward capacitors. Additionally, it also includes the intrinsic downconversion gain that the circulator has due to its architecture. The signal path transconductance following the circulator in the  $i^{th}$ -element is represented by  $g_{m,RX,i}$ . For cancellation of residual SI in the RX baseband, (delayed) copies of the down-converted TX signal must be injected into the RX following  $g_{m,RX,i}$ . Assuming a single dominant delay in the SI channel, a voltage gain of  $A_{delay,i}$  from the PA output to the corresponding tap in the delay line, and a weight of  $g_{m,SIC}$  in the subsequent  $g_m$ -cell, BB-SIC requires

$$ISO_{ii} \cdot g_{m,RX,i} = A_{delay,i} \cdot g_{m,SIC}.$$
(3.1)

The noise factor degradation in the system due to active BB-SIC is then given by,

$$\Delta F_{SIC} = \frac{v_{n,out,d}^2 ISO_{ii}^2}{A_{delay,i}^2 k_B T \times R_S \times CG_i^2}$$
(3.2)

where  $v_{n,out,d}^2$  represents the total output noise of the delay element. Note that the noise associated with the delay element is considered but not the SIC g<sub>m</sub>-cell, as the delay element's noise contribution dominates in our implementation (the g<sub>m</sub>-cell input-referred noise power is around 6 aV<sup>2</sup>/Hz while delay-cell output-referred noise power is 52 aV<sup>2</sup>/Hz). In the MIMO case, the baseband cancellers must address residual SI as well as residual CT-SI. Using similar constraints for CT-SIC as used for SIC in (3.1), the overall degradation in NF in the MIMO case is given by,

$$\Delta F_{i,overall} = \frac{v_{n,out,d}^2 ISO_{ii}^2}{A_{delay,i}^2 k_B T \times R_S \times CG_i^2} + \sum_{j \neq i}^N \frac{v_{n,out,d}^2 ISO_{ij}^2}{A_{delay,j}^2 k_B T \times R_S \times CG_i^2}$$
(3.3)

where  $ISO_{ij}$  represents the ANT-to-ANT isolation between elements *i* and *j* (including the RF CT-SIC and circulator conversion gain) and  $A_{delay,j}$  represents the lumped voltage gain of  $j^{th}$  delay element.



Figure 3.4: Model for the analysis of noise degradation due to active delay-based baseband cancellation.

As discussed in Sec. 3.2.3.2, the output noise of a delay cell is related to its power consumption, delay requirements and linearity. In order to illustrate the impact of delay cell noise, we consider (3.3) assuming  $v_{n,out,d}^2 = 52 \text{ aV}^2/\text{Hz}$  per differential delay tap (based on our implementation). Fig. 3.5 plots the degradation in NF (assuming a nominal NF of 6.5 dB) as a function of  $ISO_{ii}$  for different numbers of MIMO elements. Note that MIMO crosstalk isolation levels  $(ISO_{ij})$  are assumed to be identical to  $ISO_{ii}$. Also, note that a higher RF SIC in dB on the x-axis is equivalent to a smaller absolute value for  $ISO_{ii}$. Several trade-offs become apparent. As expected, higher RF SIC/CT-SIC reduces the noise penalty of active delay-based BB cancellation, and at lower RF SIC/CT-SIC levels, a higher MIMO order results in more severe penalties. A higher voltage gain to the output of the delay cells, achieved for instance by coupling a stronger portion of the TX signal, also alleviates the noise penalty, at the expense of the need for higher linearity and power handling in the cancellation path.
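The trends in Fig. 3.5 can be reproduced qualitatively with a short numerical sketch of (3.3). Several quantities below are assumptions rather than values from this work: $R_S = 50\,\Omega$, $T = 290$ K, unity $A_{delay}$, and the modeling choice $ISO = CG \times 10^{-SIC_{dB}/20}$ (that is, the isolation term lumps the circulator conversion gain with the RF SIC). With identical $ISO_{ij}$ and $A_{delay,j}$ across elements, the sum in (3.3) reduces to N equal terms.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def delta_f(vn2: float, iso: float, a_delay: float, cg: float,
            r_s: float = 50.0, temp: float = 290.0) -> float:
    """One term of Eq. (3.2)/(3.3): noise-factor penalty of one delay line."""
    return vn2 * iso ** 2 / (a_delay ** 2 * K_B * temp * r_s * cg ** 2)

def nf_with_bb_sic(rf_sic_db: float, n_elements: int, nf0_db: float = 6.5,
                   vn2: float = 52e-18, a_delay: float = 1.0,
                   cg_db: float = 30.0) -> float:
    """Overall NF in dB after adding the baseband canceller noise."""
    cg = 10 ** (cg_db / 20)
    iso = cg * 10 ** (-rf_sic_db / 20)   # assumed lumping of CG and RF SIC
    d_f = n_elements * delta_f(vn2, iso, a_delay, cg)  # N equal terms
    return 10 * math.log10(10 ** (nf0_db / 10) + d_f)

for n in (1, 2, 4):
    print(n, [round(nf_with_bb_sic(sic, n), 2) for sic in (20, 30, 40)])
```

The absolute numbers depend on the assumed $A_{delay}$ and $R_S$, so only the trends (lower penalty at higher RF SIC, higher penalty at higher MIMO order) should be read from this sketch.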



Figure 3.5: Impact of the baseband active delay cells on NF as a function of RF SIC and CT-SIC.

# **3.2** Circuit Implementation

Fig. 3.6 presents block and circuit diagrams of the 65 nm CMOS 2.2 GHz full-duplex 2-element MIMO circ.-RX. As mentioned earlier, programmable feed capacitors are connected to both sides of the N-path filter inside the integrated circulator to provide passive RF SI and CT-SI cancellation. Furthermore, MIMO shared-delay BB cancellation using 4-stage shared delay lines is implemented to achieve further SI and CT-SI cancellation. A single



Figure 3.6: Block and circuit diagram of the 65 nm CMOS 2.2 GHz full-duplex 2-element MIMO circ.-RX with high TX power handling exploiting MIMO RF and shared-delay baseband self-interference cancellation.

IQ delay line is used for each TX, with  $g_m$ -cells injecting into each RX IQ path. Therefore, the delay line complexity only grows as O(N) while  $g_m$ -cell complexity grows as  $O(N^2)$ . A two-stage TIA is used for each element to provide a low-input-impedance node into which the output of all  $g_m$ -cell based vector modulators of the BB cancellers can be combined with the RX side  $g_m$ -cells while preserving high linearity. Besides, this low-impedance node can be used to tile ICs to increase the MIMO order and achieve CT-SI cancellation across ICs. This section discusses the design and implementation details of each block in depth.

# 3.2.1 Circulator-Receiver as a Shared MIMO FD Antenna Interface

This work employs the N-path-filter-based circ.-RX described in [53] in a differential configuration as the shared antenna interface, which merges a commutation-based linear periodically time-varying (LPTV) non-magnetic circulator with a down-converting mixer, directly providing baseband signals at its output. Each circ.-RX requires two sets of 8-phase non-overlapping clocks with 90° phase shift to drive the switches on either side of the N-path filter. The N-path filter employs 8 paths to decrease the TX-ANT loss, increase the ANT-RX baseband recombination gain, and achieve harmonic cancellation for the 3rd and 5th harmonics. The resistance of the transistor switch in the N-path filter is around 3.5  $\Omega$, and the sources/drains of the FETs are biased at 0.5 V and DC coupled to BB g<sub>m</sub>-cells. The gates of the switches are AC coupled to the LO buffers and are biased at 0.65 V (DC level of a 12.5% duty-cycle pulse swinging from 0.5 V to 1.7 V). An input clock at 4 times the operating frequency ( $f_{clk}$ ) provides eight output phases through a Johnson-counter-based divide-by-4 circuit in each circulator.
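As a quick check of the bias numbers quoted above, the DC level of an ideal rectangular pulse is its low level plus the duty cycle times its swing. The sketch below assumes ideal pulse edges; the function name is illustrative.

```python
def pulse_dc_level(v_low: float, v_high: float, duty: float) -> float:
    """Average (DC) value of an ideal rectangular pulse train."""
    return v_low + duty * (v_high - v_low)

F_OP = 2.2e9  # operating frequency in Hz, from the text

# 12.5% duty-cycle pulse swinging from 0.5 V to 1.7 V.
gate_bias = pulse_dc_level(0.5, 1.7, 0.125)
input_clock = 4 * F_OP  # clock supplied to the divide-by-4 circuit

print(f"gate DC bias = {gate_bias:.3f} V")
print(f"input clock = {input_clock / 1e9:.1f} GHz")
```

This reproduces the 0.65 V gate bias and shows that 2.2 GHz operation requires an 8.8 GHz input clock.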

Prior circ.-RX circuits exploited off-chip inductors in the  $3\lambda/4$  transmission line that was wrapped around the N-path-filter gyrator [53]. In this work, the circ.-RX operates at 2.2 GHz and exploits wirebond inductance for the same purpose, enabling a realization with no off-chip components. The transmission lines are implemented using lumped capacitor-inductor-capacitor sections that employ on-chip metal-insulator-metal (MIM) capacitors and wirebond inductance. There is a maximum achievable wirebond inductance for a given package size, and thus relying on wirebond inductance limits the minimum operating frequency. The maximum wirebond length is 6.1 mm for the 12 mm × 12 mm 80-pin QFN package used in this work, which limits the operating frequency to above 1.3 GHz.



Figure 3.7: Two thick-oxide devices stacked to implement the feed capacitor and TX/ANT capacitor banks.

The baseband g<sub>m</sub>-cells are sized for 3rd and 5th harmonic rejection [57], which also provides 6 dB additional gain for the fundamental harmonic. The I+/- outputs are created with +1,  $+\frac{1}{\sqrt{2}}$  and  $-\frac{1}{\sqrt{2}}$  weights for 0°/180°, 45°/225°, and 135°/315° phases, and Q+/- outputs are created with  $+\frac{1}{\sqrt{2}}$, +1, and  $+\frac{1}{\sqrt{2}}$  weights for 45°/225°, 90°/270° and 135°/315° phases. 3 bits of control are included in the RX g<sub>m</sub>-cells for gain control.
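The harmonic-rejection property of these weights can be checked numerically. The sketch below assumes that each differential phase pair (for example 0°/180°) contributes with opposite signs, so that the I and Q weights sample $\cos\theta$ and $\sin\theta$ at the eight clock phases; this mapping is an interpretation of the description above, not a statement of the exact circuit connectivity.

```python
import cmath
import math

PHASES_DEG = [0, 45, 90, 135, 180, 225, 270, 315]
S = 1 / math.sqrt(2)

# Signed weights per phase; the second phase of each pair gets the opposite sign.
I_WEIGHTS = [1, S, 0, -S, -1, -S, 0, S]   # samples of cos(theta)
Q_WEIGHTS = [0, S, 1, S, 0, -S, -1, -S]   # samples of sin(theta)

def harmonic_gain(weights, n: int) -> float:
    """Magnitude of the weighted 8-phase sum for the n-th LO harmonic."""
    total = sum(w * cmath.exp(1j * n * math.radians(p))
                for w, p in zip(weights, PHASES_DEG))
    return abs(total)

for n in (1, 3, 5, 7):
    print(n, round(harmonic_gain(I_WEIGHTS, n), 6),
          round(harmonic_gain(Q_WEIGHTS, n), 6))
```

With these weights the 3rd and 5th harmonic responses vanish while the fundamental adds coherently; the 7th harmonic is not rejected, as expected for an 8-phase scheme.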

### 3.2.2 MIMO Passive RF Cancellers

Since the circ.-RX produces BB signals on the N-path filter capacitors, and the switches on either side of the two-port N-path filter are driven by quadrature clocks, if we inject cancellation currents on either side of the N-path filter, depicted as  $I_I$  and  $I_Q$  in Fig. 3.8, they are downconverted with quadrature phases, thus providing vector-modulated cancellation signals to the BB nodes in conjunction with the differential implementation of circ.-RX. In this work, we exploit programmable feed capacitor banks from the TX port of each circulator



Figure 3.8: Circulator model for feed-capacitor-based RF SIC and CT-SIC analysis.

to either side of the N-path filter, similar to the approach introduced in [23] ( $C_{feed}$  in Fig. 3.6). The magnitude and sign inversion of the cancellation signals can be controlled by varying the feed capacitance through a digitally-controlled capacitor bank, and sign-flipping in a differential circulator implementation. Since this approach is completely passive, it addresses the  $O(N^2)$  power consumption challenge, and since vector modulation is achieved without area-hungry inductors, it addresses the  $O(N^2)$  area challenge as well. As compared to the approach in [23], the proposed scheme (i) reduces the loading on any one side by feeding both sides of the N-path filter, and (ii) does not load or impact the antenna interface, which is critical to simultaneously achieving high SIC and CT-SIC, as antenna loading changes the SI and CT-SI channels. The same concept is extended to MIMO CT-SI cancellation by incorporating feed capacitors between each TX and RX pair, enabling passive RF CT-SI cancellation.

Each of the feed capacitors for the vector-modulated RF SI cancellers is implemented using 6 bits of control (0 to 550 fF). Furthermore, 6-bit programmable capacitors ( $C_{TX}$ and  $C_{ANT}$ ) in conjunction with a fixed off-chip capacitor form a matching circuit to null ANT/TX wirebond inductance.  $C_{ANT}$  can be used in addition to the feed capacitors,  $C_{feed}$ , to further enhance antenna VSWR coverage. Two-stacked thick-oxide 2.5 V transistors are used as the switches in the capacitor banks to avoid FET breakdown due to large voltage swing at the circulator TX/ANT ports. Fig. 3.7 presents the device stacking and biasing in the programmable feed capacitors,  $C_{feed}$ , and the capacitors at TX/ANT ports,  $C_{ANT/TX}$ . The devices used are triple-well 2.5 V transistors, and the bulk is made to float using a large resistor to ground. Two devices are stacked to enhance power handling. The N-path-filter nodes are biased at 0.5 V, and thus, the source of the second FET is biased at 0.5 V all the time. Therefore,  $D_1$  and  $D_2$  are biased at 2.4 V and 1.45 V respectively to distribute the stress between the two switches in the OFF-state. Breakdown can happen when the devices are OFF and the TX voltage swing gets divided between the stacked switches. Ideally, a two-stacked switch with two 2.5 V devices should be able to handle 19.2 V differential load. Note that in this work, the TX power level is limited to +14 dBm due to the TX-induced RX compression, which corresponds to only 2.3 V peak voltage across 100  $\Omega$ . Therefore, even with VSWR effects, we do not expect the switches to limit power handling.
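The peak voltage quoted above follows from the standard sinusoidal power relation $V_{peak} = \sqrt{2PR}$. The following sketch checks the number; the function name is illustrative.

```python
import math

def peak_voltage(p_dbm: float, r_ohm: float) -> float:
    """Peak voltage of a sinusoid delivering p_dbm into a resistance r_ohm."""
    p_watt = 10 ** ((p_dbm - 30) / 10)
    return math.sqrt(2 * p_watt * r_ohm)

# +14 dBm TX power across the 100-ohm differential interface.
print(f"{peak_voltage(14.0, 100.0):.2f} V peak")
```

The result is in line with the roughly 2.3 V figure in the text and well below the stated handling capability of the two-stacked switch.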

Fig. 3.8 shows a simplified single-ended circuit diagram of the circ.-RX with cancellation signals. The N-path filter is assumed to be ideal except for the presence of finite switch resistance  $R_{SW}$. When  $R_{SW}$  is placed in series, the ideal N-path filter core (a large number of paths and no switch parasitic capacitance) acts as an ideal gyrator. The circuit can now be analyzed using conventional microwave circuit analysis techniques. We initially assume that  $R_{SW}=0$. When the SI is completely nulled at BB nodes, the RF nodes can also be assumed to be a virtual ground with respect to the TX excitation,  $V_X = V_Y=0$, and the cancellation injection currents,  $I_I$  and  $I_Q$, can be calculated as  $V_{TX} \times jC_{feed,SI,I}\omega$  and  $V_{TX} \times jC_{feed,SI,Q}\omega$. The required feed capacitors to compensate for reflection from an antenna impedance of  $Z_{ANT} = Z_{ANT,I} + jZ_{ANT,Q}$  can be calculated as:

$$C_{feed,SI,I} = \frac{4Z_{ANT,Q}}{\omega(Z_{ANT,Q}^2 + (Z_{ANT,I} + Z_0)^2)},$$
(3.4)

$$C_{feed,SI,Q} = -\frac{2(Z_{ANT,I}^2 + Z_{ANT,Q}^2 - Z_0^2)}{\omega Z_0 (Z_{ANT,I} + Z_0)^2}.$$
(3.5)

Similarly, for CT-SIC, assuming  $Z_{ANT} = Z_0$  and antenna coupling  $H(\omega) = H_I(\omega) + jH_Q(\omega)$ , the required cross-element feed capacitor values can be calculated as:

$$C_{feed,CT-SI,I} = \frac{2H_I(\omega)(1 + \frac{R_{SW}}{2Z_0})}{Z_0\omega(1 + \frac{R_{SW}}{Z_0})},$$
(3.6)

$$C_{feed,CT-SI,Q} = \frac{2H_Q(\omega)(1 + \frac{R_{SW}}{Z_0})}{Z_0\omega}.$$
(3.7)

The antenna VSWR and antenna coupling coverage calculated from the above equations for capacitors that range from -542.5 fF to 542.5 fF in 8.75 fF steps are plotted in Fig. 3.9(a) and (b). A VSWR of 1.5 can be covered, and in measurement, this range is expanded to VSWR=2 through the usage of  $C_{ANT}$. Also, up to -14.8 dB of antenna coupling can be covered, which agrees fairly well with our measurements.
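The coupling coverage can be estimated by inverting (3.6) and (3.7) at the maximum feed capacitance. The sketch below assumes $Z_0 = 50\,\Omega$ for the single-ended model, $R_{SW} = 3.5\,\Omega$, and operation at 2.2 GHz; $Z_0$ in particular is an assumption, since the text does not state the reference impedance used for Fig. 3.9.

```python
import math

def max_coupling_db(c_max: float = 542.5e-15, f: float = 2.2e9,
                    z0: float = 50.0, r_sw: float = 3.5):
    """Largest |H_I| and |H_Q| (in dB) that the feed capacitors can cancel,
    obtained by solving Eq. (3.6) and Eq. (3.7) for H at C = c_max."""
    w = 2 * math.pi * f
    h_i = c_max * z0 * w * (1 + r_sw / z0) / (2 * (1 + r_sw / (2 * z0)))
    h_q = c_max * z0 * w / (2 * (1 + r_sw / z0))
    return 20 * math.log10(h_i), 20 * math.log10(h_q)

h_i_db, h_q_db = max_coupling_db()
print(f"max |H_I| = {h_i_db:.2f} dB, max |H_Q| = {h_q_db:.2f} dB")
```

Under these assumptions both limits fall within roughly 1 dB of the -14.8 dB coupling coverage quoted above.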

The layout interconnections for the SIC/CT-SIC feed capacitors bear careful attention. The feed capacitors and N-path filters are integrated with each other to minimize the layout parasitics. The metal layer with minimum resistance is used for routing, and parasitic capacitance is taken into account in the matching circuits at the TX ports.

### 3.2.3 MIMO Baseband Canceller with Shared Delay-Cells

The BB MIMO delay-based FIR filtering canceller taps from each TX, and consists of an IQ downconversion mixer, cascaded active  $g_m$ -C delay cells, and programmable  $g_m$ -cells on each delay tap acting on I and Q, effectively realizing vector modulation for programmable amplitude and phase on each delay tap.

#### 3.2.3.1 4-Path Passive Mixer

A 3-bit programmable capacitor bank along with a 4-path doubly-balanced IQ passive mixer is employed to sense each TX signal and downconvert it to BB frequencies for BB cancellation, as depicted in Fig. 3.10. The programmable capacitor bank employs 2-stacked devices similar to Fig. 3.7 to improve TX power handling. Inverter-based TIAs are used to provide



Figure 3.9: Calculated antenna (a) VSWR and (b) cross-talk coupling coverage, which shows that a VSWR of 1.5 (red circle) or up to -14.8 dB of antenna coupling (red circle) can be covered.

maximum TX signal swing to the delay cells to minimize BB canceller noise contribution, as described earlier.

#### 3.2.3.2 Baseband Active Delay Cells

The delay cell used is a complementary version of the  $g_m$ -cell proposed in [7]. In Fig. 3.11, the delay through one cell is given by  $\tau = \frac{C_L}{g_m}$  when  $g_{mn} = g_{mp} = g_m$, consistent with the substitution used in (3.8). The design provides 2 bits of gain control and 3 bits of load capacitor,  $C_L$, control (not shown for schematic clarity). The four delay taps include the downconversion mixer output and the outputs of 3 cascaded delay cells following the mixer. Measurements show a maximum cumulative delay of ~115 ns (as shown in Fig. 3.12).

The class-AB structure of the proposed delay cell enhances linearity while the overall delay cell is designed to optimize the noise vs power trade-off. The output noise of the complementary delay cell can be extrapolated from the analysis in [7] as



Figure 3.10: Passive mixer along with programmable capacitor bank used to tap from the TX for BB SIC and CT-SIC.



Figure 3.11: Complementary delay cell architecture based on [7].

$$v_{n,out,d}^2 \approx \frac{8kT\gamma}{g_m} \left[ \frac{12g_m^2 + (C_L\omega)^2}{4g_m^2 + (C_L\omega)^2} \right] = \frac{8kT\gamma}{g_m} \left[ \frac{12 + (\tau\omega)^2}{4 + (\tau\omega)^2} \right]$$
(3.8)

For a given delay  $\tau$ , lower noise requires higher  $C_L$ , which in turn mandates a higher  $g_m$  to maintain the same  $\tau$ . Thus, delay cell noise directly trades with area and power. At the same time, the delay-cell noise contribution also depends on the tap weights for optimal TX

cancellation. Lower tap weights can be achieved with a large input swing for the copy of the TX signals driving the delay cells. The proposed complementary structure supports a larger input signal swing and therefore enables lower noise contribution from the baseband canceller, while also achieving higher current efficiency and hence lower power.
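The noise-versus-$g_m$ trade-off in (3.8) can be explored numerically. The sketch below evaluates the second form of (3.8) as a function of the per-cell delay $\tau$; the excess noise factor $\gamma = 1$, the temperature of 290 K, and the example $g_m$ and $\tau$ values are assumptions chosen for illustration and are not taken from the implementation.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def delay_cell_noise(g_m: float, tau: float, f: float,
                     gamma: float = 1.0, temp: float = 290.0) -> float:
    """Output noise PSD (V^2/Hz) of one delay cell per Eq. (3.8)."""
    wt2 = (2 * math.pi * f * tau) ** 2
    return 8 * K_B * temp * gamma / g_m * (12 + wt2) / (4 + wt2)

# Hypothetical sweep: fixed 10 ns delay, increasing transconductance.
for g_m in (1e-3, 2e-3, 4e-3):
    psd = delay_cell_noise(g_m, tau=10e-9, f=1e6)
    print(f"g_m = {g_m * 1e3:.0f} mS -> {psd * 1e18:.1f} aV^2/Hz")
```

At a fixed $\tau$, the noise scales as $1/g_m$, so halving the noise requires doubling $g_m$ (and hence $C_L$), which is the area and power trade-off described above.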



Figure 3.12: Measured delays across the 4 taps of the delay element.

#### 3.2.3.3 Programmable Vector Modulator

The implementation details of the  $g_m$ -cell vector modulators are shown in Fig. 3.13. Each of the I and Q signals from different stages of the delay lines is connected to two 8-bit programmable differential  $g_m$ -cells, with 1 bit for sign and 7 bits for amplitude control. Each programmable  $g_m$ -cell consists of a switch circuit to flip differential phases and 127 switchable identical inverter-based  $g_m$ -cells in parallel to vary the amplitude. The inverter-based  $g_m$ -cell proposed in [56] is employed, which provides common-mode rejection and output self-biasing using the  $M_{PB}$  transistors. High/low threshold voltage (HVT/LVT) FETs are used for proper circuit biasing.  $M_{PB}$  and  $M_P$  devices are PMOS HVT and LVT devices respectively with W/L=800 nm/250 nm, and  $M_N$  is an HVT NMOS sized to W/L=400 nm/250 nm. Transistor widths are at the minimum width allowed by the technology to achieve maximum resolution. The outputs of all vector modulators are connected to the summation point,



Figure 3.13: BB canceller vector modulator.

namely the TIA input. This summation point is also connected to pads to enable the tiling of ICs on board to increase the MIMO order, with baseband CT-SIC being performed across ICs. In such a scenario, because of the package and board-level parasitics that would be seen at the summation point, implementing stable common-mode feedback in the g<sub>m</sub>-cells would be challenging, thus pointing to a secondary benefit of the output self-biasing.
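To illustrate how a desired complex tap weight maps onto the sign-plus-7-bit $g_m$-cell controls, the sketch below quantizes the I and Q components of a weight to sign-magnitude codes with 127 unit cells. The normalization to a unit full scale, the sign-bit convention (1 means inverted), and the function names are all hypothetical.

```python
N_CELLS = 127  # switchable unit g_m-cells per programmable cell (from the text)

def to_sign_magnitude(x: float, full_scale: float = 1.0):
    """Map x in [-full_scale, full_scale] to (sign_bit, 7-bit magnitude)."""
    magnitude = round(min(abs(x) / full_scale, 1.0) * N_CELLS)
    return (1 if x < 0 else 0, magnitude)

def quantize_tap(weight: complex, full_scale: float = 1.0):
    """Return the (I code, Q code) pair for one delay tap."""
    return (to_sign_magnitude(weight.real, full_scale),
            to_sign_magnitude(weight.imag, full_scale))

def tap_from_codes(i_code, q_code, full_scale: float = 1.0) -> complex:
    """Reconstruct the complex weight realized by a pair of codes."""
    def value(code):
        sign, magnitude = code
        return (-1 if sign else 1) * magnitude / N_CELLS * full_scale
    return complex(value(i_code), value(q_code))

target = 0.3 - 0.7j  # hypothetical desired tap weight
codes = quantize_tap(target)
print(codes, abs(tap_from_codes(*codes) - target))
```

With 127 uniform steps per polarity, the quantization error of each component is bounded by half a unit cell, which sets the amplitude and phase resolution available per tap.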

Fig. 3.14 presents the baseband canceller IIP3 simulation in the setting that is used to measure Fig. 3.25 and Fig. 3.26(a). IIP3 from the TX port to the output of each stage of the baseband canceller is better than +27 dBm, and the IM3 level is around 30 dB lower than the fundamental in the overall output current of the baseband canceller at the nominal TX input power level of +14 dBm.



Figure 3.14: Baseband canceller IIP3 simulations: (a) TX to the output of each stage of the delay line, and (b) TX to output current of the baseband canceller when the cancellation weight is programmed.

#### 3.2.4 Output Buffer

An inverter-based amplifier is used, owing to its high linearity at low supply voltages, to buffer the output of the TIA to an off-chip IF balun (Fig. 3.15). Common-mode feedback circuitry similar to [55] is used to ensure a proper output DC value. Buffer FETs are sized to provide a gain of 1 with a 100  $\Omega$  differential load. Because of its high intrinsic gain, the buffer must be loaded with 100  $\Omega$  at all times to avoid instability. 2.5 pF compensation capacitors ( $C_c$ ) are used to ensure common-mode stability, and large 7.8 k $\Omega$  resistors ( $R_L$ ) are used to extract the output DC voltage value.

#### 3.2.5 Clock Generation and Bootstrapping circuitry

In passive-mixer-like circuits such as N-path filters, depicted in Fig. 3.16(a), when the gate is clocked with a pulse, a large voltage swing at the BB nodes can cause the FET gate-source voltage to modulate, resulting in nonlinearity and limited power handling. Bootstrapping



Figure 3.15: Inverter-based output buffer with common-mode feedback.



Figure 3.16: Clock bootstrapping in N-path filters can improve the linearity by keeping the gate-source voltage constant. N-path filter and FET terminal voltages (a) without and (b) with clock bootstrapping.

can be employed to improve the linearity by adding the BB node voltage to the pulse driving the FET gate, as shown in Fig. 3.16(b), thus making sure the gate-source voltage of the FET is constant during the ON time period [62].
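The effect of bootstrapping on the gate-source voltage can be illustrated with an idealized model. The sketch below uses the 0.5 V node bias and 1.7 V clock high level given earlier in this chapter, and assumes a hypothetical 0.3 V sinusoidal baseband swing and a perfect (lossless) bootstrap that adds the full BB signal to the gate pulse.

```python
import math

V_BIAS = 0.5   # N-path node DC bias in volts, from the text
V_HIGH = 1.7   # clock pulse high level in volts, from the text

def vgs_on(v_bb_ac: float, bootstrapped: bool) -> float:
    """Gate-source voltage of an ON switch for a given BB signal excursion."""
    v_source = V_BIAS + v_bb_ac
    # Ideal bootstrapping adds the BB excursion to the gate pulse.
    v_gate = V_HIGH + (v_bb_ac if bootstrapped else 0.0)
    return v_gate - v_source

# Hypothetical 0.3 V peak baseband swing sampled over one period.
swing = [0.3 * math.sin(2 * math.pi * k / 16) for k in range(16)]

def vgs_spread(bootstrapped: bool) -> float:
    values = [vgs_on(v, bootstrapped) for v in swing]
    return max(values) - min(values)

print(f"V_GS variation without bootstrapping: {vgs_spread(False):.3f} V")
print(f"V_GS variation with bootstrapping:    {vgs_spread(True):.3f} V")
```

In this idealized picture the gate-source voltage tracks the BB swing one-for-one without bootstrapping and stays fixed with it, which is why the switch on-resistance, and hence linearity, becomes largely independent of signal level.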

Differential clock signals at  $4 \times f_{clk}$  are distributed on the chip between the two elements



Figure 3.17: Clock generation and bootstrapping circuitry.

symmetrically using a resistive divider with a 50  $\Omega$  interface. Fig. 3.17 shows the clock generation circuitry in each element. A Johnson-counter divider using the low power latch presented in [63] is used to divide the  $4 \times f_{clk}$  by 4 and generate 8-phases of  $f_{clk}$ . The pulse generation block generates the required 12.5% and 25% non-overlapping clock pulses for the 8-phase circ.-RX N-path filter and the 4-phase passive mixer, and reduces the power consumption by sharing the divider, its buffers and NOR gates. The generated pulses are retimed with the main  $4 \times f_{clk}$  using an AND gate and a DFF to mitigate the phase noise of the divider and its buffers [57].

Thanks to the low-speed nature of the BB signal, clock bootstrapping is performed by

simply using a capacitor in series with the 12.5% clock pulse applied to the N-path filter switch, which is charged to the BB voltage value using a buffer operated from a 2.4 V supply. Although this buffer can be power-hungry, it is shared between the 4 switches that are connected to each BB node of the N-path filter using a CNTL signal generated by the pulse generation circuitry. Thus, 8 buffers are required rather than 32, improving power efficiency. Our simulations show that the bootstrapping circuitry degrades the NF by only 0.5 dB.

### **3.3** Experimental Results

The chip microphotograph and the package of the 65 nm CMOS FD 2-element MIMO circ.-RX array are shown in Fig. 3.19. It has a total active area of 5.6 mm<sup>2</sup>, is packaged in an 80-pin, 12 mm×12 mm QFN package, and is mounted on an FR-4 PCB for all measurements. Package wirebond inductances are used for the  $3\lambda/4$  transmission lines in the circ.-RX. Therefore, no off-package, on-board external inductors are needed. Mini-Circuits TCW2-272 and ADT2-71T 50  $\Omega$:100  $\Omega$ baluns are used to convert the single-ended TX/ANT signal to differential and the differential BB signals to single-ended, respectively. Based on our measurements, the wirebond inductance in the circulator is around 5 nH, and our simulations (Fig. 3.18) show that a 10% variation in wirebond inductance introduces only 0.15 dB additional TX-ANT loss. Initial isolation is >10 dB over this variation range, which is in the range that can be compensated using feed capacitors.

#### 3.3.1 Circulator-Receiver Measurements

The measured single-element TX-to-ANT and ANT-to-BB response of the circ.-RX for a clock frequency of 2200 MHz are shown in Figs. 3.20(a) and (b). Note that the RX is not available as a separate RF port, and hence the circulator's ANT-to-RX and TX-to-RX performance cannot be measured directly using S-parameters. The circ.-RXs exhibit 3.7 dB TX-ANT loss. The single-element ANT-baseband response shows 30 dB conversion gain at a nominal setting, and supports more than 20 MHz bandwidth. The RX NF is 9.5 dB



Figure 3.18: Simulation results of the circ.-RX performance across 10% variation in wirebond inductance, where L1, L2 and L3 are the wirebond inductances in the CLC sections between the ANT-RX, TX-RX and TX-ANT ports, respectively: (a) TX-ANT response, and (b) TX-RX isolation.

(Fig. 3.20(c)). Note that in this work, the circ.-RX encompasses the RX and an on-chip shared-ANT interface with no off-chip on-board inductors. The high NF and TX-to-ANT loss are primarily a function of the high operating frequency for N-path-like circuits in 65 nm CMOS, as the cumulative parasitic capacitance on the RF side of the N-path filter introduces substantial loss due to harmonic conversion. The use of a more scaled CMOS technology would substantially lower the NF and TX-to-ANT loss. Fig. 3.21 presents ANT-baseband linearity measurements. Output-referred in-band 1 dB compression point is around -6 dBm and out-of-band IIP3 is better than +21 dBm at 200 MHz offset.



Figure 3.19: Chip microphotograph of the 65 nm CMOS FD 2-element MIMO circ.-RX in an 80-pin QFN package.

#### 3.3.2 Linearity Measurements

Linearity and power handling measurements are presented in Fig. 3.22. A two-tone test shows that the circ.-RX IIP3 can be enhanced from +34 dBm to +43 dBm using clock bootstrapping, a 9 dB improvement. Furthermore, single-element TX power handling is evaluated in terms of the TX-induced RX compression point. The gain imparted to a weak in-band signal applied at the antenna port is monitored while the TX power is swept. The 1 dB compression of the weak in-band signal occurs at +7 dBm TX power without clock bootstrapping and is boosted to +14 dBm with clock bootstrapping, a 7 dB improvement in power handling. These results show clock bootstrapping to be an effective way to enhance N-path-filter power handling.

#### 3.3.3 RF MIMO Passive Canceller Measurements

The performance of the feed-capacitor-based RF SI/CT-SI cancellers is shown in Fig. 3.23. RF SI capacitors, in addition to the capacitor banks at the TX and ANT ports, can be used to cancel SI due to antenna reflection. Measurements are performed on 1024 sample points, showing



Figure 3.20: Single-element circ.-RX measurements: (a) TX-ANT response demonstrating nonreciprocity, (b) ANT-BB conversion gain, and (c) NF.

full coverage of antenna impedance inside the VSWR=2 circle, and partial coverage of VSWR <3. Similarly, cross-element RF passive CT-SI canceller measurements show cancellation of up to -18.5 dB of cross-antenna coupling with arbitrary phase. Note that RF passive cancellation consumes no power and can enhance FD power handling substantially and reduce the NF penalty of subsequent BB cancellers.



Figure 3.21: Single-element circ.-RX ANT-BB linearity measurements: (a) in-band 1dB compression point, (b) IIP3 versus offset frequency.



Figure 3.22: Linearity measurements with and without bootstrapping: (a) TX-ANT IIP3 and (b) TX-induced RX gain compression.

#### 3.3.4 Wireless Full-Duplex MIMO SIC and CT-SIC Measurements

FD MIMO wireless SIC and CT-SIC measurements are performed with slot loop antennas due to their inherent wideband operation. The antenna is fabricated on an FR-4 PCB and can radiate on both frontside and backside directions. A metal sheet is used as a reflector at the back of the antenna with a quarter wavelength distance to redirect the radiation to the frontside. EM simulations using Mentor Graphics IE3D show an antenna gain of 4.7 dBi.



Figure 3.23: Measured RF SIC/CT-SIC performance: (a) SIC antenna VSWR coverage and (b) CT-SIC antenna coupling coverage.

The setup for full-duplex measurements to characterize both SIC and CT-SIC performance is shown in Fig. 3.24. For SIC measurements, the TX port of one of the elements on the 2-element IC is excited by a PNA port. The circ.-RX output (at baseband) drives an up-converter that enables two-port measurements at the same frequency in the PNA. CT-SIC characterization, on the other hand, requires exciting the second TX port using the PNA. This results in crosstalk signals radiated out of ANT2 coupling to ANT1. Comparing the isolation with the feed-forward capacitors enabled and disabled yields the CT-SIC provided by the feed-forward network.

Optimal weights for RF and BB SIC/CT-SIC are determined using a two-step approach. Initially, the BB SIC/CT-SIC are disabled and RF SIC/CT-SIC weights are determined,



Figure 3.24: FD MIMO wireless SIC and CT-SIC measurement setup.

assuming that the I and Q feedforward caps operate orthogonally. This is followed by a finer optimization step that searches through weights neighboring the initial settings. Following this, the BB SIC delay taps are individually characterized to determine the amplitude, phase and delay transfer functions. A subsequent optimization step then computes BB SIC/CT-SIC weights assuming phase-invariant gain weights in the VGA stages following the delay taps. Finally, a local search is carried out around the obtained weights to determine optimal BB SIC/CT-SIC weights, accounting for phase variation with gain settings in the VGA. The two-step optimization approach was compared to a one-shot approach where nonlinear optimization techniques were investigated that determined RF and BB SIC weights jointly. Given the impact of linearity and noise, RF SIC is critical in MIMO systems with reasonable power handling, and therefore, the two-step approach yielded better results. As a topic of future research, the optimization approach can be considerably simplified with a reduction in phase variation across gain in the vector modulators.
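The two-step RF weight search described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the leakage value `H_SI`, the code range `CODES`, the LSB size `STEP`, and the code-dependent rotation inside `residual` are hypothetical stand-ins for the measured canceller response, and none of these names come from the actual measurement software.

```python
import cmath

# Hypothetical model of the residual SI after an I/Q feed-forward canceller.
# All constants below are illustrative assumptions, not measured chip values.
H_SI = 0.31 * cmath.exp(1j * 2.1)   # assumed complex SI leakage (linear units)
STEP = 0.02                          # assumed canceller weight per code LSB
CODES = range(-31, 32)               # assumed signed code range for I and Q


def residual(i_code, q_code):
    """Residual SI magnitude for a given pair of canceller codes."""
    # A small code-dependent rotation models I/Q non-orthogonality.
    rot = cmath.exp(1j * 0.004 * (abs(i_code) + abs(q_code)))
    cancel = STEP * (i_code + 1j * q_code) * rot
    return abs(H_SI - cancel)


def coarse_search():
    """Step 1: sweep I, then Q, as if the two paths were orthogonal."""
    i_best = min(CODES, key=lambda i: residual(i, 0))
    q_best = min(CODES, key=lambda q: residual(i_best, q))
    return i_best, q_best


def local_search(start, radius=1):
    """Step 2: greedy search over neighboring codes until nothing improves."""
    best = start
    improved = True
    while improved:
        improved = False
        i0, q0 = best
        for di in range(-radius, radius + 1):
            for dq in range(-radius, radius + 1):
                cand = (i0 + di, q0 + dq)
                in_range = cand[0] in CODES and cand[1] in CODES
                if in_range and residual(*cand) < residual(*best):
                    best = cand
                    improved = True
    return best


coarse = coarse_search()
fine = local_search(coarse)
print("coarse", coarse, residual(*coarse))
print("fine  ", fine, residual(*fine))
```

The coarse step needs only two one-dimensional sweeps, while the local step absorbs the error introduced by the orthogonality assumption; the same pattern is applied again to the BB weights once the delay taps have been characterized.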



Figure 3.25: Wireless FD measurements: (a) single-element FD SIC measurement results, and (b) two-element FD CT-SIC measurement results.

Fig. 3.25(a) shows the measurement results corresponding to the SIC provided by (i) circulator internal isolation (average $\sim 15$ dB), (ii) RF SI cancellation using the feed-forward capacitors, and (iii) baseband SI cancellation. Over a BW of 40/20 MHz around 2.2 GHz, the average total SIC with RF SIC turned on was $\sim 28/30$ dB. With both RF SIC and BB SIC, the average total was $\sim 35/45$ dB.

Similarly, Fig. 3.25(b) shows the measurement results corresponding to the CT-SIC provided by (i) 30 dB average ANT-to-ANT isolation, (ii) RF CT-SI cancellation using the feed-forward capacitors, and (iii) baseband SI cancellation. It can be seen that the peak CT-SIC provided with RF CT-SIC (along with inherent ANT isolation) was up to 60 dB. In addition, the total SIC (ANT-ANT isolation, RF CT-SIC, and BB CT-SIC combined) was on average $\sim 42/53$ dB over 40/20 MHz BW around the 2.2 GHz carrier frequency.

The SIC and CT-SIC performance are achieved with NF penalties of 1.5 dB and <2.1 dB, as shown in Fig. 3.26(a). It is also critical to measure the NF degradation in the presence of the TX signal. Measuring the TX-induced ANT-baseband NF using a regular Y-factor method is not feasible as the noise source ENR is sensitive to the TX power, and thus the gain method is used. NF degradation in the presence of a continuous-wave TX signal is presented in Fig. 3.26(b). We find that at an average TX operation power of +8 dBm (6 dB backed off from the +14 dBm TX-induced RX 1 dB compression point to account for the PAPR of a modulated signal), the increase in NF over the NF in the absence of a TX signal is $\sim$3 dB, while the increase at that same power level is $\sim$14 dB without RF and BB cancellation. This 3 dB degradation is likely caused by reciprocal mixing between the TX signal and the circulator LO phase noise, which requires further investigation and is a topic of future research.
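For readers unfamiliar with the gain method, the following Python sketch shows the standard textbook calculation: the NF is the measured output noise power minus the thermal noise floor, the measurement bandwidth, and the gain, all in dB. The numerical inputs are hypothetical values chosen for illustration and are not taken from the measurements in Fig. 3.26.

```python
import math

KT_DBM_PER_HZ = -174.0  # approximate thermal noise density at 290 K


def nf_gain_method(p_out_dbm, gain_db, bw_hz):
    """NF (dB) from the output noise power measured in bandwidth bw_hz."""
    return p_out_dbm - (KT_DBM_PER_HZ + 10 * math.log10(bw_hz) + gain_db)


def backed_off_power(p1db_dbm, backoff_db):
    """Average TX power after backing off from the compression point."""
    return p1db_dbm - backoff_db


# Hypothetical example: -74.5 dBm of output noise in 1 MHz with 30 dB gain.
print(nf_gain_method(-74.5, 30.0, 1e6))
# Back-off quoted in the text: +14 dBm compression point, 6 dB back-off.
print(backed_off_power(14.0, 6.0))
```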



Figure 3.26: (a) NF degradation due to SIC and CT-SIC, and (b) NF degradation in the presence of TX signal.

### 3.4 Summary

This work is compared to CMOS FD single-element and FD phased-array receivers in Table 3.1. This work is the first to integrate MIMO operation with FD using RF and shared-delay BB cancellers. Thanks to clock bootstrapping, this work achieves the highest TX power handling per element compared to the state-of-the-art. Although this work enables MIMO

|              |                                           | Columbia Uni.<br>JSSC 2017 [35]                                                      | Uni. of Washington<br>JSSC 2018 [36]                                                     | Columbia Uni.<br>JSSC 2018 [12]                                                               | Uni. of Washington<br>ISSCC 2018 [37]                                                            | Columbia Uni.<br>RFIC 2018 [51]                                                                           | This Work                                                                                                 |
|--------------|-------------------------------------------|--------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| Architecture |                                           | RX with integrated<br>magnetic-free N-path-<br>filter-based circulator<br>and BB SIC | Full-duplex transceiver<br>with the dual path RF<br>and BB adaptive filter<br>cancellers | Magnetic-free N-path-<br>filter-based circulator-<br>receiver with on-chip<br>balance network | Full-duplex transceiver<br>employing an electrical<br>balance duplexer and<br>dual RF cancellers | Full-duplex phased-array<br>with integrated magnetic-<br>free N-path-filter-based<br>circulator-receivers | Full-duplex MIMO circulator-<br>RX with MIMO RF and<br>shared-delay BB self-<br>interference cancellation |
| RX Metrics   | RX Frequency Range                        | 0.6-0.8GHz                                                                           | 1.7-2.2GHz                                                                               | 0.61-0.975GHz                                                                                 | 1.6-1.9GHz                                                                                       | 0.61-0.975GHz                                                                                             | 2.2GHz                                                                                                    |
|              | Number of Antenna<br>Paths                | 1                                                                                    | 1                                                                                        | 1                                                                                             | 1                                                                                                | 4 per IC, 2 ICs tiled for FD<br>measurements                                                              | 2 MIMO elements per IC<br>(scalable)                                                                      |
|              | Nominal Gain                              | 42dB                                                                                 | 20~36dB                                                                                  | 28dB                                                                                          | 42dB                                                                                             | 41dB (single-element)                                                                                     | 30dB (single-element)                                                                                     |
|              | Noise Figure                              | 8.4dB                                                                                | 4dB                                                                                      | 6.3dB                                                                                         | 8.1dB                                                                                            | 5dB (single-element)                                                                                      | 9.5dB (single-element)                                                                                    |
| FD Metrics   | Integrated SI<br>Suppression Domains      | RF + Analog BB                                                                       | RF + Analog BB                                                                           | RF                                                                                            | RF                                                                                               | RF + Spatial                                                                                              | RF + Analog BB                                                                                            |
|              | Amount of Integrated<br>SI Suppression    | 42dB SIC across<br>12MHz BW<br>(with 50Ω term.)                                      | 50dB SIC across<br>42MHz BW<br>(with 50Ω term.)                                          | 40dB SIC across<br>20MHz BW<br>(with 50Ω term.)                                               | 72.8/70.1/65.2dB SIC<br>across 20/40/80MHz<br>BW (with 50Ω term.)                                | 50dB array SIC across<br>16.25MHz BW<br>(across 8 elements)<br>(with real antenna)                        | 45dB SIC across 20MHz<br>BW<br>(with real antenna)                                                        |
|              | Amount of Integrated<br>CT-SI Suppression | N/A                                                                                  | N/A                                                                                      | N/A                                                                                           | N/A                                                                                              | N/A                                                                                                       | 53dB CT-SIC across 20MHz<br>BW (with real antenna)                                                        |
|              | Overall TX Port Power<br>Handling         | -7dBm                                                                                | +25dBm (prior to 30-<br>35dB off-chip isolation)                                         | +7dBm <sup>2</sup>                                                                            | +5dBm <sup>4</sup>                                                                               | +2dBm <sup>2</sup><br>(single-element)                                                                    | +14dBm <sup>2</sup><br>(single-element)                                                                   |
|              | RX Degradation in<br>Full-Duplex Mode     | 2.5dB                                                                                | 1.5dB                                                                                    | 1.7dB                                                                                         | 1.6dB                                                                                            | 3dB (RX) + 3.5dB (TX)<br>(Array Gain Degradation)                                                         | 1.7 (SIC)<br>2.1dB (SIC+CT-SIC)                                                                           |
| Resources    | RX Power                                  | 100mW<br>(60mW signal path +<br>10mW LO + 30mW<br>BB canceller)                      | 22mW                                                                                     | 72mW                                                                                          | 106mW                                                                                            | 8mW / element<br>(32mW for 4-element on<br>an IC)                                                         | 46mW / element<br>(21mW BB + 23mW delay<br>cells+2mW canc. g <sub>m</sub> -cells)                         |
|              | Antenna Interface<br>Power                | 59mW (at 0.7GHz)                                                                     | N/A                                                                                      | 36mW (at 0.7GHz)                                                                              | N/A                                                                                              | 26.25mW per element                                                                                       | 156mW per element                                                                                         |
|              | Technology                                | 65nm CMOS                                                                            | 40nm CMOS                                                                                | 65nm CMOS                                                                                     | 40nm CMOS                                                                                        | 65nm CMOS                                                                                                 | 65nm CMOS                                                                                                 |
|              | Active Area                               | 1.4mm <sup>2</sup>                                                                   | 3.5mm <sup>2</sup>                                                                       | 0.94mm <sup>2</sup>                                                                           | 4mm <sup>2</sup>                                                                                 | 3.6mm <sup>2</sup> (0.9mm <sup>2</sup> /<br>element)                                                      | 5.6mm <sup>2</sup> (2.8mm <sup>2</sup> / element)                                                         |

Table 3.1: Comparison of the proposed FD MIMO circ.-RX with state-of-the-art FD RXs with an integrated shared antenna interface.

1. From Fig. 31(a) in the paper. 2. Limited by ~1dB gain compression induced in the receive signal. 3. at -17.3dBm TX power. 4. SIC under TX Power.

N/A: Not Applicable, N/R: Not Reported

operation, the NF degradation is low thanks to the initial RF SIC and CT-SIC. Also notable is the fact that the BB canceller power consumption is less than or comparable to that of prior single-element works due to the sharing of the delay cells. The high NF and antenna interface power consumption arise because 2.2 GHz is a high operating frequency for N-path-filter circuits in 65 nm CMOS technology. These can be substantially decreased by using a more scaled CMOS technology.

This chapter presented an FD 2-element MIMO circ.-RX array that exploits passive RF SIC and CT-SIC as well as shared-delay baseband cancellation to address the complexity of MIMO FD SIC. Clock bootstrapping is also employed to boost the power handling of the N-path-filter-based circ.-RX. Future research directions include techniques to scale to massive MIMO and canceller implementations that ease SIC configuration algorithms.

### Chapter 4

# A 60 GHz 4-Element MIMO TX with a Single-Wire Interface

Mm-wave transceivers can transmit and receive GBs of data thanks to the enormous spectrum available at high frequencies. WiGig, also known as 60 GHz WiFi, includes the IEEE 802.11ad/ay standards and supports the bonding of 2.16 GHz channels occupying the spectrum from 57.24 GHz to 70.2 GHz, constellations up to 64-QAM, and spatial MIMO [64]. Mm-wave systems with per-element digitization enable virtual arrays for radar, digital beamforming (DBF) for high-mobility scenarios, simultaneous multi-beam formation, spatial multiplexing, and per-PA digital pre-distortion. Per-element digitization, however, poses the challenge of handling high data-rate I/O in large-scale tiled MIMO mm-wave arrays. Although SERDES is traditionally used as a high-speed link in computing systems and networks, it is an area- and power-hungry solution and a significant overhead for handheld devices. Alternatively, the IF data of multiple elements can co-exist on a single-wire interface (SWI) using code-domain multiple access (CDMA) or frequency-domain multiple access (FDMA) concepts (Fig. 4.1). Code-domain multiplexing similar to [6] is not a viable solution for WiGig applications, as it results in a 34.56 GHz spread data bandwidth, which requires power-hungry ADCs and digital processing; this work therefore deploys an FDMA approach to receive four-element IF data on a SWI.

This chapter demonstrates a 60 GHz 45 nm RF-SOI 4-element scalable MIMO TX with a SWI to alleviate the challenge of supporting high data-rate I/O in a large-scale tiled MIMO mm-wave array. FDMA is used on the single-wire to simultaneously support the signals of all 4 MIMO channels while breaking the trade-off between channel-to-channel isolation and single-wire bandwidth. Harmonic-rejection mixing (HRM) is used to demultiplex the 4 modulated signals simultaneously from the single-wire, each with 2 GHz bandwidth (total BW of 8 GHz). A novel two-stage wideband HRM achieves high channel-to-channel isolation with low power overhead. The SWI can support 8 GHz total IF bandwidth across the 4 channels with 30-40 dB SFDR. Each TX in the array achieves 20-35 dB conversion gain and 8.8-10.9 dBm OP1dB while maintaining a channel-to-channel isolation > 30 dB. System level measurements show the ability of the MIMO chip to form multiple simultaneous independent beams carrying independent signals.

### 4.1 Single-Wire Interface

This section discusses the challenges associated with implementing a single-wire interface. Then, the signal flow of the IF data and the 30 GHz reference signal from the input interface is described to clarify the proposed solution. Finally, the harmonic rejection mixing concept is described.

#### 4.1.1 Challenge

In a large-scale mm-wave array, a SWI connecting each mm-wave RF front-end chipset to the DSP unit is desired to ease scalability, and thus the IF data of all elements needs to co-exist on a SWI. As mentioned in Chapter 1, SERDES is the traditional solution, in which the received signal of each element is first digitized using ADCs with proper dynamic range, and the data of all elements is then serialized onto a single wire and transmitted to the DSP unit IC. Finally, the data is deserialized at the DSP unit, and digital beamforming can be performed to extract spatial data. Each WiGig band is 2.16 GHz wide, and after downconversion it translates to 1.08 GHz IF




Figure 4.1: A scalable MIMO TX architecture with a single-wire interface based on CDMA and FDMA.

signal BW on I and Q. Sampling at the Nyquist rate using 8-bit ADCs results in 138.24 Gb/s of data for 4 elements (2160 MSample/s × 8 bit/sample × 2 (I+Q) × 4 elements). SERDES circuits are challenged by high-speed operation, intensive equalization, and robustness [65], and are thus equipped with power-hungry buffers and equalizers. In [65], a 28 nm CMOS SERDES is presented that operates from 1.25 Gb/s to 28.5 Gb/s while consuming 170-560 mW from 1.5 V / 1.05 V / 0.85 V supplies and occupying 0.83 mm<sup>2</sup>. In [66], a non-return-to-zero (NRZ) 58.4 Gb/s to 61.2 Gb/s SERDES transmitter is presented in 65 nm digital CMOS that consumes 450 mW with a die area of 2.1 mm<sup>2</sup>, showing that a SERDES circuit is an area- and power-hungry solution not suitable for forming a SWI.
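The aggregate data rate quoted above follows from a simple product, reproduced in the Python sketch below. The lane count at the end is only an illustrative estimate: it assumes the 28.5 Gb/s upper rate of [65] per lane and ignores line-coding and framing overhead, neither of which is specified in the text.

```python
import math


def aggregate_bit_rate(sample_rate_sps, bits_per_sample, iq_paths, elements):
    """Raw bit rate (bit/s) produced by per-element Nyquist-rate digitization."""
    return sample_rate_sps * bits_per_sample * iq_paths * elements


# Values taken from the text: 2160 MS/s, 8-bit ADCs, I and Q, 4 elements.
rate_bps = aggregate_bit_rate(2_160_000_000, 8, 2, 4)
print(rate_bps / 1e9, "Gb/s")

# Illustrative assumption: lanes running at the top rate reported in [65].
LANE_RATE_BPS = 28_500_000_000
print(math.ceil(rate_bps / LANE_RATE_BPS), "lanes (no coding overhead)")
```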

To avoid the extra power consumption and silicon area associated with SERDES circuits for the SWI, the IF data of each element can be transported on a single wire using FDMA or CDMA (Fig. 4.1). For instance, in [6], a spreading factor of 16 based on Walsh functions is used to spread the 100 MHz I&Q IF bandwidth of 4 elements to 1.6 GHz for transmission on a SWI using CDMA. However, the high spreading rate of the CDMA approach is not suitable for WiGig applications with 2.16 GHz of IF data bandwidth. In this dissertation, four streams of IF data for the 4-element MIMO TX are multiplexed on a SWI using the FDMA approach, eased by HRM circuitry.
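To make the CDMA alternative concrete, the Python sketch below builds length-16 Walsh codes, multiplexes one symbol per element onto a shared wire, and recovers each symbol by correlation. It is a minimal illustration, not the scheme of [6]: the code assignment, the symbol values, and the helper names are assumptions. The last lines reproduce the bandwidth expansion quoted in the text.

```python
def walsh_matrix(order):
    """Sylvester construction of a 2**order x 2**order Hadamard matrix."""
    h = [[1]]
    for _ in range(order):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def spread_bandwidth(bw_hz, spreading_factor):
    """Occupied bandwidth after direct-sequence spreading."""
    return bw_hz * spreading_factor


SF = 16
codes = walsh_matrix(4)                       # 16 mutually orthogonal codes
assigned = [codes[i] for i in (1, 2, 3, 4)]   # hypothetical per-element codes
symbols = [1, -1, -1, 1]                      # one illustrative symbol per element

# Multiplex: every element multiplies its symbol by its code; the wire sums them.
wire = [sum(s * c[n] for s, c in zip(symbols, assigned)) for n in range(SF)]

# Demultiplex: correlate the wire signal with each code.
recovered = [sum(w * c[n] for n, w in enumerate(wire)) / SF for c in assigned]
print(recovered)

print(spread_bandwidth(100e6, SF) / 1e9, "GHz for 100 MHz IF")
print(spread_bandwidth(2.16e9, SF) / 1e9, "GHz for 2.16 GHz IF")
```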



Figure 4.2: Single-wire interface signal flow breakdown.

#### 4.1.2 Signal Flow

Fig. 4.2 shows the proposed signal flow structure that enables the SWI. The IF data signals for the 4-element TX are placed at the first, third, fifth and seventh harmonics of 1.25 GHz, occupying the spectrum below 10 GHz, alongside a 30 GHz reference signal that is used for clock and LO generation. At the interface input, a duplexer, implemented simply as a parallel combination of an LPF and an HPF, separates the IF data signals (0–10 GHz) from the 30 GHz reference. The 30 GHz reference is split into two paths using a 30 GHz Wilkinson splitter, one for clock generation and one for the doubler. A divide-by-8 in cascade with a divide-by-3 generates the 16 phases of the 1.25 GHz clock required by the HRM from the 30 GHz reference. The HRM downconverts the IF data of each element to zero-IF using a different combination of the 16 phases of the 1.25 GHz clock. A doubler generates the 60 GHz LO required by the 60 GHz upconversion mixers. Wilkinson dividers and amplifiers are used to divide and route the 60 GHz LO from the doubler to all four elements. Thus, the IF data of all four elements and the 60 GHz LO are extracted from the SWI and are available for upconversion and amplification.
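The frequency plan described above can be checked with a few lines of Python. The sketch derives the HRM clock from the 30 GHz reference and lists the band occupied by each channel, assuming the 2 GHz per-channel bandwidth quoted in the chapter introduction; the variable and function names are illustrative only.

```python
F_REF_HZ = 30e9                 # reference tone carried on the single wire
DIVIDE_RATIO = 8 * 3            # cascaded divide-by-8 and divide-by-3
F_CLK_HZ = F_REF_HZ / DIVIDE_RATIO
F_LO_HZ = 2 * F_REF_HZ          # doubler output used for upconversion
HARMONICS = (1, 3, 5, 7)        # one odd harmonic per TX element
CHANNEL_BW_HZ = 2e9             # per-channel bandwidth from the introduction


def channel_plan():
    """Return (harmonic, lower edge, center, upper edge) in Hz per channel."""
    plan = []
    for n in HARMONICS:
        center = n * F_CLK_HZ
        plan.append((n, center - CHANNEL_BW_HZ / 2, center, center + CHANNEL_BW_HZ / 2))
    return plan


print("HRM clock:", F_CLK_HZ / 1e9, "GHz; LO:", F_LO_HZ / 1e9, "GHz")
for n, lo_edge, center, hi_edge in channel_plan():
    print(f"harmonic {n}: {lo_edge / 1e9:.2f} to {hi_edge / 1e9:.2f} GHz "
          f"(center {center / 1e9:.2f} GHz)")
```

With these numbers, adjacent channels are separated by a 0.5 GHz guard band and the highest channel ends at 9.75 GHz, consistent with the 0–10 GHz IF range passed by the duplexer.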

#### 4.1.3 Harmonic Rejection Mixer

The harmonic rejection mixing concept was first introduced in broadband radios such as software-defined radios and TV tuners, where the signal path must accommodate a wide frequency span and is challenged by spurious mixing with LO harmonics [8,67]. Fig. 4.3 demonstrates an N-phase HRM with BB gain coefficients. These coefficients are scaled samples of a sinusoid of frequency $f_{LO}$ that is sampled N times per period, at time intervals of $1/(N.f_{LO})$, using a sampling clock of frequency $N.f_{LO}$. We thus have:

$$a_k = \sin(\frac{2\pi k}{N}) \tag{4.1}$$

where $k \in [1 : N]$. An effective downconversion frequency of $n.f_{LO}$, where $n \in [1 : N/2]$, can be synthesized by applying the coefficients $a_k$ to the input in a specific time sequence while rejecting all other harmonics. This approach is very similar to direct digital frequency synthesis, but is performed in the analog domain using a limited number of path gains within the HRM. In practice, the gain coefficients are quantized versions of those in Eq. 4.1, which, along with gain and phase mismatch, limits the achievable harmonic rejection (HR).
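The harmonic selectivity of Eq. 4.1 can be verified numerically. The Python sketch below builds the staircase effective LO for $N = 16$ and evaluates its Fourier-series coefficient at the four harmonics used as channels in this chapter. Stepping through the $a_k$ table $n$ entries at a time is an assumed reading of "a specific time sequence" (as in a DDFS phase accumulator), and the coefficients are ideal, with no quantization or mismatch.

```python
import cmath
import math

N = 16  # number of clock phases, i.e. samples per LO period


def effective_lo(n):
    """One period of the staircase LO targeting harmonic n (ideal a_k)."""
    return [math.sin(2 * math.pi * ((n * k) % N) / N) for k in range(N)]


def harmonic_gain(lo, m):
    """Magnitude of the m-th Fourier-series coefficient of the held staircase."""
    dft = sum(v * cmath.exp(-2j * math.pi * m * k / N) for k, v in enumerate(lo)) / N
    x = math.pi * m / N
    return abs(dft) * math.sin(x) / x  # zero-order-hold roll-off


for n in (1, 3, 5, 7):
    gains = [harmonic_gain(effective_lo(n), m) for m in (1, 3, 5, 7)]
    print("LO for harmonic", n, ["%.3f" % g for g in gains])
```

In this idealized model only the targeted harmonic has a non-zero gain among the four channel positions, and that gain decreases for higher harmonics because of the hold roll-off.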

The HRM employs principles of a two-stage harmonic rejection mixing concept (Fig. 4.4) which makes the mixer insensitive to gain mismatch since effective gain mismatch is the



Figure 4.3: HRM using 1/N duty cycle clocks and baseband gain coefficients [8].

product of the percentage mismatches present in each harmonic rejection stage [68]. A two-stage HRM, however, increases the circuit complexity and thus the area and power consumption.
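As a rough numerical illustration of this product rule, assume (hypothetically) a 1% relative gain error in each stage, written here as $\varepsilon_1$ and $\varepsilon_2$; these symbols and values are illustrative and are not taken from the measurements:

$$\varepsilon_{\mathrm{eff}} \approx \varepsilon_1 \varepsilon_2 = 0.01 \times 0.01 = 10^{-4}$$

so the residual gain error falls from $20\log_{10}(0.01) = -40$ dB for a single stage to $20\log_{10}(10^{-4}) = -80$ dB for the cascade, before phase mismatch and coefficient quantization are taken into account.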



Figure 4.4: HR is limited due to gain quantization and mismatch. Multi-stage HRM reduces the degradation in HR due to gain and phase mismatch.

### 4.2 Circuit Implementation

Fig. 4.5 presents block and circuit diagrams of the 45 nm RFSOI CMOS 60 GHz 4-element MIMO TX with an FDMA-based IF data and LO reference single-wire interface. As mentioned earlier, a duplexer made of a parallel combination of an LPF and an HPF is deployed to separate the 0-to-10 GHz IF data from the 30 GHz LO reference. The 30 GHz LO reference is divided by 24 to generate the 16 phases of the 1.25 GHz clock required by the HRM, which downconverts each channel to zero-IF using 16-phase mixing products. Cascading two HRM stages reduces the inter-channel leakage to below -40 dB in all channels. Moreover, the 60 GHz LO is generated by passing the 30 GHz LO reference through a doubler circuit. A 90° hybrid coupler is used to generate the 60 GHz LOs with 0° and 90° phases for quadrature upconversion mixing. A 1 GHz, 5th-order elliptic filter is deployed to suppress out-of-band higher-order harmonics at the output of the HRM, which results in a clean output mask.




Figure 4.5: Block diagram of a 45 nm RFSOI 60 GHz 4-element MIMO TX with a frequency-division-multiplexed 10 GHz single-wire interface.

#### 4.2.1 Duplexer

The duplexer circuit is detailed in Fig. 4.6. It consists of a parallel combination of an HPF and an LPF that separates the DC-to-10 GHz IF data and the 30 GHz LO signal at the single-wire input. The LPF is a passive, synthesized 5th-order low-pass elliptic filter. Although its pass-band is chosen to be 15 GHz to ensure that the filter roll-off does not degrade the IF signal BW (DC-to-10 GHz), it has around 40 dB of out-of-band rejection to sufficiently suppress the 30 GHz reference (see S21 in Fig. 4.7). Similarly, the HPF is a passive, synthesized 5th-order high-pass elliptic filter that rejects the IF signals between DC and 10 GHz by >40 dB with minimum loss at 30 GHz (see S31 in Fig. 4.7).



Figure 4.6: Circuit Implementation of the duplexer that includes high pass and low pass filters to divide 0-10 GHz IF data and 30 GHz LO signal from the single wire input.



Figure 4.7: S-parameter simulation of the duplexer which divides IF signals and 30 GHz reference with around 40 dB isolation.

#### 4.2.2 Two-Stage Harmonic Rejection Mixer

Fig. 4.8 shows the circuit diagram of the two-stage HRM. A simple inverter is used at the input as a $g_m$ stage to amplify the input signal. A current-mode passive mixer samples the input IF signals, which lie at the 1st, 3rd, 5th and 7th harmonics of 1.25 GHz, with 16 phases. The sampled signals are then buffered using an inverter-based TIA introduced in [56], which does not require common-mode feedback and provides common-mode rejection. Resistive networks combine the different samples according to the coefficient sequence calculated from Eq. 4.1, which is $[0,\sqrt{2-\sqrt{2}}/2,\sqrt{2}/2,\sqrt{2+\sqrt{2}}/2,1]$. These ratios are approximated by ratios of integers as [0,4,7,9,10] in the first stage and [0,5,9,12,13] in the second stage of the HRM.
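The effect of the integer approximation can be estimated with the Python sketch below. It compares each integer set with the ideal quarter-wave samples of Eq. 4.1 and computes the harmonic rejection that a single stage would achieve with those ratios alone. This is a simplified, assumption-laden model: each set is evaluated in isolation as a one-stage staircase LO, the cascade of the two stages is not modeled, and gain and phase mismatch are ignored.

```python
import cmath
import math

N = 16
STAGE1 = [0, 4, 7, 9, 10]     # integer ratios quoted for the first stage
STAGE2 = [0, 5, 9, 12, 13]    # integer ratios quoted for the second stage
IDEAL = [math.sin(2 * math.pi * k / N) for k in range(5)]


def max_ratio_error(ratios):
    """Largest deviation of the normalized integer ratios from Eq. 4.1."""
    full_scale = ratios[-1]
    return max(abs(r / full_scale - a) for r, a in zip(ratios, IDEAL))


def full_period(quarter):
    """Extend quarter-wave values a_0..a_4 to 16 samples of a sine staircase."""
    half = list(quarter) + list(quarter[3:0:-1])
    return half + [-v for v in half]


def rejection_db(quarter, m):
    """Rejection of harmonic m relative to the fundamental, single stage only."""
    lo = full_period(quarter)

    def gain(h):
        dft = sum(v * cmath.exp(-2j * math.pi * h * k / N) for k, v in enumerate(lo))
        x = math.pi * h / N
        return abs(dft) * math.sin(x) / x  # zero-order-hold roll-off

    return 20 * math.log10(gain(1) / gain(m))


for name, ratios in (("stage 1", STAGE1), ("stage 2", STAGE2)):
    hr = {m: round(rejection_db(ratios, m), 1) for m in (3, 5, 7)}
    print(name, "max ratio error %.4f" % max_ratio_error(ratios), "HR (dB):", hr)
```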



Figure 4.8: Circuit Implementation of the 2-stage harmonic rejection mixer with minimum area and power consumption.

Post-layout inter-channel HR simulations are shown in Fig. 4.9, where the responses from the input to all HRM channel outputs are simulated using PSS+PAC. At the IF frequencies, an inter-channel HR of >40 dB is achieved between all channel pairs.

#### 4.2.3 Low-Pass Filter

The LPF is a passive, synthesized 5th-order low-pass elliptic filter. To minimize the LPF area, the inductors are implemented differentially, which decreases the inductor




Figure 4.9: Post-layout inter-channel HR simulation shows HR better than 40 dB between all channels.

area by a factor of $2 \times (1+k)$, where the factor of 2 arises because the two sides of the differential inductor are implemented on top of each other, and $(1+k)$ accounts for the mutual inductance between the positive and negative sides of the differential inductor. Fig. 4.7(b) shows simulation results of the designed LPF, where the pass-band corner is around 1 GHz and the harmonics at >1.5 GHz (adjacent channel) are suppressed by >40 dB.



Figure 4.10: The 5th order differential elliptic low pass filter using passive components.



Figure 4.11: Doubly-balanced Gilbert cell mixer used as upconversion mixer with BB amplifier (a) circuit diagram and (b) simulation results.

#### 4.2.4 Upconversion Gilbert-Cell Mixer

A double-balanced Gilbert cell is used as an upconversion mixer to upconvert the IF data of each element to 60 GHz (Fig. 4.11(a)). A balun converts the single-ended LO signal to a differential one. A varactor at the output of the balun provides up to 15° of programmable phase shift, which can be used to enhance the image rejection ratio of the I and Q mixers.

Furthermore, a programmable IF amplifier provides initial gain to compensate for the lossy frequency conversion of the mixer. Fig. 4.11(b) presents the simulation results of the designed mixer, which show around 7 dB of conversion gain.



Figure 4.12: Two-stage stacked class-E PA (a) circuit diagram, and (b) simulation results.

#### 4.2.5 Pre-Driver and Power Amplifier

Fig. 4.12(a) depicts the two-stage class-E-like PA, implemented by stacking two $78 \times 1.25 \ \mu m/40$ nm floating-body devices to increase the voltage swing at the load. Device sizes, supply and bias voltages, and gate capacitor values are selected based on the theoretical analysis and considerations described in [69,70]. A multiplicity-based device layout is used to maintain a good balance between $f_{max}$ and $f_T$. The PA achieves a simulated saturated output power of +14 dBm with 22% PAE at 60 GHz, as shown in Fig. 4.12(b).

### 4.3 Measurement Results

Fig. 4.13 shows the chip microphotograph of the 45 nm RF-SOI 4-element MIMO TX. The chip measures $3.2 \text{ mm} \times 4.9 \text{ mm}$ and is flip-chip mounted on a Rogers-4350B PCB through 75 $\mu$m C4 solder bumps; this PCB is in turn mounted on an FR4 PCB for power supply connections. The PCB RF trace losses are de-embedded, but all measurements include the 1-2 dB loss expected from the C4-PCB solder ball interface, which is not de-embedded.



Figure 4.13: Chip microphotograph of the 45 nm CMOS RF-SOI 60 GHz 4-element MIMO TX.

Fig. 4.14 shows the measured conversion gain to each TX output and the channel-to-channel isolation for three sample chips across frequency. The gain is more than 30 dB for the first two channels and more than 20 dB for the other two. This gain variation arises from several factors, including inherent gain differences in the downconversion of different harmonics, gain roll-off from DC to 10 GHz in the harmonic rejection mixer, and some systematic routing differences on the chip. However, access to the per-element IF streams allows this gain difference to be compensated easily in DSP. The conversion gain BW of the channels is around 1.2 GHz, limited by the mm-wave part, whereas the 3 dB BW of the HRM de-multiplexer itself is higher than 2 GHz. The worst-case measured isolation (the difference between the lowest gain for the main channel and the highest gain for the spurious channels) across the three chips is annotated on the figure. The isolation from channel 1 to channels 2/3/4 is better than 40/45/40 dB, from channel 2 to channels 1/3/4 better than 40/40/40 dB, from channel 3 to channels 1/2/4 better than 45/35/35 dB, and from channel 4 to channels 1/2/3 better than 30/40/30 dB, respectively. These channel-to-channel isolations imply that the net spurious leakage into the desired channel from the other channels is below −30 dB relative to the main channel, enabling complex modulation formats such as 64-QAM OFDM.





The measured large-signal characteristics for each MIMO TX output are shown in Fig. 4.15. The OP1dB ranges from 8.8 to 10.93 dBm, while the saturated output power ranges from 9.1 to 12.5 dBm. The drain efficiency of the whole chip as a function of input power is also measured and shown in Fig. 4.15. The DC power consumption of the whole chip is 220 mW/channel at 6 dBm output power for each channel.



Figure 4.15: Measured gain, output power and drain efficiency as a function of input power for all channels for three different chips. The inset in each figure shows the Psat, OP1dB and drain efficiency at OP1dB for the 3rd sample.

Fig. 4.16 shows the measurement setup used to demonstrate the simultaneous multi-beam formation capability. Using four PLL boards, four tones are generated at 1.2 GHz, 1.35 GHz, 3.7 GHz, and 3.85 GHz. These signals, along with a 30 GHz LO, are combined and fed to the chip using a single coaxial cable. The outputs of elements 1 and 2 are connected to a 2-element aperture-coupled patch antenna array. The 1.2 GHz and 1.35 GHz tones fall within the bandwidth of the first channel (0-2.5 GHz) and are up-converted to 60.05 GHz and 59.90 GHz, respectively. The 3.7 GHz and 3.85 GHz tones fall within the bandwidth of the second channel (2.5-5 GHz) and are also up-converted to 60.05 GHz and 59.90 GHz, respectively. Applying different phase shifts to the different input signals allows the 59.90 GHz and 60.05 GHz beams to be steered simultaneously in two different directions. The measured radiation patterns for the two tones are shown in Fig. 4.16.
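The tone frequencies quoted above are consistent with a simple frequency plan, sketched below in Python. The sketch assumes (none of this is stated explicitly in the text) that each IF channel occupies a 2.5 GHz slot, that each slot is downconverted around its midpoint (1.25 GHz and 3.75 GHz for the first two channels), and that the offset from the slot center appears with inverted sign around the 60 GHz carrier. The sign inversion is inferred only from the quoted 60.05/59.90 GHz values. All function and constant names are hypothetical.

```python
# Hypothetical frequency-plan sketch. Slot centers and the sign convention
# are assumptions inferred from the tone frequencies quoted in the text.
CARRIER_MHZ = 60_000      # 60 GHz carrier, as quoted in the text
SLOT_WIDTH_MHZ = 2_500    # assumed IF slot width (0-2.5 GHz, 2.5-5 GHz, ...)


def channel_index(f_if_mhz: int) -> int:
    """Zero-based index of the IF slot that contains the tone."""
    return f_if_mhz // SLOT_WIDTH_MHZ


def channel_center_mhz(channel: int) -> int:
    """Assumed downconversion frequency: the midpoint of the slot."""
    return channel * SLOT_WIDTH_MHZ + SLOT_WIDTH_MHZ // 2


def rf_frequency_mhz(f_if_mhz: int) -> int:
    """Map an IF tone to its RF frequency under the assumed plan."""
    channel = channel_index(f_if_mhz)
    offset = f_if_mhz - channel_center_mhz(channel)
    # Assumed spectral inversion: a tone above the slot center lands
    # below the carrier, and vice versa.
    return CARRIER_MHZ - offset


for tone_mhz in (1200, 1350, 3700, 3850):
    print(f"IF {tone_mhz} MHz -> channel {channel_index(tone_mhz) + 1}, "
          f"RF {rf_frequency_mhz(tone_mhz)} MHz")
```

Working in integer MHz avoids floating-point rounding when comparing against the quoted values. Under this plan, both channels place their tones on the same two RF frequencies, which is what allows two independently phased beams per frequency to be observed.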



Figure 4.16: Measurement setup to demonstrate simultaneous formation of multiple beams carrying independent signals: antenna pattern measured for two simultaneously-transmitted frequencies, 59.9 GHz and 60.05 GHz.

### 4.4 Summary

This work is compared with state-of-the-art mm-wave transceivers with and without a single-wire interface [5,6,64,71] in Table 4.1. This is the first work to support a single-wire interface for a mm-wave MIMO array with high aggregate BW and high channel-to-channel isolation.

This chapter presented a 60 GHz 4-element MIMO TX that establishes an FDMA-based single-wire interface between the DSP unit and the mm-wave unit. This method satisfies the wide IF bandwidth requirement of the 60 GHz WiGig application and enables an mm-wave radio link using DBF, which is essential for applications with high-mobility scenarios. Finally, measurement results of the prototype in 45 nm RF-SOI CMOS were presented, which demonstrate superior performance over state-of-the-art mm-wave transceivers.

Table 4.1: Comparison with state-of-the-art mm-wave transceivers with and without single-wire interface.

| | Broadcom<br>JSSC 2014 [5] | Intel<br>ISSCC 2018 [71] | Intel<br>ISSCC 2019 [64] | Oregon State Uni.<br>RFIC 2019 [6] | This Work |
|---|---|---|---|---|---|
| Architecture | Phased-Array Tx/Rx | 2-Way Pol. MIMO<br>Tx/Rx | Digital Tx with 2-Way<br>Pol. MIMO | 4-element MIMO RX | 4-element MIMO TX |
| Single-Wire<br>Interface | Yes (IF data,<br>Control, LO ref and<br>DC) | No | No | CDMA based<br>(IF data and LO) | FDMA based<br>(IF data and LO) |
| Operation Freq. | 60 GHz | 60 GHz | 60 GHz | 28 GHz | 60 GHz |
| # of Elements | 16 (Tx), 16 (Rx) | 1 (dual polarization) | 1 (dual polarization) | 4 | 4 |
| MIMO Streams | 1 | 2 | 2 | 4 | 4 |
| IF Data BW | 2 GHz | 3.47<sup>1</sup> GHz | 3.52<sup>3</sup> GHz | 400 MHz | 8 GHz |
| Gain (dB) | N/R | N/R | N/R | 16 (Rx) | 20 – 35 |
| OP<sub>1dB</sub> (dBm) | +5.2<sup>2</sup> | N/R | N/R | N/A | 8.8 – 10.9 |
| OP<sub>sat</sub> (dBm) | 9<sup>2</sup> | 4 | 11.5<sup>3</sup> | N/A | 9.1 – 12.5 |
| Channel-to-<br>Channel Isolation | | | N/A | 20 dB | 30-40 dB |
| P<sub>DC</sub> | 74.4 mW/element | 210 mW/element | 182 mW/element | 73 mW/element (Rx) | 220 mW/element |
| Technology | 40nm CMOS | | 28nm CMOS | 65nm CMOS | 45nm CMOS-SOI |
| Active Area | 1 mm<sup>2</sup>/element | | 3.24<sup>4</sup> mm<sup>2</sup>/element | 1.44 mm<sup>2</sup>/element | 3.92 mm<sup>2</sup>/element |

<sup>1</sup> From dividing data rate by symbol rate in Fig. 9.5.6. <sup>2</sup> For only PA. <sup>3</sup> From 3.52 Gsym/s in Fig. 9.6.6. <sup>4</sup> From Poutmax in Fig. 9.6.6 in the paper.

N/A: Not Applicable, N/R: Not Reported

### Chapter 5

### Conclusion

Multi-antenna radios, namely phased array and MIMO wireless, have recently attracted unprecedented attention, as they offer a significant increase in channel capacity without additional BW or TX power owing to their diversity/capacity operation [72]. It is well known that phased array operation can provide a remarkable SNR gain. The advantage of MIMO over the phased array, however, is the capacity gain due to the multiple independent Rayleigh fading channels (IRFC) formed by multiple pairs of transmitting and receiving antennas in a multi-path environment. Smart antennas, similar to phased arrays, were the core technique of 3G mobile communication [73], and MIMO systems are the key technology that enabled the high throughput of 4G and long term evolution (LTE) wireless systems.

Massive MIMO (mMIMO) wireless is expected to emerge to satisfy the enormous data throughput requirement of the next generation of wireless communications (5G). mMIMO offers benefits such as extensive use of inexpensive low-power components, reduced latency, simplification of the MAC layer, and robustness against intentional jamming. With mMIMO, the expensive ultra-linear 50 W amplifiers used in conventional systems are replaced by hundreds of low-cost amplifiers with output power in the mW range [15], which is a great opportunity for cost-efficient silicon-based radios that are limited in power handling. However, mMIMO faces fundamental challenges in terms of processing resources. The number of channel responses each terminal must estimate is proportional to the number of base station antennas, and thus the uplink resources needed to inform the BS of the channel responses would be substantially larger than in conventional systems.

Since the invention of the radio, transmission and reception have been separated in time or frequency to protect the RX from jamming by the SI due to TX-RX leakage. FD wireless, an emerging wireless communication paradigm, allows simultaneous transmission and reception at the same frequency, promising a significant enhancement in spectral efficiency at the physical layer. Although FD can potentially double the data throughput, it is challenged by the large amount of SI from the TX to its own RX. Many SISO FD radios have been proposed in which SIC is achieved through magnetic-free integrated circulators as the shared antenna interface, reciprocal electrical-balance duplexers, polarization-based antenna interfaces, active antenna duplexers, and analog RF/baseband cancellers. In addition, to form an FD link, digital cancellers are required alongside the analog ones to meet the stringent SIC requirements.

In this dissertation, we demonstrated how the FD feature can be added to phased array and MIMO operation, which can substantially improve wireless spectral efficiency, as it can double the total link data throughput while benefiting from the phased array/MIMO diversity/capacity gain. Although the combination of these two technologies can result in a higher data rate and spectral efficiency, it is quite challenging, as CT-SI must be suppressed in addition to the SI.

As a first step, we presented a scalable 4-element FD phased array circ.-RX which combines the functionality of a phased array with that of an FD radio. Joint optimization of the beamforming degrees of freedom on both the TX and RX sides is used to achieve beamforming plus SIC with a minimum penalty on the array gains. The beamforming is implemented with minimum overhead by exploiting the available baseband outputs of the N-path-filter-based integrated circ.-RX. A 730 MHz scalable 4-element FD phased array circ.-RX prototype in 65 nm CMOS achieves up to 50 dB of array SIC over 16.25 MHz with less than 3.5/3 dB degradation in TX/RX array gain across 8 elements while handling +16.5 dBm of TX array power. In conjunction with digital SIC, 100 dB of array SIC is demonstrated.

Next, we presented how FD operation can be integrated with MIMO operation using RF and BB SI/CT-SI cancellers. MIMO operation increases the SIC requirements by a factor of $N^2$, resulting in severe area, power consumption, and noise penalties. We presented the detailed analysis, design, and implementation of RF/BB cancellers that address this challenge, as well as a bootstrapping technique to enhance the power handling of the integrated circ.-RX. A 65 nm CMOS scalable 2-element MIMO FD circ.-RX array with integrated N-path-filter-based nonmagnetic circulators was described that exploits bootstrapping in the circulator N-path filters to enhance TX power handling by 8 dB, and features area- and power-efficient passive RF and active BB wideband MIMO cancellation with shared-delay elements to address the $O(N^2)$ cancellation challenge. The prototype exhibits up to 35/45 dB average SIC across 40/20 MHz BW and more than 42/53 dB average CT-SIC across 40/20 MHz BW with <2.1 dB degradation in RX NF, and an overall TX power handling of +14 dBm enabled by clock bootstrapping.
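The $N^2$ scaling mentioned above can be illustrated by counting TX-to-RX coupling paths. The short Python sketch below assumes one cancellation path per TX-RX pair, with each element's own TX-to-RX leakage counted as SI and every other pair as CT-SI. The function name is hypothetical and the count says nothing about how the cancellers are actually built.

```python
# Illustrative path count only: assumes one cancellation path per TX-RX pair.
def count_cancellation_paths(num_elements: int) -> tuple[int, int, int]:
    """Return (SI paths, CT-SI paths, total) for an N-element FD MIMO array."""
    si_paths = num_elements                           # each TX into its own RX
    ct_si_paths = num_elements * (num_elements - 1)   # each TX into every other RX
    return si_paths, ct_si_paths, si_paths + ct_si_paths


for n in (1, 2, 4, 8):
    si, ct_si, total = count_cancellation_paths(n)
    print(f"N={n}: SI={si}, CT-SI={ct_si}, total={total}")
```

The total equals $N + N(N-1) = N^2$, so a single-antenna FD radio needs one path while a 2-element FD MIMO radio already needs four, which is the motivation for sharing delay elements between cancellers.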

Although phased array and MIMO wireless have remarkably enhanced data capacity, the bottleneck in achieving the desired multi-Gbps data rates mostly lies in the crowded sub-6 GHz spectrum [3]. The tremendous spectrum available at mm-wave frequencies is attractive for short-range indoor applications (60 GHz WiGig) as well as outdoor 5G cellular (28 GHz). Furthermore, studies have shown that the relative power consumption of wireless devices improves as the RF BW increases, thus motivating the development of multi-Gbps smartphone devices at mm-wave frequencies, where spectrum is widely available [74]. Although non-line-of-sight (NLOS) links result in much higher path loss at mm-wave frequencies, typically 15 to 40 dB greater than line-of-sight (LOS) paths, many NLOS paths can be formed using steerable high-directivity antennas (phased arrays) [75, 76], which can be used to increase the coverage. Advances in the semiconductor industry allow for commercially available low-cost mm-wave CMOS radios [77, 78].

Several mm-wave transceiver arrays have been presented in the literature that deploy analog beamformers to perform the phased array operation [5, 64, 71, 79–82]. Digital beamforming offers adaptive beamforming in the presence of jammers that dynamically change position or frequency, can be used to form multiple simultaneous beams, and is critical for user discovery and tracking in high-mobility scenarios. Per-element digitization, however, results in a significant I/O challenge. In practical implementations, the mm-wave RF radio unit and the DSP are typically separate ICs, and thus a single-wire interface between the mm-wave IC and the DSP unit is desirable to ease the scalability of the design. We have shown that the traditional solution for a single-wire interface, SERDES, is area- and power-hungry. Alternatively, multiple IF data streams can co-exist on a single wire using CDMA or FDMA approaches; we found FDMA to be a much more efficient solution than CDMA for radios with large IF BW, such as 60 GHz WiGig.

We presented a 60 GHz 4-element MIMO TX with a single-wire interface in 45 nm RF-SOI CMOS that can receive a total IF BW of 8 GHz, frequency-multiplexed onto a <10 GHz spectrum. Each element's IF signal from the single-wire interface is downconverted to zero-IF using harmonic rejection mixing circuitry with >30 dB inter-channel isolation. Gilbert-cell mixers upconvert the IF data of all elements to 60 GHz to form a 4-element MIMO TX, and the signal on each element is amplified up to +12.5 dBm using a two-stage stacked PA.

In summary, multi-antenna radios are emerging in BSs and APs as well as in space-constrained femtocells and UEs, and thus compact, power-efficient multi-antenna radios will dominate the future market. In this dissertation, we first proposed high-performance multi-antenna wireless by adding the FD feature to phased array/MIMO technologies. In the FD phased array, array SIC is achieved by repurposing the beamforming DoF without any explicit RF/BB canceller, yielding an area- and power-efficient FD phased array with a minimum penalty in terms of array gain. Next, we showed a 2-element FD MIMO circ.-RX which combines FD operation with MIMO using RF and shared-delay BB cancellers. The RF SI/CT-SI cancellers provide initial SIC/CT-SIC, which minimizes the NF penalty due to the noisy BB delay cells. Finally, we introduced a 60 GHz 4-element MIMO TX array that addresses the I/O challenge for MIMO arrays at mm-wave frequencies using an FDMA-based single-wire interface for IF data and LO reference.

As a future research direction, we are further studying FD phased arrays with joint TX and RX beamforming to achieve improved FD data rates by formulating the corresponding optimization problem and developing an iterative algorithm that obtains an approximate solution with provable performance guarantees. Our initial studies presented in [52] show that an FD phased array with 9/36/72 elements can cancel the total SI power to below the noise floor with sum TX and RX array gain losses of 10.6/7.2/6.9 dB, even at a TX power level of 30 dBm. Moreover, the corresponding FD rate gains are at least $1.33/1.66/1.68 \times$.

We have used sequential optimization to find the weights of the RF and BB cancellers in the proposed FD MIMO, first maximizing the RF SIC and then the total SIC with the addition of the BB canceller. In practice, this method might not yield the optimum solution, and co-optimization of the RF and BB canceller weights can achieve a better SIC profile with a lower NF penalty. Finding an optimization method and algorithm to program the RF and BB canceller weights based on the SI and CT-SI channels is essential for the practical use of FD MIMO. Furthermore, a digital canceller is required in addition to the analog one to establish an FD link. The performance of digital cancellers depends heavily on the accuracy of the SI channel estimation, and new algorithms are required for FD MIMO radios, in which both SI and CT-SI should be suppressed down to the noise floor.
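The difference between sequential and joint weight selection can be illustrated with a toy least-squares model. The Python sketch below is not the algorithm used in this work: it assumes a single complex weight for one RF tap and one BB tap, models each tap and the SI channel as pure delays with arbitrary (hypothetical) values, and ignores noise figure, weight quantization, and CT-SI entirely. All names are hypothetical.

```python
import cmath

# Toy model only: delays, gains, and function names are hypothetical.

def inner(x, y):
    """Complex inner product <x, y> = sum(conj(x_i) * y_i)."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def residual_power(h, a, b, w_rf, w_bb):
    """Residual SI power after subtracting both weighted tap responses."""
    return sum(abs(hi - w_rf * ai - w_bb * bi) ** 2
               for hi, ai, bi in zip(h, a, b))


def sequential_weights(h, a, b):
    """Fit the RF tap alone first, then fit the BB tap to what is left."""
    w_rf = inner(a, h) / inner(a, a)
    leftover = [hi - w_rf * ai for hi, ai in zip(h, a)]
    w_bb = inner(b, leftover) / inner(b, b)
    return w_rf, w_bb


def joint_weights(h, a, b):
    """Solve the 2x2 least-squares normal equations for both taps at once."""
    gaa, gab, gba, gbb = inner(a, a), inner(a, b), inner(b, a), inner(b, b)
    ya, yb = inner(a, h), inner(b, h)
    det = gaa * gbb - gab * gba
    return (ya * gbb - gab * yb) / det, (gaa * yb - gba * ya) / det


def delay_response(freqs_hz, delay_s, gain=1.0):
    """Frequency response of an ideal delay with a real gain."""
    return [gain * cmath.exp(-2j * cmath.pi * f * delay_s) for f in freqs_hz]


freqs = [(-10 + k) * 1e6 for k in range(21)]           # -10 MHz to +10 MHz
si_channel = [p + q for p, q in zip(delay_response(freqs, 2e-9, 0.1),
                                    delay_response(freqs, 15e-9, 0.03))]
rf_tap = delay_response(freqs, 1e-9)                   # assumed RF tap delay
bb_tap = delay_response(freqs, 10e-9)                  # assumed BB tap delay

p_si = residual_power(si_channel, rf_tap, bb_tap, 0, 0)
p_seq = residual_power(si_channel, rf_tap, bb_tap,
                       *sequential_weights(si_channel, rf_tap, bb_tap))
p_joint = residual_power(si_channel, rf_tap, bb_tap,
                         *joint_weights(si_channel, rf_tap, bb_tap))
print(f"SI power: {p_si:.3e}, sequential: {p_seq:.3e}, joint: {p_joint:.3e}")
```

In this least-squares setting the joint fit can never leave more residual power than the sequential one, because the sequential weights are one feasible point of the joint problem. A practical co-optimization would also have to weigh the NF penalty of the BB canceller, which this sketch does not model.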

## Part I

# Bibliography


- [1] N. Hassan and X. Fernando, "Massive MIMO Wireless Networks: An Overview," *Electronics*, vol. 6, no. 3, 2017. [Online]. Available: https://www.mdpi.com/2079-9292/6/3/63
- [2] A. H. Naqvi and S. Lim, "Review of Recent Phased Arrays for Millimeter-Wave Wireless Communication," *Sensors*, vol. 18, no. 10, 2018. [Online]. Available: https://www.mdpi.com/1424-8220/18/10/3194
- [3] W. Hong *et al.*, "Study and Prototyping of Practically Large-Scale mmWave Antenna Systems for 5G Cellular Devices," *IEEE Communications Magazine*, vol. 52, no. 9, pp. 63–69, Sep. 2014.
- [4] J. Zhou, "Integrated Self-Interference Cancellation for Full-Duplex and Frequency-Division Duplexing Wireless Communication Systems," Ph.D. dissertation, Graduate School of Arts and Sciences, Columbia University, New York, 2017, available: https://doi.org/10.7916/D86T0S88.
- [5] M. Boers et al., "A 16TX/16RX 60 GHz 802.11ad Chipset With Single Coaxial Interface and Polarization Diversity," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 49, no. 12, pp. 3031–3045, Dec 2014.
- [6] M. Johnson *et al.*, "A 4-element 28 GHz Millimeter-wave MIMO Array with Single-wire Interface using Code-Domain Multiplexing in 65 nm CMOS," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2019.

- [7] S. K. Garakoui *et al.*, "Compact Cascadable $g_m$-C All-Pass True Time Delay Cell With Reduced Delay Variation Over Frequency," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 50, no. 3, pp. 693–703, March 2015.
- [8] T. Forbes *et al.*, "Design and Analysis of Harmonic Rejection Mixers With Programmable LO Frequency," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 48, no. 10, pp. 2363–2374, Oct 2013.
- [9] C. Andrews and A. C. Molnar, "A Passive Mixer-First Receiver With Digitally Controlled and Widely Tunable RF Interface," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 45, no. 12, pp. 2696–2708, 2010.
- [10] H. Yuksel *et al.*, "A Circuit-Level Model for Accurately Modeling 3rd Order Nonlinearity In CMOS Passive Mixers," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2014, pp. 127–130.
- [11] N. Reiskarimian and H. Krishnaswamy, "Magnetic-Free Non-Reciprocity Based on Staggered Commutation," in *Nature Communications*, vol. 7, no. 4, April 2016.
- [12] N. Reiskarimian *et al.*, "Analysis and Design of Commutation-Based Circulator-Receivers for Integrated Full-Duplex Wireless," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 53, no. 8, pp. 2190–2201, Aug 2018.
- [13] E. Bjornson *et al.*, "Massive MIMO in Sub-6 GHz and mmWave: Physical, Practical, and Use-Case Differences," *IEEE Wireless Communications*, vol. 26, no. 2, pp. 100–108, April 2019.
- [14] J. G. Andrews et al., "What Will 5G Be?" IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, June 2014.
- [15] E. G. Larsson *et al.*, "Massive MIMO for Next Generation Wireless Systems," *IEEE Communications Magazine*, vol. 52, no. 2, pp. 186–195, February 2014.

- [16] A. M. Niknejad and H. Hashemi, mm-Wave Silicon Technology: 60 GHz and Beyond, 1st ed. Springer Publishing Company, Incorporated, 2008.
- [17] B. Razavi, *RF Microelectronics*. Prentice Hall, 2012.
- [18] C. L. I et al., "Toward Green and Soft: a 5G Perspective," IEEE Communications Magazine, vol. 52, no. 2, pp. 66–73, February 2014.
- [19] J. Zhou et al., "Integrated Full Duplex Radios," IEEE Communications Magazine, vol. 55, no. 4, pp. 142–151, April 2017.
- [20] D. Bharadia et al., "Full Duplex Radios," in Proceedings of the ACM SIGCOMM, 2013, pp. 375–386.
- [21] A. Sabharwal et al., "In-Band Full-Duplex Wireless: Challenges and Opportunities," *IEEE Journal on Selected Areas in Communications*, vol. 32, no. 9, pp. 1637–1652, Sept 2014.
- [22] D. Kim *et al.*, "A Survey of In-Band Full-Duplex Transmission: From the Perspective of PHY and MAC Layers," *IEEE Communications Surveys & Tutorials*, vol. 17, no. 4, pp. 2017–2046, Fourthquarter 2015.
- [23] A. Nagulu *et al.*, "Fully-Integrated Non-Magnetic 180nm SOI Circulator with > 1W P1dB, > +50dBm IIP3 and High Isolation Across 1.85 VSWR," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2018, pp. 104–107.
- [24] N. Reiskarimian *et al.*, "One-way ramp to a two-way highway: integrated magnetic-free nonreciprocal antenna interfaces for full-duplex wireless," *IEEE Microwave Magazine*, vol. 20, no. 2, pp. 56–75, Feb 2019.
- [25] S. Jain *et al.*, "A 0.55-to-0.9GHz 2.7dB NF Full-Duplex Hybrid-Coupler Circulator with 56MHz 40dB TX SI Suppression," in *2018 IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2018, pp. 400–402.

- [26] T. Dinc *et al.*, "A Millimeter-Wave Non-Magnetic Passive SOI CMOS Circulator Based on Spatio-Temporal Conductivity Modulation," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 52, no. 12, pp. 3276–3292, Dec 2017.
- [27] C. Yang and P. Gui, "85–110-GHz CMOS Magnetic-Free Nonreciprocal Components for Full-Duplex Transceivers," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 54, no. 2, pp. 368–379, Feb 2019.
- [28] M. Elkholy et al., "Low-Loss Integrated Passive CMOS Electrical Balance Duplexers With Single-Ended LNA," *IEEE Transactions on Microwave Theory and Techniques*, vol. 64, no. 5, pp. 1544–1559, May 2016.
- [29] B. van Liempd et al., "A +70-dBm IIP3 Electrical-Balance Duplexer for Highly Integrated Tunable Front-Ends," *IEEE Transactions on Microwave Theory and Techniques*, vol. 64, no. 12, pp. 4274–4286, Dec 2016.
- [30] S. H. Abdelhalem *et al.*, "Tunable CMOS Integrated Duplexer With Antenna Impedance Tracking and High Isolation in the Transmit and Receive Bands," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 9, pp. 2092–2104, Sept 2014.
- [31] B. van Liempd et al., "A 0.7–1GHz Tunable RF Front-End Module for FDD and In-Band Full-Duplex Using SOI CMOS and SAW Resonators," in 2017 IEEE MTT-S International Microwave Symposium (IMS), June 2017, pp. 1770–1773.
- [32] T. Dinc et al., "A 60 GHz CMOS Full-Duplex Transceiver and Link with Polarization-Based Antenna and RF Cancellation," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 51, no. 5, pp. 1125–1140, May 2016.
- [33] T. Chi *et al.*, "A 64GHz Full-Duplex Transceiver Front-End with an On-Chip Multifeed Self-Interference-Canceling Antenna and an All-Passive Canceler Supporting 4Gb/s Modulation in One Antenna Footprint," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2018, pp. 76–78.

- [34] D. Yang et al., "A Wideband Highly Integrated and Widely Tunable Transceiver for In-Band Full-Duplex Communication," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 50, no. 5, pp. 1189–1202, May 2015.
- [35] N. Reiskarimian *et al.*, "A CMOS Passive LPTV Nonmagnetic Circulator and Its Application in a Full-Duplex Receiver," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 52, no. 5, pp. 1358–1372, May 2017.
- [36] T. Zhang et al., "Wideband Dual-Injection Path Self-Interference Cancellation Architecture for Full-Duplex Transceivers," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. PP, no. 99, pp. 1–14, 2018.
- [37] K. Chu et al., "A Broadband and Deep-TX Self-Interference Cancellation Technique for Full-Duplex and Frequency-Domain-Duplex Transceiver Applications," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2018, pp. 170–172.
- [38] S. Ramakrishnan et al., "An FD/FDD Transceiver with RX Band Thermal, Quantization, and Phase Noise Rejection and >64dB TX Signal Cancellation," in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), June 2017, pp. 352–355.
- [39] E. Kargaran et al., "Low Power Wideband Receiver with RF Self-Interference Cancellation for Full-Duplex and FDD wireless Diversity," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2017, pp. 348–351.
- [40] J. Zhou et al., "Integrated Wideband Self-Interference Cancellation in the RF Domain for FDD and Full-Duplex Wireless," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 50, no. 12, pp. 3015–3031, Dec 2015.
- [41] D. J. van den Broek et al., "An In-Band Full-Duplex Radio Receiver With a Passive Vector Modulator Downmixer for Self-Interference Cancellation," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 50, no. 12, pp. 3003–3014, Dec 2015.

- [42] T. Chen *et al.*, "Demo Abstract: Full-Duplex with a Compact Frequency Domain Equalization-Based RF Canceller," in *IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)*, May 2017, pp. 972–973.
- [43] K. E. Kolodziej et al., "In-Band Full-Duplex Technology: Techniques and Systems Survey," IEEE Transactions on Microwave Theory and Techniques, pp. 1–17, 2019.
- [44] D. Bharadia and S. Katti, "Full Duplex MIMO Radios," in *Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation*, ser. NSDI'14. Berkeley, CA, USA: USENIX Association, 2014, pp. 359–372. [Online]. Available: http://dl.acm.org/citation.cfm?id=2616448.2616482
- [45] E. Everett et al., "SoftNull: Many-Antenna Full-Duplex Wireless via Digital Beamforming," *IEEE Transactions on Wireless Communications*, vol. 15, pp. 8077–8092, 2016.
- [46] Z. Zhang et al., "Full Duplex 2x2 MIMO Radios," in International Conference on Wireless Communications and Signal Processing (WCSP), Oct 2014, pp. 1–6.
- [47] S. Shahramian *et al.*, "A fully integrated scalable W-band phased-array module ... self-test," in *IEEE ISSCC*, Feb. 2018.
- [48] B. Sadhu et al., "A 28GHz 32-element phased-array TRX IC with concurrent dual polarized beams ..." in *IEEE ISSCC*, Feb. 2017.
- [49] E. Larsson *et al.*, "Massive MIMO for next generation wireless systems," *IEEE Commun. Mag.*, Feb. 2014.
- [50] V. Giannini et al., "A 192-Virtual-RX 77/79GHz GMSK Code-Domain MIMO Radar SoC," in *IEEE ISSCC*, Feb. 2019.
- [51] M. B. Dastjerdi *et al.*, "Full Duplex Circulator-Receiver Phased Array Employing Self-Interference Cancellation via Beamforming," in *Proc. IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2018, pp. 108–111.

- [52] T. Chen *et al.*, "Wideband full-duplex phased array with joint transmit and receive beamforming: Optimization and rate gains," in *Proc. ACM MobiHoc'19 (to appear)*, 2019.
- [53] N. Reiskarimian *et al.*, "Highly-Linear Integrated Magnetic-Free Circulator-Receiver for Full-Duplex Wireless," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2017, pp. 316–317.
- [54] R. Tseng et al., "A Four-Channel Beamforming Down-Converter in 90-nm CMOS Utilizing Phase-Oversampling," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 45, no. 11, pp. 2262–2272, Nov 2010.
- [55] H. Westerveld et al., "A Cross-Coupled Switch-RC Mixer-First technique achieving +41dBm out-of-band IIP3," in 2016 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), May 2016, pp. 246–249.
- [56] Y. Lien *et al.*, "24.3 A High-Linearity CMOS Receiver Achieving +44dBm IIP3 and +13dBm B1dB for SAW-Less LTE Radio," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2017, pp. 412–413.
- [57] C. Andrews and A. Molnar, "A Passive Mixer-First Receiver With Digitally Controlled and Widely Tunable RF Interface," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 45, no. 12, pp. 2696–2708, December 2010.
- [58] J. Zhou and H. Krishnaswamy, "A System-Level Analysis of Phase Noise in Full-Duplex Wireless Transceivers," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. PP, no. 99, pp. 1–1, 2018.
- [59] D. J. van den Broek et al., "A self-interference cancelling front-end for in-band full-duplex wireless and its phase noise performance," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, May 2015, pp. 75–78.

- [60] A. Sahai et al., "On the Impact of Phase Noise on Active Cancelation in Wireless Full-Duplex," *IEEE Transactions on Vehicular Technology*, vol. 62, no. 9, pp. 4494–4510, Nov 2013.
- [61] M. B. Dastjerdi et al., "28.6 Full-Duplex 2x2 MIMO Circulator-Receiver with High TX Power Handling Exploiting MIMO RF and Shared-Delay Baseband Self-Interference Cancellation," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2019, pp. 448–450.
- [62] R. Chen and H. Hashemi, "Passive Coupled-Switched-Capacitor-Resonator-Based Reconfigurable RF Front-End Filters and Duplexers," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, May 2016, pp. 138–141.
- [63] A. Homayoun and B. Razavi, "A Low-Power CMOS Receiver for 5 GHz WLAN," IEEE Journal of Solid-State Circuits (JSSC), vol. 50, no. 3, pp. 630–643, March 2015.
- [64] C. Thakkar et al., "9.6 A 42.2Gb/s 4.3pJ/b 60GHz Digital Transmitter with 12b/Symbol Polarization MIMO," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2019, pp. 172–174.
- [65] H. Kimura et al., "A 28 Gb/s 560 mW Multi-Standard SerDes With Single-Stage Analog Front-End and 14-Tap Decision Feedback Equalizer in 28 nm CMOS," *IEEE JSSC*, vol. 49, no. 12, pp. 3091–3103, Dec 2014.
- [66] J. Lee et al., "Design of 56 Gb/s NRZ and PAM4 SerDes Transceivers in CMOS Technologies," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 50, no. 9, pp. 2061–2073, Sep. 2015.
- [67] T. Forbes and R. Gharpurey, "A 2 GS/s Frequency-Folded ADC-Based Broadband Sampling Receiver," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 49, no. 9, pp. 1971–1983, Sep. 2014.

- [68] Z. Ru et al., "Digitally Enhanced Software-Defined Radio Receiver Robust to Out-of-Band Interference," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 44, no. 12, pp. 3359–3375, Dec 2009.
- [69] A. Chakrabarti and H. Krishnaswamy, "High-Power High-Efficiency Class-E-Like Stacked mmWave PAs in SOI and Bulk CMOS: Theory and Implementation," *IEEE Transactions on Microwave Theory and Techniques (TMTT)*, vol. 62, no. 8, pp. 1686–1704, Aug 2014.
- [70] A. Chakrabarti and H. Krishnaswamy, "Design Considerations for Stacked Class-E-like mmWave high-speed power DACs in CMOS," in *IEEE International Microwave Symposium Digest (MTT)*, June 2013, pp. 1–4.
- [71] S. Daneshgar et al., "A 27.8Gb/s 11.5pJ/b 60GHz transceiver in 28nm CMOS with polarization MIMO," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2018, pp. 166–168.
- [72] H. Fu et al., "Study on the comparison between MIMO and phased array antenna," in *IEEE Symposium on Electrical Electronics Engineering*, June 2012, pp. 478–482.
- [73] J. C. Liberti and T. S. Rappaport, Smart antennas for wireless communications: IS-95 and third generation CDMA applications. Prentice Hall communications engineering and emerging technologies series, Prentice Hall, 1999.
- [74] J. N. Murdock and T. S. Rappaport, "Consumption factor: A figure of merit for power consumption and energy efficiency in broadband wireless communications," in *IEEE GLOBECOM Workshops (GC Wkshps)*, Dec 2011, pp. 1393–1398.
- [75] T. S. Rappaport *et al.*, "38 GHz and 60 GHz angle-dependent propagation for cellular peer-to-peer wireless communications," in *IEEE International Conference on Communications (ICC)*, June 2012, pp. 4568–4573.

- [76] J. N. Murdock et al., "A 38 GHz cellular outage study for an urban outdoor campus environment," in *IEEE Wireless Communications and Networking Conference (WCNC)*, April 2012, pp. 3085–3090.
- [77] T. S. Rappaport *et al.*, "State of the art in 60-GHz integrated circuits and systems for wireless communications," *Proceedings of the IEEE*, vol. 99, no. 8, pp. 1390–1436, Aug 2011.
- [78] Y. Azar et al., "28 GHz propagation measurements for outdoor cellular communications using steerable beam antennas in New York City," in 2013 IEEE International Conference on Communications (ICC), June 2013, pp. 5143–5147.
- [79] B. Sadhu et al., "A 28-GHz 32-element TRX phased-array IC with concurrent dual-polarized operation and orthogonal phase and gain control for 5G communications," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 52, no. 12, pp. 3373–3391, Dec 2017.
- [80] K. Kibaroglu et al., "An ultra low-cost 32-element 28 GHz phased-array transceiver with 41 dBm EIRP and 1.0–1.6 Gbps 16-QAM link at 300 meters," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2017, pp. 73–76.
- [81] J. D. Dunworth *et al.*, "A 28GHz bulk-CMOS dual-polarization phased-array transceiver with 24 channels for 5G user and basestation equipment," in *IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2018, pp. 70–72.
- [82] T. Sowlati et al., "A 60-GHz 144-Element Phased-Array Transceiver for Backhaul Application," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 53, no. 12, pp. 3640–3659, Dec 2018.
- [83] Z. Ru et al., "A Software-defined Radio Receiver Architecture Robust to Out-of-Band Interference," in IEEE International Solid-State Circuits Conference-Digest of Technical Papers, 2009, pp. 230–231.

- [84] A. Ghaffari *et al.*, "Tunable N-path notch filters for blocker suppression: Modeling and verification," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1370–1382, 2013.
- [85] T. Dinc et al., "Synchronized Conductivity Modulation to Realize Broadband Lossless Magnetic-Free Non-Reciprocity," in *Nature Communications*, vol. 8, no. 10, October 2017.
- [86] K. Joardar et al., "An improved MOSFET model for circuit simulation," IEEE Transactions on Electron Devices, vol. 45, no. 1, pp. 134–148, 1998.
- [87] J. Sombrin et al., "Discontinuity at origin in Volterra and band-pass limited models," in 2013 IEEE MTT-S International Microwave Symposium Digest (MTT), June 2013, pp. 1–4.
- [88] H. Khatri et al., "Distortion in current commutating passive CMOS downconversion mixers," *IEEE Transactions on Microwave Theory and Techniques*, vol. 57, no. 11, pp. 2671–2681, 2009.
- [89] P. Bendix et al., "RF distortion analysis with compact MOSFET models," in Proceedings of the IEEE 2004 Custom Integrated Circuits Conference (IEEE Cat. No.04CH37571), Oct 2004, pp. 9–12.
- [90] Y. S. Chauhan et al., "BSIM6: Analog and RF Compact Model for Bulk MOSFET," IEEE Transactions on Electron Devices, vol. 61, no. 2, pp. 234–244, Feb 2014.
- [91] N. Scheinberg and A. Pinkhasov, "A Computer Simulation Model for Simulating Distortion in FET Resistors," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 19, no. 9, pp. 981–989, 2000.
- [92] N. D. Arora *et al.*, "PCIM: A Physically Based Continuous Short-Channel IGFET Model for Circuit Simulation," *IEEE Transactions on Electron Devices*, vol. 41, no. 6, pp. 988–997, Jun 1994.
- [93] X. Li *et al.*, "PSP 102.3," NXP Semiconductors, Tech. Rep., 2008.

- [94] G. Gildenblat et al., "PSP: An Advanced Surface-Potential-Based MOSFET Model for Circuit Simulation," *IEEE Transactions on Electron Devices*, vol. 53, no. 9, pp. 1979–1993, Sept 2006.
- [95] M. B. Dastjerdi and H. Krishnaswamy, "A Simplified CMOS FET Model Using Surface Potential Equations For Inter-Modulation Simulations of Passive-Mixer-Like Circuits," in *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2017, pp. 132–135.
- [96] Columbia High-Speed and Mm-wave IC (CoSMIC) Lab. [Online]. Available: http://cosmic.ee.columbia.edu/downloads.html
- [97] H.-K. Lim and J. Fossum, "An Analytic Characterization of Weak-Inversion Drift Current In a Long-Channel MOSFET," *IEEE Transactions on Electron Devices*, vol. 30, no. 6, pp. 713–715, 1983.
- [98] C. Turchetti, "Relationships for The Drift and Diffusion Components of The Drain Current In an MOS Transistor," *Electronics Letters*, vol. 19, no. 23, pp. 960–962, 1983.
- [99] Y. Tsividis and C. McAndrew, Operation and Modeling of the MOS Transistor. Oxford Univ. Press, 2011.
- [100] A. G. Sabnis and J. T. Clemens, "Characterization of the electron mobility in the inverted 100 si surface," in 1979 International Electron Devices Meeting, vol. 25, 1979, pp. 18–21.
- [101] L. Yau, "A Simple Theory to Predict The Threshold Voltage of Short-Channel IGFET's," Solid-State Electronics, vol. 17, no. 10, pp. 1059–1063, 1974.
- [102] X. Li et al., "Benchmark Tests For MOSFET Compact Models With Application to The PSP Model," *IEEE Transactions on Electron Devices*, vol. 56, no. 2, pp. 243–251, 2009.

- [103] G. Gildenblat et al., "Theory and Modeling Techniques Used in The PSP Model," Proc. NSTI-Nanotech WCM, pp. 409–604, 2006.
- [104] H. C. De Graaff and F. M. Klaassen, Compact Transistor Modelling For Circuit Design. Springer Science & Business Media, 2012.
- [105] F. Klaassen and R. Velghe, "Compact Modelling of The MOSFET Drain Conductance," in *IEEE European Solid State Device Research Conference (ESSDERC)*, 1989, pp. 418–422.
- [106] D. Yang et al., "Optimized Design of N-Phase Passive Mixer-First Receivers in Wideband Operation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 11, pp. 2759–2770, Nov 2015.
- [107] H. Westerveld et al., "A Cross-Coupled Switch-RC Mixer-First Technique Achieving +41dBm Out-of-Band IIP3," in IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2016, pp. 246–249.

# Part II

# Appendix

## Appendix A

A Circuit Simulation Technique for Inter-modulation Simulations and Linearity Analysis of N-path Filters and Passive-Mixer-Like Circuits

Passive-mixer-based circuits, such as current-mode receivers [83], mixer-first receivers (MFRxs) [9] and N-path filters [84], have gained attention thanks to their low noise, high linearity and impedance translation properties. More recently, non-reciprocal, non-magnetic components such as circulators have been demonstrated in CMOS for the first time using passive mixers [11, 85] at both RF and millimeter-wave frequencies. Such circuits use the CMOS transistor as a switch, and as CMOS technology scales, their performance improves in terms of linearity, noise, power consumption and/or operating frequency. However, in these circuits, there typically exists a trade-off between these metrics, and consequently, linearity simulations are critical during the design phase.

Figure A.1: Proposed circuit simulation technique based on surface potential equations.

Scaled CMOS technologies are typically first introduced for and exploited by digital circuits and systems. In such circuits, the FET does not experience a source-drain reversal, and accurate modeling of the drain current across the reversal point is not critical for circuit or system performance. Hence, most of the foundry-provided models, including BSIM, MM9, PCIM, and EKV, are not symmetric around zero drain-source bias, and suffer from discontinuities in the derivatives of the drain-source current [86, 87]. Indeed, in one of the most common models, BSIM4, the third-order inter-modulation (IM3) tone exhibits an erroneous 2dB/dB slope versus input power rather than 3dB/dB (shown later in Fig. A.11) [88, 89]. Although this problem is solved in the BSIM6 model, this model is not provided by many foundries [90]. The problem can arise from two possible sources, as will be discussed in detail later in this appendix: (1) interchange of $V_{GS}$ and $V_{GD}$ when $V_{DS}$ changes sign [91], and (2) failure to accurately model the velocity saturation phenomenon [86, 92]. The Philips Surface Potential (PSP) model is quasi-static in nature, which makes it suitable for accurately simulating digital, analog, and RF circuits [93, 94]. Although it does not exhibit discontinuities in current derivatives around the source-drain reversal point and can accurately capture IM3 distortion, it is typically not available to circuit designers in either new or legacy processes. It is also extremely challenging for a circuit designer to extract the hundreds of unknown parameters to create their own PSP model.

In [10], a circuit-level solution is proposed to enable linearity simulations. This solution breaks each transistor into $2 \times n + 1$ transistors in parallel and offsets the $V_{DS}$ across them by a small amount $\Delta V$. This approach breaks the discontinuity of the second derivative of the current around $V_{DS}=0$ into smaller steps, which results in a 3dB/dB slope for IM3 tones over a limited range of input signal powers. However, this approach imposes a high processing load, since larger values of $n$ must be used to increase the input signal range over which a 3dB/dB slope is seen. Finally, measurements of the transistor are needed to correctly choose the $\Delta V$ parameter.

We describe a circuit simulation technique [95] that identifies a substantially reduced set of PSP parameters and equations that are necessary for accurate linearity simulations of passive-mixer-like circuits, and describe how the parameters can be extracted from simulations of foundry-provided models or from measurements. Furthermore, we show how the equations allow analytical prediction of linearity metrics for passive-mixer-like circuits, illuminating design trade-offs and guidelines. These analytical equations can be employed as a starting point for out-of-band (OOB) IIP3 prediction, but for in-band linearity prediction or applications with more stringent linearity requirements, accurate simulations are required. Finally, we validate the model with extensive simulations and measurements from a 65nm CMOS test transistor switch, a 65nm CMOS 0.15-2GHz MFRx, and two 65nm CMOS 750MHz N-path-filter-based non-reciprocal circulators reported in [11] and [12]. The model, as well as curve-fitting code to extract the model parameters, has been placed online for download [96].

## A.1 Circuit simulation technique using SP equations and short channel effects

The proposed circuit technique is depicted in Fig. A.1. It consists of Verilog-A code to predict the transistor current using surface potential equations including short-channel effects described in [95]. To account for second-order parasitics, namely terminal parasitic capacitances and gate and body leakage current, the foundry-provided model is split in two, and connected from gate to source and drain as dummy transistors.

Figure A.2: Simulation of total capacitance at the transistor terminals of a $24 \times \frac{4\mu m}{60nm}$ device in 65nm CMOS using foundry BSIM4 models and the proposed circuit simulation technique.

It is interesting to note that the BSIM4 model not only fails to predict continuous derivatives of the current but also predicts a discontinuous total capacitance at the source ($C_s$) and drain ($C_d$) terminals (Fig. A.2). This is because the derivatives of the terminal charges in BSIM4 exhibit a discontinuity, leading to discontinuous $C_s$ and $C_d$ at zero drain-source bias. The dummy transistors in our proposed circuit technique fix this issue by predicting a capacitance equal to the average of the forward and reverse points. It should be mentioned, however, that this approach imposes minor restrictions on the usage of our model: (i) parasitics that are different for the source and drain due to layout should be handled separately, and (ii) the equivalent circuit is limited to transistors operating in the deep triode region, such as switches in passive-mixer-like circuits.

#### A.1.1 Long-Channel Transistor Surface Potential Equations

The current along the channel at each position is a combination of both drift and diffusion currents. These currents can be derived by integrating charge along the channel from the source to the drain. This results in the drain-source current as a function of the surface potential (SP) across the channel [97–99]. The SP is defined as the potential of the inversion layer referenced to the deep body, shown in Fig. A.3.

$$I_{DS} = I_{DS,drift} + I_{DS,diff.} \tag{A.1}$$

$$I_{DS, drift} = \frac{W}{L} \int_{\psi_s(0)}^{\psi_s(L)} -\mu Q'_I(x)\, d\psi_s(x) \tag{A.2}$$

$$I_{DS, diff.} = \frac{W}{L} \mu \phi_t (Q'_I(L) - Q'_I(0)) \tag{A.3}$$

$W$, $L$ and $\mu$ are the transistor width, length and effective surface mobility, respectively. $Q'_I(x)$ is the inversion-layer charge per unit area at position $x$ along the channel, $\psi_s(x)$ is the corresponding SP (Fig. A.3), and $\phi_t$ is the thermal voltage. The following equations translate the transistor terminal voltages to the SP at the source and drain ends of the transistor.

$$\psi_s(0) = V_{GB} - V_{FB} - \gamma \sqrt{\psi_s(0) + \phi_t e^{(\psi_s(0) - 2\phi_F - V_{SB})/\phi_t}} \tag{A.4}$$

$$\psi_s(L) = V_{GB} - V_{FB} - \gamma \sqrt{\psi_s(L) + \phi_t e^{(\psi_s(L) - 2\phi_F - V_{DB})/\phi_t}} \tag{A.5}$$

$\gamma$, $\phi_F$ and $V_{FB}$ are the body-effect coefficient, Fermi potential and flat-band voltage, respectively. Note that these equations do not have general closed-form solutions and must be solved numerically. The SP can be approximated by a straight line across the channel with a mean value of $\psi_{sm} = (\psi_s(0) + \psi_s(L))/2$. Hence, the drift and diffusion components of the drain-source current can be approximated using the following equations:

$$I_{DS, drift} = \mu C_{ox} \frac{W}{L} (V_{GB} - V_{FB} - \psi_{sm} - \gamma \sqrt{\psi_{sm}}) (\psi_s(L) - \psi_s(0)) \tag{A.6}$$

$$I_{DS, diff.} = \mu C_{ox} \frac{W}{L} \phi_t \left(1 + \frac{\gamma}{2\sqrt{\psi_{sm}}}\right) (\psi_s(L) - \psi_s(0)) \tag{A.7}$$

Figure A.3: Cross section of the MOS transistor.

Here, $C_{ox}$ is the oxide capacitance per unit area. The interesting characteristic of these equations is that they are symmetric with respect to the drain and source. If drain and source are flipped, the current becomes the negative of the original value as expected. This is not true for source-referenced equations like the conventional square-law triode-region current $I_{DS} = \mu C_{ox} \frac{W}{L} ((V_{GS} - V_{th}) V_{DS} - \frac{1}{2} V_{DS}^2)$.
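The source/drain symmetry of the long-channel equations can be checked numerically. The following is a minimal Python sketch, not the Verilog-A implementation of [95]: it solves the implicit relations (A.4) and (A.5) by bisection and evaluates (A.6) and (A.7). The function names, the lumped gain `k` (standing for $\mu C_{ox} W/L$) and all numeric values are illustrative assumptions.

```python
import math

PHI_T = 0.02585  # thermal voltage near 300 K, in volts (assumed)


def surface_potential(v_gb, v_cb, v_fb, gamma, phi_f):
    """Solve the implicit relation (A.4)/(A.5) for psi_s by bisection.

    v_cb is V_SB for the source end or V_DB for the drain end.
    """
    def residual(psi):
        inversion = PHI_T * math.exp((psi - 2.0 * phi_f - v_cb) / PHI_T)
        return v_gb - v_fb - gamma * math.sqrt(psi + inversion) - psi

    # residual is monotonically decreasing; it is positive at 0 and
    # negative at V_GB - V_FB whenever V_GB > V_FB.
    lo, hi = 0.0, v_gb - v_fb
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def drain_current(v_gb, v_sb, v_db, k, v_fb, gamma, phi_f):
    """Drift (A.6) plus diffusion (A.7) current, with k = mu * C_ox * W / L."""
    psi_0 = surface_potential(v_gb, v_sb, v_fb, gamma, phi_f)
    psi_l = surface_potential(v_gb, v_db, v_fb, gamma, phi_f)
    psi_m = 0.5 * (psi_0 + psi_l)
    drift = k * (v_gb - v_fb - psi_m - gamma * math.sqrt(psi_m)) * (psi_l - psi_0)
    diffusion = k * PHI_T * (1.0 + gamma / (2.0 * math.sqrt(psi_m))) * (psi_l - psi_0)
    return drift + diffusion


# Illustrative (not extracted) parameter values.
params = dict(k=0.8, v_fb=-0.9, gamma=0.3, phi_f=0.45)
i_forward = drain_current(1.2, 0.0, 0.1, **params)
i_reverse = drain_current(1.2, 0.1, 0.0, **params)
print(i_forward, i_reverse)
```

Swapping the two body-referenced terminal voltages only flips the sign of $\psi_s(L) - \psi_s(0)$ while leaving $\psi_{sm}$ unchanged, which is why the two printed currents are negatives of each other.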

#### A.1.2 Effective Mobility

The mobility of carriers is determined by scattering mechanisms. Electrons in the inversion layer flow near the bulk-oxide interface. The gate-induced transverse field is perpendicular to the direction of the current flow, and intensifies the scattering effect and results in reduced effective mobility. The effective mobility ( $\mu_{eff}$ ) of the whole channel can be written as the following function of mobility  $\mu$  at each position in the channel:

$$\mu_{eff} = \frac{1}{\frac{1}{L} \int_{0}^{L} \frac{dx}{\mu(x)}} \tag{A.8}$$

The mobility at each point inside the channel depends on the effective electric field at that point, and it can be calculated as a function of the inversion layer and depletion layer charges. It has been shown that after integration over the channel, the effective mobility can be written as the following function of total channel inversion layer charge  $\overline{Q'_{I}}$  and depletion layer charge  $\overline{Q'_{B}}$  [100]:

$$\mu_{eff} = \frac{\mu_0}{1 - \frac{a_0}{\epsilon_s} (\overline{Q'_B} + \eta_E \overline{Q'_I})},\tag{A.9}$$

where  $\epsilon_s$  is silicon permittivity and  $\mu_0$ ,  $a_0$  and  $\eta_E$  can be treated as curve fitting parameters. It has been further shown that these charges can be approximated using the mean SP along the channel as follows [99]:

$$\overline{Q'_B} = -C_{ox}\gamma\sqrt{\psi_{sm}} \tag{A.10}$$

$$\overline{Q'_I} = -C_{ox}(V_{GB} - V_{FB} - \psi_{sm}) - \overline{Q'_B} \tag{A.11}$$
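As a rough illustration of how (A.9) to (A.11) combine, the Python sketch below evaluates the effective mobility for a given mean surface potential. The function name, argument order and numeric values are assumptions chosen for illustration; in the actual model $\mu_0$, $a_0$ and $\eta_E$ come out of curve fitting.

```python
import math

EPS_SI = 11.7 * 8.854e-12  # silicon permittivity in F/m


def effective_mobility(v_gb, psi_sm, mu_0, a_0, eta_e, c_ox, v_fb, gamma):
    """Effective mobility from (A.9), using the mean charges (A.10) and (A.11)."""
    q_b = -c_ox * gamma * math.sqrt(psi_sm)        # (A.10), depletion charge
    q_i = -c_ox * (v_gb - v_fb - psi_sm) - q_b     # (A.11), inversion charge
    return mu_0 / (1.0 - (a_0 / EPS_SI) * (q_b + eta_e * q_i))


# Illustrative values only (SI units): mobility in m^2/Vs, C_ox in F/m^2.
fit = dict(mu_0=0.04, a_0=2e-9, eta_e=0.5, c_ox=0.017, v_fb=-0.9, gamma=0.3)
for v_gb in (1.0, 1.2, 1.5):
    print(v_gb, effective_mobility(v_gb, 1.0, **fit))
```

Because both mean charges are negative for an n-channel device in inversion, the denominator exceeds one and grows with gate bias, so the mobility falls below $\mu_0$ as the transverse field increases.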

#### A.1.3 Charge Sharing

In deep-sub-micron devices, the extension of the drain and source depletion charge into the channel cannot be neglected. These charges are a function of the drain-body and source-body voltages, and effectively reduce the gate control over the total channel depletion-layer charge. This can be quantified by defining the effective body coefficient as $\gamma_{eff} = \gamma \frac{\hat{Q}_B}{Q_B}$, where $\frac{\hat{Q}_B}{Q_B}$ is the ratio of the portion of the depletion-layer charge that is controlled by the gate to the overall channel depletion-layer charge. Multiple methods have been proposed for calculating this effect. Based on the assumption that the drain and source junction edges have a cylindrical radius equal to their depth $d_j$, and considering an equal depletion depth $d_B$ for drain and source, the following equation has been proposed in [101]:

$$\frac{\hat{Q}_B}{Q_B} = 1 - \frac{d_j}{L} \left( \sqrt{1 + \frac{2d_B}{d_j}} - 1 \right) \tag{A.12}$$

However, assuming the same depletion depth for source and drain is not necessarily valid, as they can have different depletion-layer charges based on their voltages. By expanding the above equation assuming different depths for source and drain, namely $d_{B,S}$ and $d_{B,D}$ respectively, we can derive the following equation for the charge-sharing effect, which is symmetric irrespective of the drain-source current direction.

$$\frac{\hat{Q}_B}{Q_B} = 1 - \frac{\Delta_S}{L} \left(1 - \frac{d_{B,S}}{2d_B}\right) - \frac{\Delta_D}{L} \left(1 - \frac{d_{B,D}}{2d_B}\right) \tag{A.13}$$

$$\Delta_S = d_j \left(-1 + \sqrt{1 + 2\frac{d_{B,S}}{d_j}}\right) \tag{A.14}$$

$$\Delta_D = d_j \left(-1 + \sqrt{1 + 2\frac{d_{B,D}}{d_j}}\right) \tag{A.15}$$

$$d_B = \frac{d_{B,S} + d_{B,D}}{2} \tag{A.16}$$

Note that in these equations, the depletion depth of each junction can be calculated from the junction built-in potential ($\phi_{bi}$) using the following equation:

$$d_{B,S/D} = \sqrt{\frac{2\epsilon_s}{qN_A}} \sqrt{\phi_{bi} + V_{SB/DB}},\tag{A.17}$$

where $q$ is the electron charge, $N_A$ is the acceptor concentration, and $d_j$ is the junction depth, which is treated as a curve-fitting parameter. For transistors operating in strong inversion, $\phi_{bi} = \phi_0 = 2\phi_F + C_0 \times \phi_t$, where $C_0$ is about 5, but here it is treated as a curve-fitting parameter.
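A short Python sketch of the charge-sharing factor follows. It evaluates the asymmetric expression (A.13) to (A.17) and compares it with the equal-depth expression of [101] when the source and drain biases coincide. Function names and the numeric values (doping, junction depth, built-in potential) are illustrative assumptions, not extracted parameters.

```python
import math

Q_E = 1.602e-19            # electron charge in C
EPS_SI = 11.7 * 8.854e-12  # silicon permittivity in F/m


def depletion_depth(v_cb, n_a, phi_bi):
    """Junction depletion depth from (A.17); v_cb is V_SB or V_DB."""
    return math.sqrt(2.0 * EPS_SI / (Q_E * n_a)) * math.sqrt(phi_bi + v_cb)


def charge_sharing_ratio(v_sb, v_db, length, d_j, n_a, phi_bi):
    """Gate-controlled fraction of the depletion charge, (A.13) to (A.16)."""
    depth_s = depletion_depth(v_sb, n_a, phi_bi)
    depth_d = depletion_depth(v_db, n_a, phi_bi)
    depth_mean = 0.5 * (depth_s + depth_d)                          # (A.16)
    delta_s = d_j * (-1.0 + math.sqrt(1.0 + 2.0 * depth_s / d_j))   # (A.14)
    delta_d = d_j * (-1.0 + math.sqrt(1.0 + 2.0 * depth_d / d_j))   # (A.15)
    return (1.0
            - delta_s / length * (1.0 - depth_s / (2.0 * depth_mean))
            - delta_d / length * (1.0 - depth_d / (2.0 * depth_mean)))


def equal_depth_ratio(v_b, length, d_j, n_a, phi_bi):
    """Equal-depth expression (A.12) for comparison."""
    depth = depletion_depth(v_b, n_a, phi_bi)
    return 1.0 - d_j / length * (math.sqrt(1.0 + 2.0 * depth / d_j) - 1.0)


# Illustrative geometry and doping (SI units).
geom = dict(length=60e-9, d_j=30e-9, n_a=5e23, phi_bi=1.0)
print(charge_sharing_ratio(0.0, 0.0, **geom), equal_depth_ratio(0.0, **geom))
print(charge_sharing_ratio(0.0, 0.5, **geom), charge_sharing_ratio(0.5, 0.0, **geom))
```

The first line prints two matching values, showing that the asymmetric form collapses to the equal-depth form for equal biases; the second shows that exchanging source and drain leaves the factor unchanged.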

#### A.1.4 Velocity Saturation

In long-channel devices, the longitudinal electric field inside the channel is so small that there is a linear relation between this field and the velocity of carriers. This assumption is not valid for short-channel devices, where the velocity of carriers can saturate even when the device is operating in triode. This effect is approximated by the following equation or its equivalents in many transistor models [89]:

$$v_d = \frac{\mu_0 |E|}{\left(1 + \left(\frac{|E|}{\varepsilon_c}\right)^m\right)^{\frac{1}{m}}} \tag{A.18}$$

where $v_d$ is the velocity of the carriers, $\mu_0$ is a curve-fitting parameter related to the mobility, $E$ is the longitudinal electric field, and $\varepsilon_c$ is the critical electric field at which the velocity saturates. For the case of $m = 1$, we eventually arrive at the following equation for the drain current:

$$I_{DS, \text{ vel. sat.}} = \frac{I_{DS, \text{ w/o vel. sat.}}}{1 + \frac{|V_{DS}|}{V_{DS,sat.}}}, \tag{A.19}$$

where $V_{DS,sat.}$ is approximately equal to $L\varepsilon_c$. Because of the $|V_{DS}|$ term, the second and higher derivatives of the drain current at $V_{DS} = 0$ do not exist, and this results in Gummel symmetry test failure. It can be proven that for the $n$-th derivative of the current to exist at $V_{DS} = 0$, it is required that $m > n - 1$ [102]. For the case of $m = 2$, the following equation can be derived that exhibits continuous derivatives of the current up to second order [103].

$$I_{DS, \text{ vel. sat.}} = \frac{I_{DS, \text{ w/o vel. sat.}}}{0.5\left(1 + \sqrt{1 + 2\left(\frac{V_{DS}}{V_{DS,sat.}}\right)^2}\right)} \tag{A.20}$$
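The difference between the two velocity-saturation forms around $V_{DS} = 0$ can be seen with a small numerical experiment. The Python sketch below applies each denominator to a purely linear current (unit conductance, an assumption made for illustration) and estimates the second derivative just to the left and right of the origin with finite differences. The helper names and the value of $V_{DS,sat.}$ are hypothetical.

```python
import math


def vel_sat_m1(v_ds, v_sat):
    """Linear current divided by the m = 1 denominator of (A.19)."""
    return v_ds / (1.0 + abs(v_ds) / v_sat)


def vel_sat_m2(v_ds, v_sat):
    """Linear current divided by the m = 2 denominator of (A.20)."""
    return v_ds / (0.5 * (1.0 + math.sqrt(1.0 + 2.0 * (v_ds / v_sat) ** 2)))


def second_derivative(f, v, h):
    """Central finite-difference estimate of f''(v)."""
    return (f(v + h) - 2.0 * f(v) + f(v - h)) / (h * h)


def curvature_jump(model, v_sat, offset=1e-3, h=1e-4):
    """|f''(+offset) - f''(-offset)|, a proxy for the jump in f'' at 0."""
    right = second_derivative(lambda v: model(v, v_sat), +offset, h)
    left = second_derivative(lambda v: model(v, v_sat), -offset, h)
    return abs(right - left)


V_SAT = 0.18  # illustrative value in volts
print("m = 1:", curvature_jump(vel_sat_m1, V_SAT))
print("m = 2:", curvature_jump(vel_sat_m2, V_SAT))
print("m = 2, closer to 0:", curvature_jump(vel_sat_m2, V_SAT, offset=1e-4, h=1e-5))
```

For the $m = 1$ form the gap stays close to $4/V_{DS,sat.}$ however near the origin it is sampled, whereas for the $m = 2$ form it shrinks in proportion to the offset, consistent with a continuous second derivative.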

#### A.1.5 Channel Length Modulation

For a device operating in strong-inversion and saturation, as the drain-source voltage increases, a region forms near the drain that is no longer in inversion. Carriers in this region travel at the saturation velocity as the electric field is more than the critical field  $\varepsilon_c$ . This phenomenon can be modeled as a decrease in the transistor effective length  $\Delta L$ , and can be approximated using the following equation [104, 105]:

$$\Delta L = l_a \times \ln\left(1 + \frac{V_{DSX} - V_{DSX,eff}}{V_e}\right) \tag{A.21}$$

$$V_{DSX} = \sqrt{V_{DS}^2 + 0.01} - 0.1 \tag{A.22}$$

$$V_{DSX,eff} = \sqrt{V_{DS,eff}^2 + 0.01} - 0.1 \tag{A.23}$$

$$V_{DS,eff} = \frac{V_{DS}}{\left\{1 + \left(\frac{V_{DS}}{V'_{DS}}\right)^{2A}\right\}^{\frac{1}{2A}}} \tag{A.24}$$

where  $V_e$ ,  $l_a$  and  $V'_{DS}$  are curve-fitting parameters. An initial value for  $l_a$  is  $\sqrt{3t_{ox}d_j}$ , where  $t_{ox}$  is the oxide thickness.  $V_{DS,eff}$  is the effective drain-source voltage defined through a smoothing function for transition between the saturation and non-saturation regions. To make sure that there is no discontinuity in the derivatives of the drain-source current around  $V_{DS} = 0$ , A is chosen to be a large even value (A = 10).
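The channel-length-modulation expressions (A.21) to (A.24) translate directly into code. The Python sketch below is a minimal illustration with hypothetical function names and placeholder parameter values; the initial guess for $l_a$ follows the $\sqrt{3t_{ox}d_j}$ rule quoted above, with assumed values for $t_{ox}$ and $d_j$.

```python
import math


def v_ds_eff(v_ds, v_ds_prime, a=10):
    """Smoothed drain-source voltage of (A.24); saturates near v_ds_prime."""
    return v_ds / (1.0 + (v_ds / v_ds_prime) ** (2 * a)) ** (1.0 / (2 * a))


def smooth_abs(v):
    """Smooth, even replacement for |v| used in (A.22) and (A.23)."""
    return math.sqrt(v * v + 0.01) - 0.1


def delta_l(v_ds, l_a, v_e, v_ds_prime, a=10):
    """Channel length reduction from (A.21)."""
    v_dsx = smooth_abs(v_ds)
    v_dsx_eff = smooth_abs(v_ds_eff(v_ds, v_ds_prime, a))
    return l_a * math.log(1.0 + (v_dsx - v_dsx_eff) / v_e)


# Placeholder values: t_ox = 2 nm and d_j = 30 nm are assumptions.
L_A = math.sqrt(3.0 * 2e-9 * 30e-9)
for v in (-1.0, -0.05, 0.0, 0.05, 1.0):
    print(v, delta_l(v, L_A, v_e=1.0, v_ds_prime=0.4))
```

With these values the length reduction is essentially zero in deep triode, grows once $|V_{DS}|$ exceeds $V'_{DS}$, and is an even function of $V_{DS}$, so it does not reintroduce an asymmetry around the reversal point.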

#### A.1.6 Parameter Extraction

To extract the equation parameters, foundry-provided models are first used to generate a series of curves that describe $I_{DS}$ versus $V_{DS}$ for different values of $V_{GS}$, and $I_{DS}$ versus $V_{GS}$ for different values of $V_{SB}$. Then a MATLAB curve-fitting program is used to fit the SP equation parameters to these curves. The equations used are (A.4), (A.5), (A.6), (A.7), (A.9), (A.13), (A.20) and (A.21). Overall, 15 parameters are found through least-squares curve-fitting. Initially, the $I_{DS}$ vs. $V_{GS}$ curves are employed to obtain an initial guess for the $V_{FB}$, $\mu$, $\gamma$, $C_{ox}$, $a_0$, $\eta_E$, $d_j$, $N_A$, $C_0$, $C_F$ and $C_{FB}$ parameters. Then, the $I_{DS}$ vs. $V_{DS}$ curves are used to obtain an initial guess for $V_{DS,sat.}$, $l_a$, $V_e$, and $V'_{DS}$. Finally, curve-fitting is performed on both sets of curves to optimize the model. This parameter extraction approach is based on the assumption that the foundry-provided model accurately models the drain-source current for $V_{DS} > 0$. If measurement results from a test device are available, one can use them to further optimize the model. The parameter extraction code has also been placed online for public use [96].
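To give a feel for the least-squares step, the Python sketch below fits a deliberately reduced two-parameter model to synthetic $I_{DS}$ versus $V_{DS}$ data. It is not the 15-parameter MATLAB program referred to above: it assumes a triode conductance `g` multiplied by the velocity-saturation shape of (A.20), solves for `g` in closed form for each candidate saturation voltage, and uses a golden-section search over the saturation voltage. All names, the search interval and the data are hypothetical.

```python
import math


def triode_shape(v_ds, v_sat):
    """Unit-conductance triode current with the denominator of (A.20)."""
    return v_ds / (0.5 * (1.0 + math.sqrt(1.0 + 2.0 * (v_ds / v_sat) ** 2)))


def fit_g_and_vsat(v_data, i_data, lo=0.05, hi=1.0, iterations=80):
    """Least-squares fit of i = g * triode_shape(v, v_sat)."""
    def best_g(v_sat):
        # For fixed v_sat the model is linear in g, so g has a closed form.
        shape = [triode_shape(v, v_sat) for v in v_data]
        return sum(s * i for s, i in zip(shape, i_data)) / sum(s * s for s in shape)

    def sse(v_sat):
        g = best_g(v_sat)
        return sum((g * triode_shape(v, v_sat) - i) ** 2
                   for v, i in zip(v_data, i_data))

    # Golden-section search; assumes the error has a single minimum in [lo, hi].
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    v_sat = 0.5 * (a + b)
    return best_g(v_sat), v_sat


# Synthetic "foundry" curve generated from known values, then recovered.
v_points = [0.02 * n for n in range(1, 31)]
i_points = [0.49 * triode_shape(v, 0.18) for v in v_points]
print(fit_g_and_vsat(v_points, i_points))
```

A full extraction would fit all parameters jointly across both families of curves, typically with a general-purpose nonlinear least-squares routine; the separable structure used here is only a convenience of the reduced model.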



Figure A.4: (a) Single-ended 4-path mixer-first receiver (MFRx), and (b) MFRx/N-path filter equivalent circuit for the analysis of OOB linearity.

## A.2 Mixer-First Receiver/N-Path-Filter Linearity Analysis and Design Trade-Offs

In this section, we use the surface potential equations to derive a closed-form expression for the linearity of MFRxs and N-path filters, and use the expression to arrive at design tradeoffs. While an extensive optimization study of MFRxs has been presented in [106], the authors use source-referenced equations. Here, we show that body-referenced equations that do not require ignoring source-drain reversal (i.e., assuming terminals to be permanently source or drain) enable arriving at a closed-form expression that is consistent with [106]. The noise analysis presented in this section is also consistent with [106], but additionally incorporates $R_m$ into the design procedure.

Fig. A.4(a) depicts a 4-path single-ended MFRx. For in-band (IB) signals, the BB capacitors,  $C_{BB}$ , at the input of the transimpedance amplifiers (TIAs) present an open-circuit impedance. On the other hand, OOB interference signals at far-out offset frequencies flow through  $C_{BB}$  to ground, and do not get amplified. However, intermodulation distortion (IMD) signals generated by the switch devices lie at IB frequencies and get amplified.

A simple model that eases analysis of the OOB nonlinearity performance of an MFRx is shown in Fig. A.4(b), and consists of a single transistor switch that is permanently ON and connected to ground [106]. This simplification is valid since only one switch is ON at a time, as the CLK pulses are non-overlapping. The rest of the circuit can be modeled as an ideal mixer and gain. From this model, it is clear that the analysis of this section would be valid for the OOB linearity performance of N-path filters as well.

In deep sub-micron FETs, the main source of non-linearity is velocity saturation. Consider a strongly-inverted transistor operating in the triode region. It has been shown that the equation below predicts the transistor I-V characteristic and its derivatives accurately [99].

$$I_{DS} = \frac{k\left(V_{GB} - V_{FB} - \phi_0 - \frac{V_{SB} + V_{DB}}{2} - \gamma \sqrt{\phi_0 + \frac{V_{SB} + V_{DB}}{2}}\right)V_{DS}}{0.5\left[1 + \sqrt{1 + 2\frac{V_{DS}^2}{V_{SAT}^2}}\right]} \tag{A.25}$$

where $V_{FB}$ is the flat-band voltage, $\phi_0$ is the surface potential of the two-terminal MOS structure in strong inversion, and $k = \mu \frac{W}{L} C_{ox}$ ($\mu$: carrier mobility, $C_{ox}$: oxide capacitance density, and $W/L$: width/length of the FET). Let $V_{XY} = V_{XY0} + v_{xy}$ denote the voltage between nodes X and Y, where $V_{XY0}$ and $v_{xy}$ are its DC and AC parts, respectively. For OOB interference, $V_{SB} \approx 0$. Let $\phi_{0B} = \phi_0 + \frac{V_{SB0} + V_{DB0}}{2}$, $V_{T0} = V_{FB} + \phi_{0B} + \gamma \sqrt{\phi_{0B}}$ and $V_{OD} = (V_{GS0} - V_{T0})$. Using a Taylor expansion, one can write

$$i_{ds} = c_1 v_{in} + c_2 v_{in}^2 + c_3 v_{in}^3 + \dots$$

$$c_1 = k V_{OD}, \ c_2 = -\frac{k}{2} \left(1 + \frac{\gamma}{2\sqrt{\phi_{0B}}}\right), \ c_3 = -\frac{k V_{OD}}{2V_{SAT}^2} \tag{A.26}$$
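The Taylor coefficients can be cross-checked against finite differences of the triode current (A.25). The Python sketch below assumes a grounded source with zero DC drain bias, so that $V_{DS} = V_{DB} = v_{in}$ and $\phi_{0B} = \phi_0$; function names and parameter values are illustrative. The closed-form $c_3$ keeps only the velocity-saturation contribution, so a small residual from the body-effect square root is expected.

```python
import math


def i_ds(v, k, v_od, gamma, phi_0b, v_sat):
    """(A.25) with V_SB = 0 and V_DS = V_DB = v, rewritten in terms of V_OD."""
    body = gamma * (math.sqrt(phi_0b + 0.5 * v) - math.sqrt(phi_0b))
    numerator = k * (v_od - 0.5 * v - body) * v
    return numerator / (0.5 * (1.0 + math.sqrt(1.0 + 2.0 * (v / v_sat) ** 2)))


def taylor_closed_form(k, v_od, gamma, phi_0b, v_sat):
    """Coefficients c1, c2, c3 from (A.26)."""
    c1 = k * v_od
    c2 = -0.5 * k * (1.0 + gamma / (2.0 * math.sqrt(phi_0b)))
    c3 = -k * v_od / (2.0 * v_sat ** 2)
    return c1, c2, c3


def taylor_numeric(k, v_od, gamma, phi_0b, v_sat, h=1e-3):
    """Central finite-difference estimates of the same coefficients at v = 0."""
    f = lambda v: i_ds(v, k, v_od, gamma, phi_0b, v_sat)
    c1 = (f(h) - f(-h)) / (2.0 * h)
    c2 = (f(h) - 2.0 * f(0.0) + f(-h)) / (2.0 * h * h)
    c3 = (f(2 * h) - 2.0 * f(h) + 2.0 * f(-h) - f(-2 * h)) / (12.0 * h ** 3)
    return c1, c2, c3


# Illustrative device values: k in A/V^2, voltages in V.
device = dict(k=0.816, v_od=0.6, gamma=0.3, phi_0b=0.9, v_sat=0.18)
print(taylor_closed_form(**device))
print(taylor_numeric(**device))
```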



Figure A.5: Simplified model to analyze the MFRx noise performance [9].

Using these equations and the simple MFRx OOB model (Fig. A.4(b)), we can calculate the linearity performance. By replacing $v_{in}$ with $v_{src} - (R_s + R_m)i_{ds}$ and using a Taylor expansion, we reach the following equation:

$$i_{ds} = a_1 v_{src} + a_2 v_{src}^2 + a_3 v_{src}^3 + \dots$$

$$a_1 = \frac{c_1}{1 + c_1 R_t}, \ a_2 = \frac{c_2}{(1 + c_1 R_t)^3}, \ a_3 = \frac{c_3 (1 + c_1 R_t) - 2c_2^2 R_t}{(1 + c_1 R_t)^5} \tag{A.27}$$

where  $R_t = R_s + R_m$ .
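The coefficients $a_1$, $a_2$ and $a_3$ can be sanity-checked by solving the loop equation $i_{ds} = f(v_{src} - R_t i_{ds})$ numerically and separating the even and odd parts of the response. The Python sketch below assumes the switch is exactly the cubic polynomial with coefficients $c_1$ to $c_3$; the function names and numbers are illustrative placeholders.

```python
def closed_loop_coeffs(c1, c2, c3, r_t):
    """a1, a2, a3 of (A.27) for a series resistance r_t."""
    d = 1.0 + c1 * r_t
    return c1 / d, c2 / d ** 3, (c3 * d - 2.0 * c2 ** 2 * r_t) / d ** 5


def loop_current(v_src, c1, c2, c3, r_t):
    """Solve v + r_t * f(v) = v_src for the switch voltage v, return f(v).

    Bisection between 0 and v_src; assumes v + r_t * f(v) is increasing on
    that interval, which holds for the small amplitudes used here.
    """
    f = lambda v: c1 * v + c2 * v ** 2 + c3 * v ** 3
    lo, hi = min(0.0, v_src), max(0.0, v_src)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + r_t * f(mid) < v_src:
            lo = mid
        else:
            hi = mid
    return f(0.5 * (lo + hi))


# Illustrative coefficients (A/V, A/V^2, A/V^3) and a 50-ohm total resistance.
C1, C2, C3, R_T = 0.4896, -0.4725, -7.556, 50.0
a1, a2, a3 = closed_loop_coeffs(C1, C2, C3, R_T)

s = 0.05  # source amplitude in volts
i_pos = loop_current(+s, C1, C2, C3, R_T)
i_neg = loop_current(-s, C1, C2, C3, R_T)
a2_est = 0.5 * (i_pos + i_neg) / s ** 2             # even part
a3_est = (0.5 * (i_pos - i_neg) - a1 * s) / s ** 3  # odd part minus linear term
print(a2, a2_est)
print(a3, a3_est)
```

The estimates agree with the closed-form values up to the neglected fourth- and fifth-order terms, which are small at this amplitude.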

Using these equations, $V_{IIP_3}$ and $V_{1dB-Comp.}$ can be calculated. Since $c_3$ is negative, the magnitude of the coefficient ratio is taken.

$$V_{1dB-Comp.} = \frac{1}{2} \sqrt{0.145 \left| \frac{c_1 (1+c_1 R_t)^4}{c_3 (1+c_1 R_t) - 2c_2^2 R_t} \right|} \tag{A.28}$$

$$V_{IIP_3} = \frac{1}{2} \sqrt{\frac{4}{3} \left| \frac{c_1 (1 + c_1 R_t)^4}{c_3 (1 + c_1 R_t) - 2c_2^2 R_t} \right|} \tag{A.29}$$

The additional factor of $\frac{1}{2}$ captures the fact that we are interested in the linearity referenced to the input port, rather than the source voltage. Note that the first term in the denominator shows the contribution of the switch third-order non-linearity, and the second term shows the conversion of the switch second-order nonlinearity into mixer third-order nonlinearity. For deep-submicron devices, the former is dominant. As the switch resistance is $R_{SW} = (kV_{OD})^{-1}$, the term $c_1R_t$ is equivalent to the ratio of the sum of port resistance and series resistance to the switch resistance. $R_{SW}$ is typically of the order of a few ohms, and so $c_1R_t = \frac{R_t}{R_{SW}} \gg 1$. Assuming the input port resistance is fixed, and using the fact that $V_{SAT} = L\varepsilon_c$, we can write

$$V_{IIP_3} \propto L \varepsilon_c R_{SW}^{-\frac{3}{2}} = \varepsilon_c (\mu C_{ox} V_{OD})^{\frac{3}{2}} L^{-\frac{1}{2}} W^{\frac{3}{2}} \tag{A.30}$$

This equation shows that the linearity of passive-mixer-like circuits suffers from technology scaling if the switch resistance is kept constant. On the other hand, let us consider the case where the switch capacitance is maintained constant. Let us assume that carrier mobility, critical field, and transistor overdrive voltage do not scale. Further, let us assume that gate oxide thickness scales with technology along with the device length, by a factor of $\beta$. For the same gate capacitance ($C_g = W(\frac{L}{\beta})(\beta C_{ox})$), the width of the transistor remains constant. Then, technology scaling will improve $V_{IIP_3}$ by a factor of $\beta^2$.
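The $\beta^2$ scaling argument can be checked against the full closed-form $V_{IIP_3}$ rather than its asymptotic form. The Python sketch below evaluates $V_{IIP_3}$ from the Taylor coefficients of (A.26), using the magnitude of the coefficient ratio, and compares a device with its constant-capacitance scaled version ($L \to L/\beta$, $C_{ox} \to \beta C_{ox}$, $W$ fixed). Holding $\gamma$ and $\phi_{0B}$ fixed under scaling is an additional simplifying assumption, and all numeric values are placeholders.

```python
import math


def v_iip3(mu, c_ox, w, l, eps_c, v_od, gamma, phi_0b, r_t):
    """Input-referred IIP3 voltage from the closed-form expression (A.29)."""
    k = mu * c_ox * w / l
    v_sat = l * eps_c
    c1 = k * v_od
    c2 = -0.5 * k * (1.0 + gamma / (2.0 * math.sqrt(phi_0b)))
    c3 = -k * v_od / (2.0 * v_sat ** 2)
    d = 1.0 + c1 * r_t
    ratio = c1 * d ** 4 / (c3 * d - 2.0 * c2 ** 2 * r_t)
    return 0.5 * math.sqrt((4.0 / 3.0) * abs(ratio))


def scaling_gain(beta, mu, c_ox, w, l, eps_c, v_od, gamma, phi_0b, r_t):
    """V_IIP3 improvement when L -> L/beta and C_ox -> beta*C_ox at fixed W."""
    before = v_iip3(mu, c_ox, w, l, eps_c, v_od, gamma, phi_0b, r_t)
    after = v_iip3(mu, beta * c_ox, w, l / beta, eps_c, v_od, gamma, phi_0b, r_t)
    return after / before


# Placeholder technology numbers in SI units (assumptions, not extracted).
tech = dict(mu=0.03, c_ox=0.017, w=96e-6, l=60e-9, eps_c=3e6,
            v_od=0.6, gamma=0.3, phi_0b=0.9, r_t=50.0)
print(v_iip3(**tech))
print(scaling_gain(2.0, **tech))
```

For these numbers the gain comes out slightly below $\beta^2$, because the second-order term and the finite value of $c_1R_t$ are retained instead of being dropped.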



Figure A.6: Trade-off between OOB-IIP3 and power consumption at 1GHz ($R_s = 50\Omega$).



Figure A.7: Trade-off between OOB-IIP3 and NF vs. $R_t$ for different values of the amplifier gain $(R_{SW} = 3\Omega, R_s = 50\Omega)$.

The model presented in Fig. A.4 illuminates MFRx design trade-offs: increasing the series resistance  $R_m$  while reducing  $R_{SW}$  improves OOB linearity at the expense of NF and clock path power consumption. Although a choice of  $R_m \approx R_S$  limits the receiver minimum NF to 3dB theoretically, in practice, the reported NF is typically > 6dB [107]. The larger the  $R_m$ , the smaller the  $R_F$  due to the input matching criterion, resulting in increased NF.

Using the model illustrated in Fig. A.5, the input resistance looking into the mixer can be found as follows [9]:

$$R_{in} = R_{SW} + R_m + \gamma_m R_B || R_{Sh}$$

$$R_{Sh} = (R_s + R_m + R_{SW}) \frac{4\gamma_m}{1 - 4\gamma_m}$$

$$R_F \approx A_v R_B$$
(A.31)

where $R_B$ is the input resistance looking into the input of the IF amplifier, $A_v$ is the amplifier gain, and $\gamma_m$ is a constant related to the number of paths and is equal to $\frac{2}{\pi^2}$ for a 4-path mixer. $R_{Sh}$ takes into account the effect of harmonic conversion. For the input matching, $R_B$ can be found as

$$R_B = \frac{1}{\gamma_m} \frac{R_{Sh}(R_s - R_m - R_{SW})}{R_{Sh} - (R_s - R_m - R_{SW})}$$
(A.32)
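
The matching condition can be checked numerically. The sketch below computes $R_{Sh}$ and $R_B$ from (A.31) and (A.32) and substitutes them back into the expression for $R_{in}$. The function names are hypothetical, the component values are chosen only for illustration, and the term $\gamma_m R_B || R_{Sh}$ is read as $(\gamma_m R_B) || R_{Sh}$, which is the reading consistent with (A.32).

```python
import math

GAMMA_4PATH = 2 / math.pi ** 2  # gamma_m for a 4-path mixer


def shunt_resistance(r_s, r_m, r_sw, gamma=GAMMA_4PATH):
    """R_Sh from (A.31): models the effect of harmonic conversion."""
    return (r_s + r_m + r_sw) * 4 * gamma / (1 - 4 * gamma)


def matched_r_b(r_s, r_m, r_sw, gamma=GAMMA_4PATH):
    """R_B from (A.32): baseband resistance that yields R_in = R_s."""
    r_sh = shunt_resistance(r_s, r_m, r_sw, gamma)
    x = r_s - r_m - r_sw  # resistance the mixer must still present
    if x <= 0:
        raise ValueError("R_m + R_SW must be smaller than R_s for matching")
    return r_sh * x / (gamma * (r_sh - x))


def input_resistance(r_b, r_s, r_m, r_sw, gamma=GAMMA_4PATH):
    """R_in from (A.31), with (gamma * R_B) in parallel with R_Sh."""
    r_sh = shunt_resistance(r_s, r_m, r_sw, gamma)
    translated = gamma * r_b
    return r_sw + r_m + translated * r_sh / (translated + r_sh)


# Illustrative values in ohms (assumed for this example)
r_s, r_m, r_sw = 50.0, 12.0, 3.0
r_b = matched_r_b(r_s, r_m, r_sw)
r_in = input_resistance(r_b, r_s, r_m, r_sw)
print(f"R_B = {r_b:.1f} ohm gives R_in = {r_in:.2f} ohm")
```

The guard on $R_s - R_m - R_{SW}$ reflects that the series resistances alone must not exceed the source resistance; otherwise no positive $R_B$ can restore the match.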

Assuming the resistive feedback and input series resistance noise contributions are dominant, and neglecting IF amplifier noise, the NF of the matched MFRx can be found to be:

$$F = 1 + \frac{R_{SW} + R_m}{R_s} + \frac{(R_{Sh} - (R_s - R_m - R_{SW}))(R_s + R_m + R_{SW})^2}{A_v R_s R_{Sh} (R_s - R_m - R_{SW})}$$
(A.33)

This equation clearly shows that the input matching requirement determines the minimum achievable NF. The first term arises from the noise of $R_m$: the smaller the $R_m$, the better the noise performance. The second term arises from the noise of $R_F$: the larger the $R_F$, the lower its noise contribution, but $R_F$ is ultimately limited by the achievable OTA gain.
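
To make the dependence on $R_m$ and $A_v$ concrete, the following sketch evaluates (A.33) directly and also inverts it for the amplifier gain needed to meet a target NF. It inherits the simplifications of (A.33) (matched input, IF amplifier noise neglected); the function names and the component values are assumptions made for this example only.

```python
import math

GAMMA_4PATH = 2 / math.pi ** 2  # gamma_m for a 4-path mixer


def _gain_coefficient(r_s, r_m, r_sw, gamma):
    """K such that the last term of (A.33) equals K / A_v."""
    r_tot = r_s + r_m + r_sw
    r_sh = r_tot * 4 * gamma / (1 - 4 * gamma)
    x = r_s - r_m - r_sw
    if x <= 0:
        raise ValueError("R_m + R_SW must be smaller than R_s for matching")
    return (r_sh - x) * r_tot ** 2 / (r_s * r_sh * x)


def noise_factor(a_v, r_s, r_m, r_sw, gamma=GAMMA_4PATH):
    """Noise factor F of the matched MFRx according to (A.33)."""
    return 1 + (r_sw + r_m) / r_s + _gain_coefficient(r_s, r_m, r_sw, gamma) / a_v


def noise_figure_db(a_v, r_s, r_m, r_sw, gamma=GAMMA_4PATH):
    """NF in dB corresponding to (A.33)."""
    return 10 * math.log10(noise_factor(a_v, r_s, r_m, r_sw, gamma))


def min_gain_for_nf(nf_db, r_s, r_m, r_sw, gamma=GAMMA_4PATH):
    """Amplifier gain A_v at which (A.33) equals the target NF."""
    margin = 10 ** (nf_db / 10) - 1 - (r_sw + r_m) / r_s
    if margin <= 0:
        raise ValueError("target NF is below the floor set by R_m and R_SW")
    return _gain_coefficient(r_s, r_m, r_sw, gamma) / margin


# Illustrative values (assumed): resistances in ohms, gains as linear ratios
r_s, r_sw = 50.0, 3.0
for r_m in (5.0, 12.0, 20.0):
    for a_v in (5.0, 20.0):
        nf = noise_figure_db(a_v, r_s, r_m, r_sw)
        print(f"R_m = {r_m:4.1f} ohm, A_v = {a_v:4.1f}: NF = {nf:.2f} dB")
```

The inversion makes the floor explicit: no amount of gain can bring the NF below $1 + (R_{SW} + R_m)/R_s$, so a tight NF target constrains $R_m$ before it constrains the amplifier.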



Figure A.8: Simulated  $I_{DS}$  vs.  $V_{DS}$  of a  $24 \times \frac{4\mu m}{60nm}$  device for different values of the  $V_{GS}$  using factory-provided BSIM4 models and our circuit simulation technique.

Using the equations derived so far, we can quantify these trade-offs <sup>1</sup> so that a circuit designer with linearity and NF specifications in hand can choose initial values for $R_m$ and $R_{SW}$ that minimize the MFRx power consumption. In Fig. A.6, the IIP3 is plotted for different values of the switch resistance, together with the associated simulated power consumption. For a given required OOB-IIP<sub>3</sub> and power consumption budget, the values of $R_{SW}$ and $R_m$ can be chosen. Then, the amplifier gain is chosen based on the total series resistance, $R_t$, and the minimum NF requirement, as shown in Fig. A.7. Finally, a design iteration is required to take into account the rest of the clock circuitry, the amplifier power consumption, and the amplifier noise to obtain an optimized MFRx circuit.



Figure A.9: Gummel test on a $24 \times \frac{4\mu m}{60nm}$ device using BSIM4 models, the proposed circuit simulation technique, a 3rd-order polynomial curve-fit and measured data.

<sup>1</sup>For this quantification, we ignored the body effect and approximated $c_2 = -k/2$. Using simulation data, we extracted $V_{OD} = 0.7V$, $V_{SAT} = 0.2$ and $kV_{OD} = \frac{1}{3\Omega}$. Note that, for accuracy, the parameters used in our circuit simulation technique can differ from these values.

#### A.3 Simulation and Measurement Results

In this section, the proposed circuit simulation technique is validated with simulations and measurements. One might wonder whether a simple 3rd-order polynomial describing the relation between drain-source current and voltage (for a single FET size and $V_{GS} = 1.2 V$) could be used instead of the surface-potential equations. Hence, we also present the results obtained with a 3rd-order polynomial curve-fitted to the drain-source current. The simulated drain-source current versus the drain-source voltage for different values of gate voltage using the proposed circuit technique matches well with the available BSIM4 model in Fig. A.8 for a $24 \times \frac{4\mu m}{60nm}$ device. In the Gummel symmetry test shown in Fig. A.9, our proposed circuit technique exhibits an excellent match to measurements from a fabricated 65nm CMOS $24 \times \frac{4\mu m}{60nm}$ test transistor in terms of the first and second derivatives of $I_{DS}$ and preserves continuity, while the BSIM4 model fails as described earlier, and the polynomial predicts a line with a slope close to the measurements.

To evaluate the ability of the proposed technique to predict harmonics accurately, a single-tone sinusoidal (14MHz) signal is applied to the drain side of the test FET, and the fundamental tone and its harmonics are observed on a spectrum analyzer, as illustrated in Fig. A.10(a). The results in Fig. A.10(b) show that, thanks to its inherent symmetry, the proposed circuit predicts the current and its harmonics accurately at least up to the 7th harmonic for a wide range of power levels. In contrast, the factory-provided model and the prior work presented in [10] can only predict the harmonics precisely for a narrow power range, where the drain-source voltage is large and the portion of time during which the signal is close to $V_{DS} = 0$ is very small, and the polynomial model fails to predict the harmonics. The higher-order harmonics in this measurement result from both the higher-order nonlinearities of the FET current and the conversion of lower-order nonlinearity to higher orders due to the series resistance of the port. Hence, a polynomial that fails to predict the 2nd harmonic correctly also fails to do so for the 3rd harmonic, and so on.
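
The conversion of second-order nonlinearity into a third harmonic by the port resistance can be reproduced with a toy model. The sketch below drives a purely second-order device, $i = c_1 v + c_2 v^2$ (so $c_3 = 0$ by construction), from a source with series resistance $R$ and extracts the third harmonic of the current with a single-bin DFT. The device model, function names and numerical values are assumptions made for illustration; they are not the simulation technique proposed in this appendix.

```python
import math


def harmonic_amplitude(samples, n):
    """Cosine amplitude of the n-th harmonic of one uniformly sampled period."""
    count = len(samples)
    return 2.0 / count * sum(
        s * math.cos(2 * math.pi * n * m / count) for m, s in enumerate(samples)
    )


def device_current(v_s, c1, c2, r):
    """Current of i = c1*v + c2*v^2 when driven from v_s through a series r.

    Solves r*c2*v^2 + (1 + r*c1)*v - v_s = 0 for the device voltage v,
    using the root that reduces to v = v_s when r = 0.
    """
    a = r * c2
    b = 1 + r * c1
    v = 2 * v_s / (b + math.sqrt(b * b + 4 * a * v_s))
    return c1 * v + c2 * v * v


def third_harmonic(amplitude, c1, c2, r, points=256):
    """Third-harmonic current amplitude for a cosine source of given amplitude."""
    wave = [
        device_current(amplitude * math.cos(2 * math.pi * m / points), c1, c2, r)
        for m in range(points)
    ]
    return harmonic_amplitude(wave, 3)


# Illustrative coefficients: c1 = k*V_OD = 1/(3 ohm), c2 = -k/2 with V_OD = 0.7 V
c1 = 1.0 / 3.0
c2 = -(c1 / 0.7) / 2.0
amp = 0.1  # source amplitude in volts (assumed)

h3_no_r = third_harmonic(amp, c1, c2, 0.0)
h3_with_r = third_harmonic(amp, c1, c2, 50.0)
# Small-signal prediction from the -2*c2^2*R term that appears in (A.29)
h3_predicted = -2 * c2 ** 2 * 50.0 / (1 + c1 * 50.0) ** 5 * amp ** 3 / 4
print(h3_no_r, h3_with_r, h3_predicted)
```

Without series resistance the third harmonic vanishes, while with $R = 50\Omega$ it follows the $-2c_2^2R_t$ term of (A.29), which is why a model that misrepresents the second-order behavior also mispredicts the higher harmonics.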

Figure A.10: Simulation and measurement results from a harmonic test on a single $24 \times \frac{4\mu m}{60nm}$ transistor switch: (a) test setup, and (b) results.

Fig. A.11 presents a two-tone test on a single-FET switch, where one side is grounded and the other side is excited with a port. The switch current is monitored and converted to power using a $50\Omega$ impedance level. Clearly, our proposed circuit technique can predict the IM<sub>3</sub> precisely for a wide power range. The BSIM4 model does not predict the correct 3dB/dB slope, while the approach presented in [10] once again works over a limited power range only, and the polynomial model predicts an IM<sub>3</sub> around 9dB lower than the SSP model. In addition, the approach presented in [10] is far more computationally intensive: the PSS portion of the simulation in Fig. A.11 takes 1.2s, about 4 times longer than the 298ms required by our proposed circuit technique. Note that for n=32, the approach in [10] breaks each device into 65 parallel transistors, while our approach uses two devices plus computationally light Verilog-A code.
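
For reference, the 3dB/dB criterion and the usual IIP3 extrapolation can be written in a few lines. The sketch below uses the standard two-tone relation IIP3 = P_in + (P_fund - P_IM3)/2 and a least-squares fit of the IM<sub>3</sub> slope; the function names and the synthetic data points are hypothetical and are not taken from Fig. A.11.

```python
def iip3_dbm(p_in_dbm, p_fund_dbm, p_im3_dbm):
    """Extrapolated IIP3 from one two-tone point (all powers per tone, in dBm)."""
    return p_in_dbm + (p_fund_dbm - p_im3_dbm) / 2.0


def fitted_slope(points):
    """Least-squares slope (dB/dB) of a list of (p_in_dbm, p_out_dbm) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# Synthetic, ideal weakly nonlinear device with 0 dB gain and IIP3 = +20 dBm:
# the fundamental follows P_in and the IM3 follows 3*P_in - 2*IIP3.
sweep = [-30.0, -25.0, -20.0, -15.0]
im3_points = [(p, 3 * p - 2 * 20.0) for p in sweep]

slope = fitted_slope(im3_points)
estimate = iip3_dbm(-20.0, -20.0, 3 * -20.0 - 2 * 20.0)
print(f"IM3 slope = {slope:.2f} dB/dB, extrapolated IIP3 = {estimate:.1f} dBm")
```

A simulated IM<sub>3</sub> curve whose fitted slope departs from 3dB/dB at low power, as reported above for the BSIM4 model, makes the extrapolated IIP3 depend on the power level at which it is read, so no single IIP3 value can be quoted.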

Figure A.11: IIP3 simulations of a single $24 \times \frac{4\mu m}{60nm}$ switch to ground: BSIM4, the modeling approach in [10], and our proposed circuit technique.

Figure A.12: Block and circuit diagram and chip micrograph of the 65nm CMOS 0.15-2GHz mixer-first receiver.

To further evaluate the proposed circuit technique performance, we designed and fabricated a doubly-balanced 0.15-2GHz MFRx in 65nm CMOS technology. The circuit diagram is depicted in Fig. A.12. Simple inverter-based TIAs are used, and the mixer switch devices are $24 \times \frac{4\mu m}{60nm}$. The impedance translation through the mixer-first structure contributes 38 $\Omega$, and an additional series $R_M = 12\Omega$ is included to realize the input matching. The measured conversion gain is 38dB. The out-of-band (OOB) IIP3 is measured with large offset frequencies to make sure the OTA is not the linearity bottleneck: tones at 500MHz and 699MHz for an LO frequency of 300MHz. The measured and simulated results are provided in Fig. A.13. The proposed circuit technique is only used for the passive mixer switch transistors, and predicts an OOB IIP3 of +37.5dBm, close to the measured result of +34.8dBm, showing a precision of 2.7dB, superior to the 4dB precision reported in [10]. It should also be mentioned that while entire receiver simulations can be run with our proposed technique, they are not feasible using the model in [10] on our computational resources due to its modeling complexity. The theoretical equations described earlier predict an OOB IIP3 of +34dBm, once again very close to the measured result, validating the use of the theoretical equations for preliminary design optimization.
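
The tone placement used in this measurement can be checked with simple arithmetic. The sketch below computes the third-order intermodulation frequencies for the stated tones and their offsets from the LO. It assumes that the product of interest is the lower one at $2f_1 - f_2$, which is the usual choice for an OOB two-tone test but is not stated explicitly here; the function names are hypothetical.

```python
def im3_products(f1_mhz, f2_mhz):
    """Lower and upper third-order intermodulation frequencies in MHz."""
    return 2 * f1_mhz - f2_mhz, 2 * f2_mhz - f1_mhz


def lo_offsets(f_lo_mhz, freqs_mhz):
    """Absolute offset of each frequency from the LO in MHz."""
    return [abs(f - f_lo_mhz) for f in freqs_mhz]


f_lo, f1, f2 = 300, 500, 699  # MHz, as stated for the OOB IIP3 measurement
im3_low, im3_high = im3_products(f1, f2)

print("tone offsets from LO (MHz):", lo_offsets(f_lo, [f1, f2]))
print("IM3 products (MHz):", im3_low, im3_high)
print("IM3 offsets from LO (MHz):", lo_offsets(f_lo, [im3_low, im3_high]))
```

With these frequencies both tones sit at least 200MHz away from the LO, while the lower IM<sub>3</sub> product lands 1MHz from it. This is consistent with the stated goal of keeping the large tones out of band so that the OTA does not limit the measured linearity.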

Figure A.13: Comparison of measured MFRx out-of-band (OOB) IIP3 with the simulated OOB IIP3 using the proposed circuit simulation technique.

To verify our proposed technique's capability to predict the in-band IIP<sub>3</sub>, measurement results from our 65nm CMOS 750MHz non-magnetic non-reciprocal integrated N-path-filter-based circulator [11] and from our 65nm CMOS 750MHz circulator-receiver [12] are compared with new simulations performed with our circuit technique (Fig. A.14). Our circuit predicts the absolute value of the transmitter-to-antenna IIP3 with 4dB precision. It is also noteworthy that doubling the switch transistor size in [12] $(24 \times \frac{4\mu m}{60nm})$ compared to [11] $(24 \times \frac{2\mu m}{60nm})$ improves the IIP<sub>3</sub> by around 3dB, as predicted by the simulations. These simulations show slightly less precision compared to the MFRx because the TX-ANT linearity in these circuits depends on the TX-RX isolation, as the N-path filter is placed at the RX port to suppress the signal across it for TX-port excitations. The TX-RX isolation, in turn, depends on the matching at the antenna port, which is perfect in the simulation but not in measurements. For the simulations using the polynomial, we used the condition that if $V_{GB}$ is more than 1 V, the FET is ON and the drain-source current is predicted by the 3rd-order polynomial, and if $V_{GB}$ is less than 1 V, the FET is OFF. The IM<sub>3</sub> predicted by this model varies erratically with the input power range, and the model fails to predict the IIP<sub>3</sub>.

#### A.4 Summary

Digitally-driven foundries typically provide models that yield unphysical results when simulating linearity metrics of passive-mixer-like circuits. In this appendix, a circuit simulation technique is described that is simple and computationally light, does not require measurements for model fitting, and leverages the foundry-provided model for capturing second-order parasitics. The SP equations lend themselves to the analytical computation of linearity metrics for circuits such as mixer-first receivers and N-path filters, allowing for preliminary design optimization. The model has been validated through a measured Gummel test on a fabricated transistor, as well as IIP<sub>3</sub> measurements from a mixer-first receiver and several generations of non-magnetic non-reciprocal integrated N-path-filter-based circulators.

Figure A.14: Comparison of transmitter-to-antenna $IIP_3$ measurements of our 750MHz non-magnetic non-reciprocal integrated N-path-filter-based circulators fabricated in 65nm CMOS with new simulations based on our circuit simulation technique: (a) [11], and (b) [12].

While a full-blown PSP model has hundreds of parameters and takes into account temperature and process corner variations, the employed equations include only 15 parameters. While the equations used in our work simulate the transistor behavior in all regions of operation, the sub-circuit that uses the foundry model to capture second-order parasitics is restricted to switch-mode operation.