### Architectures and Integrated Circuits for Efficient, High-power "Digital" Transmitters for Millimeter-wave Applications

Anandaroop Chakrabarti

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

#### COLUMBIA UNIVERSITY

2016

©2016 Anandaroop Chakrabarti All Rights Reserved

#### ABSTRACT


This thesis presents architectures and integrated circuits for the implementation of energy-efficient, high-power "digital" transmitters to realize high-speed long-haul links at millimeter-wave (mmWave) frequencies in nano-scale silicon-based processes. The proposed work opens up the possibility of improving the data capacity and communication range of wireless channels by synergistically combining novel mmWave integrated circuit design with Multiple-Input-Multiple-Output (MIMO) signal processing techniques. Such a cross-layer approach presents a holistic solution to improve the capacity of backbone links as mobile data and voice services become ubiquitous.

Switching power amplifiers (PAs) in CMOS at mmWave frequencies are investigated as they offer the potential for higher efficiency and harness the improving performance of the CMOS device as a switch. The concept of device stacking in the context of the Class-E family of non-linear switching PAs at mmWave frequencies is explored theoretically as well as experimentally for the first time, culminating in state-of-the-art unit cell prototypes. These unit cells are further used in conjunction with a novel large-scale, low-loss, lumped-element power-combiner to realize the world's first CMOS mmWave PA with watt-class output power *on-chip* and high efficiency.

The quintessential trade-off between efficiency and linearity in PAs is addressed by means of a novel linearizing architecture that simultaneously enables high output power through large-scale power combining, digital amplitude modulation and high efficiency under back-off through supply-switching, and linearity through dynamic load modulation. Design considerations for supply-switched direct digital-to-mmWave amplitude converters (DACs) are discussed. The proposed DAC cell, in conjunction with large-scale power-combining, is used to demonstrate a high-power, highly linear, three-bit mmWave DAC prototype with high efficiency under back-off. Supply modulation for stacked Class-E PAs is investigated as a means of achieving high-resolution linear amplitude modulation with high average efficiency. A Class-G supply modulator-based "hybrid" 4-stacked mmWave Class-E-like power DAC is also presented for a practical implementation of high-resolution mmWave power DACs with high output power as well as high peak and average efficiencies.

A 60GHz digital polar four-element phased-array transmitter in CMOS with watt-level output power, based on the Class-G supply-modulated hybrid power DAC, is described. The prototype is the first fully integrated high-power mmWave transmitter capable of supporting complex modulations with high average efficiency. Experimental results are expected to be available in the latter half of 2016. The transmitter is envisioned to serve as a testbed for future studies of massive mmWave MIMO systems to realize long-haul, high-speed mmWave links that can be used for wireless backhaul.

# **Table of Contents**

| Section | Title | Page |
|---------|-------|------|
| | **List of Figures** | v |
| | **List of Tables** | xx |
| | **Acknowledgements** | xxii |
| | **Dedication** | xxiii |
| **1** | **Introduction** | 1 |
| 1.1 | Thesis Objectives | 2 |
| 1.2 | Thesis Outline | 3 |
| **2** | **Stacked Switch-mode Millimeter-wave CMOS PAs** | 5 |
| 2.1 | Millimeter-wave CMOS Power Generation Challenges | 6 |
| 2.2 | Large-Scale Millimeter-wave Power Combining Challenges | 8 |
| 2.3 | Device Stacking in CMOS and SOI CMOS Power Amplifiers | 10 |
| 2.4 | Loss-aware Design Methodology for Class-E Switching PAs | 11 |
| 2.4.1 | Circuit Model and Assumptions | 13 |
| 2.4.2 | Circuit Analysis | 13 |
| 2.4.3 | Optimization Procedure and Comparison to Prior Works | 16 |
| 2.5 | Millimeter-wave Stacked CMOS Class-E-like PAs | 19 |
| 2.5.1 | Concepts | 19 |
| 2.5.2 | Theoretical Analysis and Fundamental Limits | 21 |
| 2.5.3 | Interpretation using Waveform Figures of Merit | 25 |
| 2.5.4 | Stacking vs. Power Combining | 29 |
| 2.5.5 | Stacking vs. Impedance Transformation | 30 |
| 2.5.6 | IBM 45 nm SOI and 65 nm CMOS Power Device Modeling | 31 |
| 2.5.7 | Implementation Details | 35 |
| 2.5.8 | Experimental Results | 45 |
| 2.6 | Challenges Associated with Device Stacking in Class-E PAs | 57 |
| 2.7 | Multi-output Stacked Class-E PA | 59 |
| 2.7.1 | Principle of Operation | 60 |
| 2.7.2 | Efficiency Analysis | 62 |
| 2.7.3 | Internal Power-combining | 66 |
| 2.7.4 | Stability of Dual Output Class-E PA vs. Cascode PA | 68 |
| 2.7.5 | Millimeter-wave Dual-Output Class-E-like PA Implementation | 70 |
| 2.7.6 | Experimental Results | 72 |
| 2.8 | Millimeter-wave Watt-class Stacked Class-E-like Switching PA Array in CMOS | 75 |
| 2.8.1 | Proposed Non-isolating Lumped Quarter-wave Combiner | 76 |
| 2.8.2 | Implementation Details | 77 |
| 2.8.3 | Measurement Results | 78 |
| 2.9 | Comparison with State of the Art | 79 |
| 2.10 | Conclusion | 82 |
| **3** | **Stacked Millimeter-wave Power DACs** | 83 |
| 3.1 | Linearity vs Efficiency Trade-off and Efficiency Under Back-off Challenges for Conventional PAs | 84 |
| 3.2 | RF DAC: A New Design Paradigm in Fine-line CMOS | 85 |
| 3.3 | Prior Work on mmWave Power DACs | 86 |
| 3.4 | mmWave Power DAC Based on Supply Switching, Power Combining and Load Modulation | 89 |
| 3.4.1 | Mixed-Signal Linearizing Architecture for mmWave Power DACs | 89 |
| 3.4.2 | Millimeter-wave 1-bit Stacked Power DAC | 91 |
| 3.4.3 | Linearized Multi-bit mmWave Power DAC using Supply-switching and Digitally-controlled Load Modulation | 103 |
| 3.4.4 | Comparison with State of the Art | 109 |
| 3.5 | Supply-modulated mmWave Class-E-like Power DACs | 110 |
| 3.5.1 | Voltage Scaling with Supply for Stacked Class-E PAs | 116 |
| 3.5.2 | Supply-modulators for mmWave Power DACs | 118 |
| 3.5.3 | Switched-capacitor Supply-modulated Hybrid mmWave Stacked Power DAC | 120 |
| 3.5.4 | Class-G Supply Modulated Hybrid mmWave Stacked Power DAC | 123 |
| 3.6 | Conclusion | 126 |
| **4** | **A Millimeter-wave Digital Polar Phased Array Transmitter** | 127 |
| 4.1 | Overview of Transmitter Architectures | 127 |
| 4.2 | Integrated MmWave Phased Array Transmitters: An Overview | 129 |
| 4.3 | Millimeter-wave Link Budget Analysis | 131 |
| 4.4 | Digital Polar Transmitter Implementation | 134 |
| 4.4.1 | Digital Polar Transmitter Element | 138 |
| 4.4.2 | Simulation Results of Transmitter Element | 143 |
| 4.4.3 | Phased Array Transmitter Common Path | 144 |
| 4.4.4 | Delay Mismatch between Amplitude and Phase Paths | 145 |
| 4.4.5 | High-speed Digital Interface | 145 |
| 4.4.6 | Simulation Results of Transmitter Prototype | 154 |
| 4.5 | Conclusion | 155 |
| **5** | **Future Work** | 156 |
| 5.1 | Short Term Goal: Prototype Measurements | 156 |
| 5.2 | Long Term Vision: Massive Millimeter-wave MIMO for Wireless Backhaul | 156 |
| 5.2.1 | MmWave Massive MIMO Link Analysis | 159 |
| 5.2.2 | Impact of MIMO Array Geometry on Transceiver Metrics | 161 |
| 5.2.3 | Demonstration of a Long-haul MIMO Link at 60GHz | 161 |
| 5.3 | Conclusion | 163 |
| **I** | **Bibliography** | 165 |
| | Bibliography | 166 |
| **II** | **Appendices** | 181 |
| A | Modeling of Passive Components in IBM 45 nm SOI CMOS | 182 |

# List of Figures

| 2.1 | (a) $f_{max}$ of deep-submicron CMOS technology nodes from the literature. (b) Supply                                           |    |
|-----|---------------------------------------------------------------------------------------------------------------------------------|----|
|     | voltage of deep-submicron CMOS technology nodes. Survey of the (c) saturated                                                    |    |
|     | output power and (d) PAE achieved by RF and millimeter-wave CMOS PAs prior                                                      |    |
|     | to the work described in this thesis. $\ldots$ | 7  |
| 2.2 | Conventional power combining techniques: (a) transformer-based series combining,                                                |    |
|     | (b) Wilkinson combining and (c) zero-degree combining.                                                                          | 8  |
| 2.3 | (a) Concept of series stacking in CMOS PAs with voltage swings annotated and (b)                                                |    |
|     | prior art on stacked PAs at low RF frequencies                                                                                  | 10 |
| 2.4 | Class-E PA with finite DC-feed inductance and non-zero switch on-resistance                                                     | 12 |
| 2.5 | Contour plots for (a) output power, (b) drain efficiency and (c) PAE as functions                                               |    |
|     | of $V_{on}$ and $n_0$ for Class-E PAs based on the loss-aware design methodology using                                          |    |
|     | $0.7 \mu m$ channel-length thick-oxide devices in IBM's $0.18 \mu {\rm m}$ CMOS technology. (d)                                 |    |
|     | Comparison of drain voltage waveforms for a PAE-optimal 5GHz device-based design                                                |    |
|     | vs. theoretical and switch-based simulations.                                                                                   | 17 |
| 2.6 | (a) Stacked CMOS Class-E-like PA concept with voltage swings annotated in volts                                                 |    |
|     | and (b) simplification of the stacked topology for analysis using loss-aware Class-E                                            |    |
|     | design methodology.                                                                                                             | 19 |

| 2.7  | (a) Theoretical and simulated (post-layout) output power and PAE and (b) device                                                                      |    |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|      | size and theoretical device stress for the optimal design as a function of number of                                                                 |    |
|      | devices stacked based on the loss-aware Class-E design methodology at 45 GHz in 45 $$                                                                |    |
|      | nm SOI CMOS. Loss in dc-feed inductance is included for theoretical results. Output                                                                  |    |
|      | power and PAE for a switch+capacitor based model for the 4-stacked configuration                                                                     |    |
|      | are also annotated                                                                                                                                   | 23 |
| 2.8  | Product of waveform figures of merit $F_{I,n}^2$ and $F_{C,n}$ for stacked Class-E-like PAs                                                          |    |
|      | in 45 nm SOI CMOS at 45 GHz based on the loss-aware and ZVS based design                                                                             |    |
|      | methodologies.                                                                                                                                       | 28 |
| 2.9  | Comparison of device stacking in Class-E-like PAs (based on loss-aware Class-E de-                                                                   |    |
|      | sign methodology) at 45 GHz in 45 nm SOI CMOS with (a) 2, 4 and 8-way Wilkinson-                                                                     |    |
|      | tree-based power-combining and transformer-based series power-combining (the 2-                                                                      |    |
|      | stacked and 1-stacked Class-E-like PAs obtained from the loss-aware Class-E design                                                                   |    |
|      | methodology are used with both the N-way Wilkinson-tree-based and transformer                                                                        |    |
|      | power-combiners) and (b) impedance transformation at 45 GHz in 45 nm SOI CMOS $$                                                                     |    |
|      | (the 2-stacked and 1-stacked Class-E-like PAs obtained from the loss-aware Class-E                                                                   |    |
|      | design methodology are scaled to increase output power and a 2-element L-C net-                                                                      |    |
|      | work is used to transform the 50 $\Omega$ load to the optimal load impedance for the scaled                                                          |    |
|      | PAs. Quality factors for the inductor and capacitor are assumed to be 15 and 10 $$                                                                   |    |
|      | respectively at 45 GHz)                                                                                                                              | 29 |
| 2.10 | Close-up of (a) devices and (b) devices with connections to gate capacitors in 4-                                                                    |    |
|      | stacked PA layout implemented in 45 nm SOI CMOS                                                                                                      | 32 |
| 2.11 | Augmented schematic of 4-stacked power device in 45 nm SOI CMOS with capacitive                                                                      |    |
|      | and inductive layout parasitics                                                                                                                      | 33 |
| 2.12 | Measured (extrapolated) (a) $f_{max}$ and (b) $f_T$ of $\frac{2.793 \ \mu m \times 41}{40 \ nm}$ and $\frac{2.793 \ \mu m \times 73}{40 \ nm}$ power |    |
|      | devices in IBM 45 nm SOI CMOS across current density. These devices are used in                                                                      |    |
|      | designing the 2-stacked and 4-stacked PAs respectively. (c) Measured and simulated                                                                   |    |
|      | $f_{max}$ for a $\frac{3\ \mu m \times 50}{60\ nm}$ 65 nm low-power bulk-CMOS power device across current density.                                   | 34 |
| 2.13 | Schematics of 45 nm SOI CMOS Q-band Class-E-like PAs with (a) 2 devices stacked                                                                      |    |
|      | and (b) 4 devices stacked                                                                                                                            | 35 |

| 2.14 | Simulated voltage profiles for 2-stacked Class-E-like PA (a) without tuning inductor,                 |    |
|------|-------------------------------------------------------------------------------------------------------|----|
|      | and (b) with tuning inductor. (c) Close-up of voltage profiles with (bottom) and                      |    |
|      | without (top) tuning inductor.                                                                        | 37 |
| 2.15 | Simulated drain-source and gate-source voltage waveforms of the Q-band (a) 2-                         |    |
|      | stacked Class-E-like PA ( $V_{g1}=0.4$ V, $V_{g2}=1.7$ V, $V_{DD}=2.4$ V) and (b) 4-stacked           |    |
|      | Class-E-like PA in 45 nm SOI CMOS ( $V_{g1}$ =0.4 V, $V_{g2}$ =1.8 V, $V_{g3}$ =2.8 V, $V_{g4}$ =4 V, |    |
|      | $V_{DD}$ =4.8 V)                                                                                      | 38 |
| 2.16 | Schematic of the two-stage 45 nm SOI CMOS Q-band Class-E-like PA with a 2- $$                         |    |
|      | stacked driver stage and a 4-stacked main PA                                                          | 39 |
| 2.17 | Schematic of differential 2-stacked Class-E-like PA implemented in 65 nm low-power                    |    |
|      | bulk CMOS                                                                                             | 40 |
| 2.18 | Post-layout simulated drain-source voltages and corresponding switch currents for                     |    |
|      | (a) 2-stacked PA and (b) 4-stacked PA in 45 nm SOI CMOS                                               | 41 |
| 2.19 | Comparison of post-layout simulated waveforms for device $M_2$ of the 2-stacked PA                    |    |
|      | in 45 nm SOI CMOS with theory                                                                         | 42 |
| 2.20 | Comparison of post-layout simulated waveforms for device $M_4$ of the 4-stacked PA                    |    |
|      | in 45 nm SOI CMOS with theory                                                                         | 42 |
| 2.21 | (a) Post-layout simulated device voltages for the 4-stacked PA prototype without                      |    |
|      | tuning inductor in 45 nm SOI CMOS. (b) Simulated switch voltages for the same                         |    |
|      | 4-stacked configuration without tuning inductor, using a switch+capacitor based                       |    |
|      | model for the devices and layout parasitics from Fig. 2.11                                            | 44 |
| 2.22 | Chip microphotographs of the millimeter-wave stacked Class-E-like PAs with (a) $2$                    |    |
|      | devices stacked in 45 nm SOI CMOS, (b) 4 devices stacked in 45 nm SOI CMOS,                           |    |
|      | (c) a two-stage cascade of a main PA with 4 devices stacked with a 2-stacked driver                   |    |
|      | stage in 45 nm SOI CMOS and (d) 2 devices stacked in 65 nm low-power bulk CMOS.                       | 45 |
| 2.23 | Small signal S-parameters of 45 nm SOI 2-stacked Class-E-like PA ( $V_{g1}=0.4$ V,                    |    |
|      | $V_{g2}=1.7$ V, $V_{DD}=2.4$ V). Power consumption=49 mW                                              | 46 |
| 2.24 | Small signal S-parameters of 45 nm SOI 4-stacked Class-E-like PA ( $V_{g1}$ =0.4 V,                   |    |
|      | $V_{g2}=1.8$ V, $V_{g3}=2.8$ V, $V_{g4}=4$ V, $V_{DD}=4.8$ V). Power consumption=206 mW               | 47 |

| 2.25 | Small signal S-parameters of 45 nm SOI 4-stacked Class-E-like PA with the tuning $% \left( {{\rm S}_{\rm T}} \right)$ |    |
|------|-----------------------------------------------------------------------------------------------------------------------|----|
|      | inductor eliminated using laser trimming ( $V_{g1}=0.4$ V, $V_{g2}=1.8$ V, $V_{g3}=2.8$ V, $V_{g4}=4$                 |    |
|      | V, $V_{DD}$ =4.8 V). Power consumption=206 mW                                                                         | 49 |
| 2.26 | Small signal S-parameters (single-ended) of the 65 nm differential 2-stacked Class-E-                                 |    |
|      | like PA ( $V_{g1}=0.8$ V, $V_{g2}=2$ V, $V_{DD}=2$ V). Power consumption=89 mW under small                            |    |
|      | signal operation.                                                                                                     | 50 |
| 2.27 | Large signal Q-band measurement setup for the fabricated PAs                                                          | 51 |
| 2.28 | Measured gain, drain efficiency and PAE as a function of output power for (a) the                                     |    |
|      | 45 nm SOI 2-stacked Class-E-like PA at 47 GHz ( $V_{g1}{=}0.4$ V, $V_{g2}{=}1.7$ V, $V_{DD}{=}2.4$                    |    |
|      | V) and (b) the 65 nm differential 2-stacked Class-E-like PA at 47.5 GHz (V_{g1}=0.8                                   |    |
|      | V, $V_{g2}=2.1$ V, $V_{DD}=2.8$ V)                                                                                    | 52 |
| 2.29 | Measured gain, drain efficiency and PAE as a function of output power for (a) the                                     |    |
|      | $45~\mathrm{nm}$ SOI 4-stacked Class-E-like PA with the tuning inductor eliminated through                            |    |
|      | laser trimming at 42.5 GHz and (b) the 45 nm SOI 4-stacked Class-E-like PA at 47.5 $$                                 |    |
|      | GHz ( $V_{g1}$ =0.4 V, $V_{g2}$ =1.8 V, $V_{g3}$ =2.8 V, $V_{g4}$ =4 V, $V_{DD}$ =4.8 V for both designs)             | 53 |
| 2.30 | Measured gain, saturated output power, drain efficiency and PAE (a) across fre-                                       |    |
|      | quency ( $V_{g1}=0.4$ V, $V_{g2}=1.7$ V, $V_{DD}=2.4$ V) and (b) across supply voltage at 47                          |    |
|      | GHz of the 45 nm SOI 2-stacked Class-E-like PA ( $V_{g1}$ =0.4 V, $V_{g2}$ =1.7 V)                                    | 54 |
| 2.31 | Measured gain, saturated output power, drain efficiency and PAE (a) across fre-                                       |    |
|      | quency ( $V_{g1}=0.4$ V, $V_{g2}=1.8$ V, $V_{g3}=2.8$ V, $V_{g4}=4$ V) and (b) across supply voltage                  |    |
|      | at 47 GHz of the 45 nm SOI 4-stacked Class-E-like PA.                                                                 | 55 |
| 2.32 | Measured and expected (a) average supply current vs $V_{DD}$ and (b) saturated output                                 |    |
|      | power vs $V_{DD}^2$ for 2-stacked Class-E-like PA in 45 nm SOI CMOS. The profiles                                     |    |
|      | display the linearity with respect to supply voltage associated with switching Class-E $$                             |    |
|      | PAs, thereby establishing the Class-E-like characteristics of the PA even at mmWave                                   |    |
|      | frequencies.                                                                                                          | 56 |

| 2.33 | Measured and expected (a) average supply current vs $V_{DD}$ and (b) saturated output                    |    |
|------|----------------------------------------------------------------------------------------------------------|----|
|      | power v<br>s $V_{DD}^2$ for 4-stacked Class-E-like PA in 45 nm SOI CMOS. The profiles do                 |    |
|      | not display the linearity with respect to supply voltage characteristic of switching                     |    |
|      | Class-E PAs, owing to layout-induced increased gate resistance which prevents hard-                      |    |
|      | switching at mmWave frequencies.                                                                         | 57 |
| 2.34 | (a) Measured small signal S-parameters and (b) measured gain, drain efficiency and                       |    |
|      | $\operatorname{PAE}$ as a function of output power for the two-stage 45 nm SOI PA comprising a           |    |
|      | 4-stacked main PA and a 2-stacked driver stage at 47 GHz ( $V_{g1}$ =0.4 V, $V_{g2}$ =1.7 V,             |    |
|      | $V_{g3}=2.8$ V, $V_{g4}=4$ V, $V_{DD,1}=4.8$ V, $V_{g5}=0.4$ V, $V_{g6}=1.6$ V, $V_{DD,2}=2.4$ V). Power |    |
|      | consumption=255  mW under small signal operation                                                         | 58 |
| 2.35 | (a) Multi-output Stacked Class-E PA and (b) corresponding simplified switch-based                        |    |
|      | schematic with drain voltage swings for lossless operation                                               | 59 |
| 2.36 | (a) ON state operation and (b) OFF state operation of lossless Multi-output Stacked                      |    |
|      | Class-E PA                                                                                               | 61 |
| 2.37 | (a) Drain efficiency as a function of device sizes $W_1$ and $W_2$ and load network tuning               |    |
|      | $(L_s \times C_{out})$ for Dual-Output Class-E PA using switch-based simulations at 45GHz, (b)           |    |
|      | efficiencies from device-based simulations and (c) comparison of output powers from                      |    |
|      | switch-based and device-based simulations as a function of device size ratio $W_1/W_2$                   |    |
|      | with $W_2=100\mu$ m. (d) Variation of real and imaginary parts of load impedances for                    |    |
|      | the top and bottom devices of the Dual-Output Class-E PA as a function of device                         |    |
|      | size ratio $W_1/W_2$ , with $W_2=100\mu$ m, obtained from theoretical results at 45GHz                   |    |
|      | using body-contacted device parameters in IBM 45nm SOI CMOS $\left[1\right]$ (Note: Load                 |    |
|      | impedance for the top device shows no variation since $W_2$ remains unchanged)                           | 65 |

| 2.38 | Illustration of internal power-combining for Dual-Output Class-E PA (biasing details               |    |
|------|----------------------------------------------------------------------------------------------------|----|
|      | omitted). (a) Optimized load networks for the top and bottom devices. (b) The                      |    |
|      | phase-shifts $\phi_1$ and $\phi_2$ introduced by the impedance transformation networks $M_1$       |    |
|      | and $M_2$ respectively should ensure phase alignment at the transformed impedances                 |    |
|      | $R_A$ and $R_B$ to ensure constructive power-combining at the single output node. (c)              |    |
|      | Single load = $R_A \parallel R_B$ driven by output powers from the top and bottom devices.         |    |
|      | The single load is split between the individual load networks depending on the power               |    |
|      | levels prior to internal power-combining such that equal voltage amplitude $V_1$ is                |    |
|      | produced across $R_A$ and $R_B$                                                                    | 67 |
| 2.39 | (a) Feedback loop resulting from internal power-combining in the Dual-Output Class-                |    |
|      | E PA and (b) cascode PA where common-gate device mitigates feedback through $C_{gd}$               |    |
|      | and improves reverse isolation.                                                                    | 68 |
| 2.40 | Stability analysis for the Dual-Output Class-E PA. (a) PA without input stimulus,                  |    |
|      | (b) simplified circuit for small-signal analysis, with the input device replaced by its            |    |
|      | output capacitance $C_{out,1}$ and output resistance $R_{out,1}$ and the top device modeled        |    |
|      | by its transconductance ($g_m$), output capacitance $C_{out,2}$ and output resistance $R_{out,2}$ |    |
|      | and (c) equivalent circuit for calculation of loop gain                                            | 69 |
| 2.41 | (a) Dual-Output Class-E PA unit cell schematic (left) and impedance transformation |    |
|      | networks used for internally power-combining the output power available from |    |
|      | top and bottom devices (right). Impedance levels at pertinent nodes are annotated. |    |
|      | (b) Drain-source voltage and current waveforms for top and bottom devices |    |
|      | exhibiting non-overlapping characteristics confirming Class-E-like operation |    |
|      | ($V_{gate,bot}=0.6$V, $V_{gate,top}=1.8$V, $V_{DD,bot}=1.3$V, $V_{DD,top}=2.8$V). | 70 |
| 2.42 | (a) Current-combined Dual-Output Class-E PA schematic. (b) Impedance transformation |    |
|      | networks used for internally power-combining the output power available |    |
|      | from top and bottom devices. Impedance levels at pertinent nodes are annotated                     | 72 |
| 2.43 | Chip microphotograph of (a) Dual-Output Stacked Class-E PA unit cell and (b)                       |    |
|      | two-way current-combined Dual-Output Class-E PA                                                    | 72 |

- 2.45 (a) Small signal $\mu$ stability factors of Dual-Output Class-E PA unit cell and the current-combined prototype, calculated using measured small signal S-parameters shown in Figs. 2.44(a), 2.44(b), 2.44(c) and 2.44(d) respectively. The stability factor is >1 over the measured frequency range, indicating unconditional stability. (b) Large signal performance of the Dual-Output stacked Class-E PA unit cell at 47.5GHz ($V_{gate,bot}=0.6$V, $V_{gate,top}=1.8$V, $V_{DD,bot}=1.3$V, $V_{DD,top}=2.8$V). (c) Large signal performance of current-combined Dual-Output Class-E PA at 47.5GHz ($V_{gate,bot}=0.5$V, $V_{gate,top}=1.9$V, $V_{DD,bot}=1.4$V, $V_{DD,top}=2.9$V).
  75
- 2.46 An *n*-way spiral-based lumped quarter-wave combiner with design equations.
  76
- 2.47 Schematic of the 33-46 GHz watt-class PA array prototype.
  77
- 2.48 Chip microphotograph of the 33-46 GHz watt-class PA array prototype. Chip di-
  78
- 2.49 (a) Simulated and measured small-signal S-parameters of the watt-class power-combined PA array. Measured results showing (b) large-signal saturated output power, peak PAE and drain efficiency at peak PAE across frequency. (c) Gain, PAE and drain efficiency across output power levels for three frequencies. (d) Results of a preliminary
  79
- 3.1 (a) Linearity vs. efficiency trade-off in conventional PAs and (b) efficiency under back-off profile of conventional transmitters when supporting complex modulations.
  84
- 3.2 Direct modulator topologies for mmWave power DACs [2]: (a) Double-balanced baseband-DAC topology and (b) double-balanced RF-DAC topology.
  86
- 3.3 (a) Schematic of the mmWave 9-bit Power DAC presented in [2]. Simulated (b) output power and output amplitude vs. digital amplitude word and (c) drain efficiency and PAE vs. output power of the 9-bit DAC at 90GHz in IBM 45nm SOI CMOS.
  87

| 3.5  | Schematic and chip microphotograph of the implemented 1-bit mmWave 47GHz |     |
|------|----------------------------------------------------------------------------------------------------------------|-----|
|      | Class-E-like SOI CMOS power DAC                                                                                | 90  |
| 3.6  | Schematics of the proposed 1-bit mmWave power DAC with (a) mmWave path and                                     |     |
|      | (b) digital path highlighted                                                                                   | 91  |
| 3.7  | Simplified schematic illustrating ON state operation of the proposed 1-bit mmWave                              |     |
|      | power DAC                                                                                                      | 92  |
| 3.8  | Simplified schematic illustrating OFF state operation of the proposed 1-bit mmWave |     |
|      | power DAC                                                                                                      | 93  |
| 3.9  | Considerations for bias-path RC time-constant to support Gbps modulation in the                                |     |
|      | 1-bit Power DAC cell.                                                                                          | 94  |
| 3.10 | Incorporation of large decoupling capacitors into input and output impedance transformation |     |
|      | networks to minimize settling time of DAC cell | 95  |
| 3.11 | Impact of bias-path RC time constant on settling time of 1-bit Power DAC cell for (a)                          |     |
|      | 500$\Omega$ and (b) 1.5k$\Omega$ biasing resistor for input device $M_6$ of output stage (simulated |     |
|      | using a 100MHz 50% duty-cycle clock).                                                                          | 95  |
| 3.12 | Average DC power consumption and average drain efficiency (DE) of the supply-switched |     |
|      | DAC cell as a function of total supply bypass capacitance in the main PA |     |
|      | $(C_{bypass,1})$ for a 500MHz clock input with 50% duty-cycle.                                                 | 96  |
| 3.13 | (a) Slow rise time of the supply-switch control $(V_{ctrl})$ in the presence of extra digital                  |     |
|      | path interconnect, (b) the resulting spike in drain current $I_{DS}$ of the main PA in the                     |     |
|      | DAC cell, and (c) circuit illustration of the mechanism.                                                       | 98  |
| 3.14 | Simulations illustrating bounce in the supplies of (a) the output stage (main PA) and                          |     |
|      | (b) driver stage of the 1-bit mmWave power DAC with 20pF bypass capacitances                                   |     |
|      | and without them. Simulations comparing bounce in the on-chip (c) RF ground                                    |     |
|      | and (d) digital ground of the power DAC in the presence and absence of 20pF                                    |     |
|      | bypass capacitances. Supply and ground wirebond inductances of 500pH and 300pH |     |
|      | respectively were used                                                                                         | 99  |
| 3.15 | (a) Large change in current at the switching instant of the 1-bit power DAC. (b) The                           |     |
|      | resulting ground bounce (arising from ground wirebond inductance) causes ringing                               |     |
|      | in the waveforms affecting the settling time and hence modulation speed of the PA                              | 100 |

| 3.16 | (a) Simulation setup for input-side modulation of the mmWave 1-bit power DAC cell. |     |
|------|------------------------------------------------------------------------------------|-----|
|      | Simulated average large-signal performance at 45GHz with (b) supply-switching and |     |
|      | (c) input-side modulation, both using a clock input with 50% duty-cycle | 101 |
| 3.17 | Small-signal S-parameter measurements of the 1-bit digital-to-mmWave Power DAC |     |
|      | prototype | 103 |
| 3.18 | Setup used for large-signal measurements of the 1-bit digital-to-mmWave Power DAC |     |
|      | prototype | 104 |
| 3.19 | (a) Measured and simulated large-signal continuous-wave performance of the mmWave |     |
|      | 1-bit power DAC cell at 47GHz, and (b) measured average large-signal metrics at |     |
|      | 47GHz with $2^7-1$ PRBS OOK at different speeds | 104 |
| 3.20 | (a) Measured DAC cell time-domain output with 1Gbps $2^7-1$ PRBS OOK input |     |
|      | and 47GHz carrier. (b) Measured DAC cell output spectrum with the 1Gbps OOK |     |
|      | modulation | 105 |
| 3.21 | Zoomed-in view of the measured DAC cell time-domain output showing (a) rise time |     |
|      | and (b) fall time with 1Gbps $2^7-1$ PRBS OOK input and 47GHz carrier. Setup losses |     |
|      | have not been de-embedded | 105 |
| 3.22 | Schematic of the 42.5GHz digitally-controlled quarter-wave-load-modulated switching |     |
|      | PA array. The state of the array for $n = 8$, $m = 5$, $R_l = 25\Omega$ is shown for |     |
|      | illustration. The red digital paths are for the OFF PAs (shaded in grey) while the |     |
|      | green digital paths are for the ON PAs | 106 |
| 3.23 | Chip photograph of the Q-band three-bit digital to mmWave PA array prototype | 108 |
| 3.24 | Measured small-signal S-parameters vs. digital control setting of the three-bit digital |     |
|      | to mmWave PA array prototype | 109 |
| 3.25 |  |     |
| 3.26 |  |     |

| 3.27 | Measured DNL and INL of the three-bit digital to mmWave PA array prototype at                              |
|------|------------------------------------------------------------------------------------------------------------|
|      | 42.5 GHz                                                                                                   |
| 3.28 | Phase modulator resolution required to satisfy IEEE 802.11ad spectral mask when                            |
|      | the proposed mmWave 3-bit power DAC prototype is incorporated into a transmitter. |
| 3.29 | Simulated (a) output amplitude and supply impedance (defined as $\frac{V_{DD}}{I_{dc}}$, where $I_{dc}$ |
|      | is the supply current drawn under large signal operation) and (b) drain efficiency |
|      | and PAE of the 2-stacked Class-E-like PA in IBM 45nm SOI CMOS (Fig. 2.13(a)) |
|      | as the supply voltage is varied                                                                            |
| 3.30 | (a) Schematic of the 4-stacked Class-E-like PA in IBM 45nm SOI CMOS discussed                              |
|      | in Chapter 2. Simulated (b) drain efficiency and PAE and (c) drain-gate stress of $M_3$                    |
|      | and $M_4$ as the supply voltage is varied while keeping gate biases fixed at $V_{g1}=0.45$V, |
|      | $V_{g2}=1.65$V, $V_{g3}=2.85$V, $V_{g4}=4.05$V. |
| 3.31 | (a) Schematic of the 4-stacked Class-E-like PA in IBM 45nm SOI CMOS discussed                              |
|      | in Chapter 2 with resistive divider DC biasing. Simulated (b) output amplitude                             |
|      | and supply impedance, (c) drain efficiency and PAE as the supply voltage is varied.                        |
|      | Nominally, $V_{g1}=0.45$V, $V_{g2}=1.65$V, $V_{g3}=2.85$V, $V_{g4}=4.05$V for $V_{DD}=4.8$V. (d) Comparison |
|      | of desired (equal device stress) and actual drain-source voltages of $M_1$ and |
|      | $M_4$ for $V_{DD}=1.2$V |
| 3.32 | Voltage swings at various nodes of stacked devices in Class-E PAs as a function of                         |
|      | the supply voltage                                                                                         |
| 3.33 | Supply-adaptive biasing circuit for providing DC bias of gate nodes in stacked Class-E |
|      | PAs |
| 3.34 | Comparison of large signal metrics for resistive divider bias and proposed supply-adaptive |
|      | bias for stacked Class-E PAs |
| 3.35 | Architecture of a switched-capacitor supply-modulated mmWave power DAC with                                |
|      | high amplitude resolution to support complex modulations. The amplitude control                            |
|      | word operates at a symbol rate $f_s$                                                                       |

- 3.36 (a) Schematic of a 4-stacked Class-E-like PA with two-level, i.e., 1-bit, supply modulation and 7-bit binary-weighted tail transistor modulation. Comparison of simulated (b) drain efficiency and (c) PAE under back-off (as fractions of respective peak values) of the proposed topology at 60GHz in IBM 45nm SOI CMOS with a case where back-off is exercised solely through tail transistor switching. (Note: DC biases for gates of the 4-stacked PA are generated by a supply-adaptive biasing circuit.)
  122
- 3.38 (a) Schematic of the proposed Class-G supply modulated "hybrid" 8-bit mmWave stacked power DAC with a 4-stacked Class-E-like output stage and a 2-stacked Class-E-like driver. One-bit supply modulation is achieved using an inverter toggling between 2.4V and 4.8V supplies while 7-bit binary-weighted source degeneration transistors provide additional amplitude resolution. Simulated (pre-layout) (b) drain efficiency and PAE vs. output power and (c) output amplitude and normalized AM-PM distortion vs. amplitude control word at 60GHz in IBM 45nm SOI CMOS. (Note: DC biases for gates of the 4-stacked PA are generated by a supply-adaptive biasing circuit.)
- 3.39 (a) A direct digital-to-mmWave transmitter architecture suitable for scaling-friendly, DSP-intensive communication (based on the proposed high resolution Class-G "hybrid" mmWave stacked power DAC) with simultaneous large-scale power combining, (b) higher back-off efficiency compared to conventional transmitters and (c) linear output profile and no AM-PM distortion.

| 4.2 | Phased array transceiver with N elements in the transmitter as well as receiver arrays                                                                      |
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|     | for link budget analysis                                                                                                                                    |
| 4.3 | (a) Trend for output power vs. frequency of state-of-the-art stacked SOI CMOS                                                                               |
|     | mmWave PAs determined from [3–5], (b) trend for Noise Figure (NF) of state-of-the-art |
|     | mmWave receivers across frequency, (c) path loss (sum of Friis propagation |
|     | loss and atmospheric absorption) for a 100m link across frequency and (d) array size                                                                        |
|     | for a phased array transmitter with output power trend shown in (a) for a 100m                                                                              |
|     | link. Typical on-PCB antenna gain of 5dBi, 64QAM modulation with a bandwidth                                                                                |
|     | of 1GHz and an SNR of 20dB at the receiver as well as 10dB link margin are assumed |
|     | for the calculation. |
| 4.4 | Impact of (a) amplitude and (b) phase resolution on EVM for non-pulse-shaped                                                                                |
|     | 1 Gsps single-carrier (60 GHz) 64 QAM data in a digital polar transmitter. The various                                                                      |
|     | transceiver metrics are the same as those assumed in the link budget analysis in                                                                            |
|     | Section 4.3 except a peak transmitter output power of 25dBm                                                                                                 |
| 4.5 | Output PSD of a 60GHz digital polar transmitter vs. symbol-rate using unshaped                                                                              |
|     | 64QAM symbols. Amplitude and phase resolution are eight bits each. The maximum |
|     | symbol-rate that satisfies the spectral mask without baseband pulse-shaping is |
|     | $\approx 400$Msps. |
| 4.6 | Output PSD of a 60GHz digital polar transmitter vs. symbol-rate using 64QAM                                                                                 |
|     | symbols. The baseband symbols are shaped using a tenth order low-pass digital                                                                               |
|     | FIR filter with a cutoff frequency of 150MHz and an oversampling factor of eight.                                                                           |
|     | Amplitude and phase resolution are eight bits each. The maximum symbol-rate that                                                                            |
|     | satisfies the spectral mask with baseband pulse-shaping is $\approx 1000$ Msps                                                                              |
| 4.7 | Simplified architecture of the stacked hybrid power DAC-based digital polar phased                                                                          |
|     | array mmWave transmitter proposed in this work along with schematics of the transmitter |
|     | element (bottom) and the high-speed digital interface (right) |
| 4.8 | Schematic with detailed block-level interconnections of a transmitter element in the                                                                        |
|     | proposed digital polar phased array transmitter with major building blocks highlighted. |
| 4.9 | Layout snapshot of a transmitter element in the proposed digital polar phased array                                                                         |
|     | transmitter with major building blocks highlighted                                                                                                          |

| 4.10 | Simulated large signal performance metrics at 60 GHz in IBM 45nm SOI CMOS of                       |
|------|----------------------------------------------------------------------------------------------------|
|      | the (a) differential 4-stacked PA, (b) differential 2-stacked driver PA and (c) limiting           |
|      | amplifier of a transmitter element in the proposed digital polar 4-element phased                  |
|      | array transmitter prototype                                                                        |
| 4.11 | (a) Schematic of the vector (I/Q) interpolator used in the phase shifter of a transmitter |
|      | element in the proposed digital polar 4-element phased array transmitter |
|      | prototype. Simulation results at 60 GHz in IBM 45nm SOI CMOS for (b) maximum                       |
|      | phase shift error and (c) corresponding variation in output power of the phase shifter             |
|      | for different phase shift settings                                                                 |
| 4.12 | Simulation results at 60 GHz in IBM 45nm SOI CMOS for (a) output power, drain                      |
|      | efficiency and PAE and (b) maximum phase shift error of the transmitter element in                 |
|      | the proposed digital polar 4-element phased array transmitter prototype for different              |
|      | phase shift settings                                                                               |
| 4.13 | Output PSD of a 60GHz digital polar transmitter vs. delay mismatch between                         |
|      | amplitude and phase paths for 1Gsps 64QAM baseband symbols. The baseband                           |
|      | symbols are shaped using a tenth order low-pass digital FIR filter with a cutoff                   |
|      | frequency of 150MHz and an oversampling factor of eight. Amplitude and phase                       |
|      | resolution are eight bits each. The maximum delay mismatch that satisfies the                      |
|      | spectral mask is $\approx 10$ ps                                                                   |
| 4.14 | Settling behavior of the 60GHz digital polar transmitter in response to supply switching |
|      | using a square-wave control input (with 50% duty-cycle) at 1GHz. The settling |
|      | time is $\approx 50$ ps                                                                            |
| 4.15 | Settling behavior of the 60GHz digital polar transmitter in response to phase modulation |
|      | using square-wave control inputs (with 50% duty-cycle) at 1GHz. The |
|      | phase modulator switches between phase settings of $11^\circ$ and $45^\circ$. The settling time |
|      | is $\approx 120$ ps                                                                                |
| 4.16 | Schematic of the VGA used in the AFE of the high-speed digital interface for the                   |
|      | proposed 60GHz digital polar 4-element phased array transmitter                                    |

| 4.17 | Simulated small signal voltage gain and return loss $(S_{11})$ for (a) maximum gain                                                                                                |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|      | setting and (b) minimum gain setting of the VGA used in the AFE of the high-speed |
|      | digital interface for the proposed 60GHz digital polar 4-element phased array |
|      | transmitter                                                                                                                                                                        |
| 4.18 | Schematic of the CTLE used in the AFE of the high-speed digital interface for the                                                                                                  |
|      | proposed 60GHz digital polar 4-element phased array transmitter                                                                                                                    |
| 4.19 | Simulated small signal voltage gain for (a) no gain peaking, (b) maximum gain                                                                                                      |
|      | peaking and (c) programmable gain peaking of the CTLE used in the AFE of the                                                                                                       |
|      | high-speed digital interface for the proposed 60GHz digital polar 4-element phased                                                                                                 |
|      | array transmitter                                                                                                                                                                  |
| 4.20 | Simulated output waveforms of the 1:8 DeMUX using a 20Gbps PRBS input. The                                                                                                         |
|      | input sequence has been offset for clarity. |
| 4.21 | Layout snapshot of the proposed digital polar phased array transmitter with major                                                                                                  |
|      | building blocks highlighted                                                                                                                                                        |
| 5.1  | (a) Current fiber-optic based infrastructure for mobile backhaul and (b) alternative |
|      | MIMO-based approach for wireless backhaul using compact, massively scalable |
|      | arrays at mmWave                                                                                                                                                                   |
| 5.2  | Proposed massive mmWave MIMO architecture using high-power high-efficiency |
|      | stacked mmWave power DAC transmitter together with novel digital signal processing |
|      | schemes for channel estimation and low-PAPR transmitter precoding to facilitate |
|      | high-speed links. |
| 5.3  | Survey across frequency of CMOS transmitter saturated output power (extrapolated                                                                                                   |
|      | from works discussed in this thesis and those reported in [4,5,28,127,128]) and NF                                                                                                 |
|      | of state-of-the-art CMOS receivers, and single-element path loss for 100m distance                                                                                                 |
|      | including atmospheric absorption are shown. The resultant number of MIMO elements |
|      | and corresponding sub-array size, PAPR at the MIMO transmitter elements |
|      | and data rate for our proposed massive MIMO system are also depicted                                                                                                               |
| 5.4  | The number of MIMO elements and corresponding sub-array size, PAPR at the                                                                                                          |
|      | MIMO transmitter elements and data rate for the proposed massive MIMO system                                                                                                       |
|      | when the MIMO elements are arranged in a circular array                                                                                                                            |

- 5.6 Proposed demonstration of a 16Gbps 4×4 mmWave MIMO link at 60GHz over 100m distance. Each MIMO transmit element is realized using the hybrid power DAC-based digital polar four element phased array transmitter discussed in Chapter 4.

# List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | Comparison of the generalized loss-aware Class-E analysis with prior techniques for 5GHz Class E PAs |
| 2.2 | Normalized device parameters used in loss-aware Class-E analysis |
| 2.3 | Conduction loss and capacitive discharge loss for the optimal designs at different levels of stacking described in Fig. 2.7. Values for the waveform figures of merit $F_I^2$ and $F_C$ for loss-aware and ZVS based designs are also tabulated |
| 2.4 | Comparison of Fabricated PAs with State-of-the-art CMOS & SiGe mmWave PAs |
| 3.1 | Comparison with State-of-the-Art mmWave PAs with $P_{sat} > 20$ dBm or employing efficiency enhancing architectures |
| 4.1 | Comparison with State-of-the-Art mmWave PAs with $P_{sat} > 20$ dBm or employing efficiency enhancing architectures |
| 5.1 | Comparison of the proposed demonstration of a $4 \times 4$ mmWave MIMO link with contemporary high-speed mmWave links (academic and commercial) |

## Acknowledgments

As I gather my thoughts to conjure a mental list of all the people that have contributed to my doctoral studies, I realize that my original intention of keeping the acknowledgments section a short one is going to be futile. There are so many people who have contributed to my time as a doctoral student, whose friendship and assistance I have been fortunate to enjoy, in the absence of which my journey would not have been successful. It would be unfair if I did not convey my gratitude and appreciation to all these individuals, so here goes!

First and foremost, I would like to thank my parents. Their moral values, unconditional love and sacrifice have shaped me into the person I am and their unflinching support has seen me through days when I felt there was no light at the end of the tunnel. They have supported me in all my decisions and encouraged me to pursue my dreams, no matter how difficult the journey. My little brother has always been a pillar of strength and moral support, especially during my time away from home when I couldn't be there for my family. He is not just my sibling, but also the best friend I could ask for. I indeed feel blessed to have such a family. So, from the bottom of my heart, thank you Ma, Baba and my dear little brother Bunty!

My journey would not be successful without the help and support of my adviser, Prof. Harish Krishnaswamy. As a part of his first batch of students, I have seen him work extremely hard alongside us. He was always there for us, especially during the formative years, when we were coming to terms with the reality of doctoral studies while simultaneously building up the resources of the lab from scratch and trying to gain recognition in the fiercely competitive community. I have learnt a lot from him and hope to continue the legacy of our research group in my future career.

I am indebted to all my colleagues in the group for their help and support over the years. However, two of my dear colleagues, Ritesh Bhat and Jahnavi Sharma, deserve special mention. We started out together, built up the lab space from a junkyard into most of its current well-organized state, and most importantly, endured the pain of stressful tapeouts and measurements during the formative years together. I will cherish the time we spent together, the chats that lasted for hours, the laughs that we shared and the technical discussions that I learnt so much from. I would like to thank my colleague Tolga Dinc for his invaluable help with my projects over the last few years. I will certainly miss the mindless discussions, pranks and hilarious conversations I shared with Linxiao Zhang and Jeffrey Chuang. I am also grateful to my friends in other research groups at Columbia, especially Kevin Tien and Josh Kim for patiently helping me out whenever I was stuck with technical issues.

Those who know me certainly understand that working out is an important part of my life. A special thanks to my friends from the gym, Daniel Adler and Vincent Dinescu, for showing me the path to a healthy lifestyle that also served as an outlet for the anguish of doctoral studies. Last year, I found a new passion in the form of Krav Maga and Brazilian Jiu Jitsu, so I would like to thank my instructors and training partners for the great time that I had which helped me stay composed in what was one of the hardest times of my doctoral career.

Last but not least, I would like to extend my heartfelt gratitude to my family in the U.S.A.: my granduncle, grandaunt and cousins. Your love and support were invaluable in making my life away from home enjoyable.

To my parents and my little brother

## Chapter 1

## Introduction

Technology scaling in silicon processes, in conjunction with the feasibility of low-cost bulk production (particularly in CMOS), makes silicon an attractive choice for wireless applications. The ever-increasing demand for low-cost portable devices and systems necessitates highly integrated wireless transceivers, which can practically be realized only in deeply-scaled silicon technologies. In view of the overwhelming digital content of a mobile platform, it is desirable that the architectures used to implement the RF components facilitate seamless scaling into advanced technology nodes.

Integrated millimeter-wave (mmWave) systems thus constitute a logical step to extend the advantages of silicon-based integration to higher frequencies in order to leverage the larger spectrum bandwidth for future broadband cellular communication networks. Silicon integration at mmWave frequencies also enables the realization of complex architectures [6,7] that are customized for particular mmWave communication and sensing applications. Satellite communication and 60 GHz Wireless HD are examples of mmWave high data rate communication applications. Radar-assisted driving, on the other hand, is a sensing application. Vehicular radar applications typically favor the 24 GHz and 77 GHz unlicensed bands. Radar transceivers, accompanied by signal processing, have made a crucial impact on automotive system design, resulting in the evolution of Advanced Driver-Assist Systems (ADAS). Radar, being weather-independent, is the technology of choice for detecting and avoiding collision since radar systems facilitate both long-range forward-looking as well as short-range $360^\circ$ risk assessment.

Another important application of mmWave technology is in advanced imaging systems. The atmospheric absorption profile has low-attenuation windows at 35 GHz, 94 GHz, 140 GHz and 220 GHz, and the exact choice of frequency depends on the specific application. In particular, mmWave imaging can be utilized for homeland security (as in airports), aeronautics (for safe operations in bad weather) and medical diagnostics. The application of mmWave technology to medical imaging devices for screening, diagnosis and treatment of diseases is particularly exciting since conventional modalities are cost-prohibitive and are mostly limited to hospitals and health-care facilities. Recent studies suggest that the mmWave regime could also be used to supplement the teeming 700 MHz-2.6 GHz radio spectrum for cellular communications [8]. The high integration capability of scaled CMOS technology facilitates complex digital signal processing to realize mixed-signal Systems-on-Chip (SoCs) at mmWave that can provide the pathway to a plethora of exciting applications for future wireless networks.

However, a major drawback of migrating to deeply scaled technologies is the limited breakdown voltage as well as the poor quality of on-chip passives, which form the bottleneck in efficient power generation at mmWave. These, in conjunction with the high path loss at these frequencies, have typically limited the deployment of mmWave transceivers to short-range links. Nonetheless, burgeoning long-range applications such as satellite communication in the 45 GHz band and high data-rate wireless backhaul in the 71-76 GHz and 81-86 GHz bands have spurred research efforts for the development of high-power, high-efficiency PAs in CMOS. A second major challenge in the realization of energy-efficient transmitters arises from the trade-off between efficiency and linearity in conventional PAs. Additionally, the transmitter must retain high efficiency when operating at backed-off power levels to support complex modulations with finite peak-to-average-power-ratios (PAPR).

#### 1.1 Thesis Objectives

The goal of this dissertation is to present novel techniques for realizing digital-intensive mmWave transmitters in fine-line CMOS processes. Such "digital" transmitters harness the high integration capability of nanoscale CMOS along with DSP-intensive mmWave architectural innovations to realize high-power mmWave transmitters that can facilitate high-speed long-haul links supporting complex modulations with high average efficiency. From a technical standpoint, the contributions of this thesis include detailed analysis of the architecture, implementation trade-offs as well as experimental validation along with prospects of utilizing the proposed research to address the needs of future wireless systems.

#### 1.2 Thesis Outline

This dissertation is organized as follows: Chapter 2 describes the challenges associated with realizing energy-efficient, high-power amplifiers in silicon at mmWave and introduces the concept of series device stacking for switch-mode PAs to overcome the fundamental limitations of efficient mmWave power generation. A loss-aware design methodology for Class-E PAs, suitable for mmWave implementations is presented. This is followed by detailed theoretical analysis, simulation and experimental results pertaining to state-of-the-art stacked Class-E-like mmWave PAs implemented in 45nm SOI CMOS. A novel Multi-output Class-E PA topology is proposed for overcoming the fundamental challenges associated with voltage swings in stacked Class-E PAs. The chapter concludes with a description of an ultra wideband Class-E-like PA array in Q-band that achieves watt-level output power on-chip by using large-scale, low-loss power-combining of the stacked Class-E PAs.

Chapter 3 discusses the challenges associated with realizing high-power mmWave transmitters that can support complex modulations with high average efficiency. The concept of a digital-intensive direct digital-to-mmWave amplitude converter (DAC) using high-power DAC unit cells is presented along with a summary of prior works on mmWave power DACs. This is followed by the architectural overview, implementation trade-offs and experimental results for a 45GHz 1-bit Class-E-like mmWave power DAC. A linearizing architecture for mmWave power DACs is discussed that simultaneously employs large-scale power combining for high output power, PA supply switching for high efficiency under back-off, and digitally-controlled load-modulation for linearization of switching PAs. A novel approach for realizing high-resolution mmWave power DACs with high back-off efficiency, by combining Class-G supply modulation with tail transistor switching, is also presented.

Chapter 4 describes detailed design considerations of a digital polar phased array transmitter utilizing the foregoing concepts. The transmitter also incorporates an on-chip deserializing receiver for high-speed operation of the digital control bits to achieve high data-rates. The proposed transmitter architecture thereby facilitates scaling-friendly DSP-intensive communication. Chapter 5 concludes the dissertation with an overview of the intellectual contributions. Plans regarding experimental verification of the proposed transmitter prototype are discussed along with the details of demonstrating a 60GHz  $4 \times 4$  MIMO link using the transmitter prototype. The chapter also proposes a long term research vision of leveraging the prototype to realize massively scalable mmWave MIMO arrays with unprecedented data-rates for future wireless networks.

## Chapter 2

# Stacked Switch-mode Millimeter-wave CMOS PAs

Series stacking of multiple devices is a promising technique that can help overcome some of the fundamental limitations of CMOS technology in order to improve the output power and efficiency of CMOS power amplifiers (PAs), particularly at millimeter-wave (mmWave) frequencies. The concept of device stacking has been explored at RF frequencies in the context of quasi-linear PAs (Class-A, Class-AB) in both III-V compound semiconductor technologies as well as CMOS [9–13]. This chapter investigates the concept of device stacking in the context of the Class-E family of non-linear switching PAs at mmWave frequencies to realize watt-class output power on-chip with high efficiency.

Sections 2.1 and 2.2 describe the challenges associated with power generation and large-scale power-combining at mmWave. The concept of series device stacking in PAs is discussed in Section 2.3 along with an overview of prior works on linear/quasi-linear stacked PAs. The challenges associated with implementing switching PAs at mmWave frequencies are discussed in Section 2.4 and a loss-aware Class-E design methodology suitable for mmWave operation is presented. Fundamental limits on achievable performance of stacked mmWave Class-E-like PAs are derived and design guidelines are provided for a practical implementation in Section 2.5. The theoretical results are further explained by means of an analysis that identifies technology-dependent metrics governing the performance of stacked Class-E-like CMOS PAs. Specifically, the analysis introduces a technology constant, referred to as the Switch Time Constant, which is an important technology metric for switching PAs in addition to $f_{max}$. A comparison with conventional techniques of impedance transformation and power-combining is also included to demonstrate the benefits of series stacking. Section 2.5.6 describes the layout and modeling of stacked power devices. Sections 2.5.7 and 2.5.8 respectively discuss the implementation details and measurement results of CMOS prototypes at 45GHz that achieve saturated output power levels of 17-20 dBm at power-added efficiencies as high as 35%, a record for CMOS mmWave PAs.

Section 2.6 discusses the challenges associated with realizing appropriate voltage swings at intermediary nodes in a stacked Class-E PA for true Class-E behavior of the stack. A novel topology, referred to as the Multi-output Stacked Class-E PA, is described which addresses this issue by utilizing an explicit "Class-E load network" at each intermediary node. A unique feature of this scheme is that output power is available at all the intermediary nodes. Section 2.7.3 proposes a means of "internally" power-combining all the output nodes. The potential instability resulting from the internal power-combining is discussed in Section 2.7.4 along with guidelines to mitigate the issue. Sections 2.7.5 and 2.7.6 respectively discuss the implementation details and measurement results of two Q-band prototypes fabricated in IBM's 45nm Silicon-on-Insulator (SOI) CMOS technology that achieve saturated output power levels of 17-19 dBm at power-added efficiencies of 16-25%. The performance is inferior to the stacked 45GHz PAs in Section 2.5.8 owing to the use of the body-contacted devices in the technology which exhibit lower speed.

Finally, Section 2.8 describes the first mmWave PA in CMOS with watt-level output power which is realized by combining the efficient, stacked Class-E-like unit cell PAs described in Section 2.5 with a novel lumped-element power-combiner enabling one-step, large-scale, low-loss power-combining at mmWave [14]. Eight-way combining of stacked 45nm SOI CMOS PAs results in a PA array with watt-class (>27 dBm) saturated output power ( $3 \times$  higher than prior art) and ultra-wideband operation (33-46 GHz).

#### 2.1 Millimeter-wave CMOS Power Generation Challenges

CMOS scaling (Fig. 2.1(a)) has facilitated high-frequency operation of devices, which enables the implementation of circuits and systems targeting mmWave applications. However, scaling comes at the cost of a reduction in the breakdown voltage (Fig. 2.1(b)). Furthermore, the lossy silicon substrate in CMOS results in an increase in the loss in active and passive components at high frequencies. Coupled with the low available gain of devices, these factors constitute the bottleneck in designing high-performance mmWave circuits, particularly high-efficiency PAs with high output power.

Figure 2.1: (a) $f_{max}$ of deep-submicron CMOS technology nodes from the literature. (b) Supply voltage of deep-submicron CMOS technology nodes. Survey of the (c) saturated output power and (d) PAE achieved by RF and millimeter-wave CMOS PAs prior to the work described in this thesis.



Figure 2.2: Conventional power combining techniques: (a) transformer-based series combining, (b) Wilkinson combining and (c) zero-degree combining.

The low breakdown voltage limits the output swing, and consequently the output power that can be delivered to a 50  $\Omega$  load. The load may be transformed to a lower impedance to enable higher output power, but the poor quality of on-chip passive components limits the efficiency of the transformation. The low available gain results in large input power requirements for mmWave PAs, limiting power-added efficiency (PAE= $\frac{P_{out}-P_{in}}{P_{DC}}$ ).
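The impact of low gain on PAE can be illustrated with a short numerical sketch. It uses the PAE definition above together with the equivalent form $PAE = \eta\,(1 - 1/G)$, where $\eta = P_{out}/P_{DC}$ is the drain efficiency and $G = P_{out}/P_{in}$ is the linear power gain. The function names and the 50% drain efficiency are illustrative assumptions, not values from this work.

```python
def pae(p_out_w, p_in_w, p_dc_w):
    """Power-added efficiency, PAE = (P_out - P_in) / P_DC (all powers in watts)."""
    return (p_out_w - p_in_w) / p_dc_w


def pae_from_gain(drain_eff, gain_db):
    """Equivalent form PAE = eta * (1 - 1/G), with G the linear power gain."""
    gain = 10.0 ** (gain_db / 10.0)
    return drain_eff * (1.0 - 1.0 / gain)


if __name__ == "__main__":
    # Hypothetical PA with 50% drain efficiency, evaluated at several power gains.
    for g_db in (20.0, 10.0, 6.0, 3.0):
        print(f"gain = {g_db:4.1f} dB -> PAE = {100 * pae_from_gain(0.5, g_db):.1f}%")
```

As the gain drops towards the few-dB levels typical of devices operated close to $f_{max}$, the input power becomes a significant fraction of the output power and the PAE falls well below the drain efficiency.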

#### 2.2 Large-Scale Millimeter-wave Power Combining Challenges

A possible solution to realize high output power on-chip is to power combine the outputs of several PAs. However, large-scale, low-loss power combining on silicon is fraught with several challenges [14]. Transformer-based series power combining [15] (Fig. 2.2(a)) is limited by the asymmetry that results from parasitic winding and inter-winding capacitances, causing non-constructive addition of individual PA voltages and stability challenges [16]. With Wilkinson power combiners, the transmission-line characteristic impedance $Z_0$ required for combining $n$ PAs is $Z_0 = 50\sqrt{n}\ \Omega$. Thus, the maximum number of PA units that can be combined in a single Wilkinson is restricted to two to four by the highest $Z_0$ that can be achieved in the back end of the line (BEOL). Cascading 2:1 Wilkinsons (Fig. 2.2(b)) results in a severe increase in combiner loss. The zero-degree combiner [17], [18], [19] shown in Fig. 2.2(c) is essentially a current combining approach where the connecting lines are designed to perform the necessary impedance transformation. This has the advantage of not being restricted to the use of quarter-wavelength transmission lines of a fixed characteristic impedance. However, it is a multi-step structure and its efficiency is a function of the impedance transformation performed by each stage in the cascade.
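The Wilkinson constraint quoted above, $Z_0 = 50\sqrt{n}\ \Omega$, can be turned into a quick estimate of how many PAs a single combiner can accept. The sketch below is illustrative only: the maximum realizable line impedances passed to `max_ways` are hypothetical placeholders, since the achievable $Z_0$ depends on the specific BEOL stack.

```python
import math


def wilkinson_z0(n, z_load=50.0):
    """Characteristic impedance (ohms) needed for an n-way Wilkinson combiner."""
    return z_load * math.sqrt(n)


def max_ways(z0_max, z_load=50.0):
    """Largest n whose required Z0 = z_load * sqrt(n) does not exceed z0_max."""
    # The small epsilon guards against floating-point round-off at exact squares.
    return math.floor((z0_max / z_load) ** 2 + 1e-9)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"n = {n}: required Z0 = {wilkinson_z0(n):.1f} ohm")
    # Hypothetical upper bounds on the line impedance realizable on-chip.
    for z0_max in (75.0, 100.0):
        print(f"Z0,max = {z0_max:.0f} ohm -> at most {max_ways(z0_max)}-way combining")
```

An 8-way combiner would need a line impedance above 140 Ω in a single step, which is why larger arrays are typically built by cascading smaller combiners at the cost of added loss.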

These design trade-offs can be clearly appreciated from a survey of RF and mmWave PAs reported in literature (Fig. 2.1) prior to the work described in this thesis. Fig. 2.1(a) depicts the scaling of $f_{max}$ of CMOS technology nodes based on prior reports [20]–[23], while Fig. 2.1(b) depicts the scaling of supply voltage $V_{DD}$. Empirically, it would seem that as technology scales, $f_{max}V_{DD}^2$ remains approximately constant and equal to 250 GHz·V$^2$, although explicitly deriving such a scaling law for constant-field scaling remains challenging due to the complex and layout-dependent nature of loss mechanisms within nanoscale CMOS devices. This observation is, however, consistent with the known fundamental trade-off (referred to as the Johnson limit [24]) between speed of operation and the breakdown voltage of a technology. If one assumes that a PA designer chooses a technology with an $f_{max}$ that is three times the operating frequency $f$ to ensure approximately 10 dB gain, that the PA is designed to directly drive a 50 $\Omega$ load with no impedance transformation to maintain efficiency, and that the output node sustains a peak-to-peak swing that is twice the $V_{DD}$ (i.e. no harmonic shaping), the output power of such a PA would be $\frac{250\ \mathrm{GHz \cdot V^2}}{3f} \times \frac{1}{2 \times 50\ \Omega} \approx \frac{830\ \mathrm{mW \cdot GHz}}{f}$. This scaling law indicates that achieving watt-class output power at frequencies around 1 GHz is feasible, but typical output powers at, say, 60 GHz, would be in the range of 10-15 mW if impedance transformation or power combining are not exploited. Fig. 2.1(c) largely bears out this trend, with some efforts achieving higher output powers through either impedance transformation, power combining or a combination of the two. Fig. 2.1(d) indicates that efficiency also degrades significantly as frequency increases, although an explicit scaling trend for efficiency is more complicated because of the complex nature of loss mechanisms in active and passive devices. Prior state-of-the-art PAEs at mmWave frequencies were thus generally below 20%, except for a few outliers.
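The output-power scaling law above can be evaluated numerically as a sanity check. The sketch below simply encodes the stated assumptions (an $f_{max}V_{DD}^2$ product of 250 GHz·V$^2$, $f_{max} = 3f$, a 50 Ω load and a peak-to-peak swing of $2V_{DD}$); the function and constant names are illustrative.

```python
FMAX_VDD2_GHZ_V2 = 250.0  # empirical f_max * V_DD^2 product quoted in the text


def unmatched_pout_mw(f_ghz, fmax_ratio=3.0, r_load=50.0):
    """Output power (mW) of a PA driving r_load directly at frequency f_ghz.

    Assumes f_max = fmax_ratio * f and a sinusoidal output of amplitude V_DD
    (peak-to-peak swing of 2 * V_DD), so that P_out = V_DD^2 / (2 * r_load).
    """
    vdd_squared = FMAX_VDD2_GHZ_V2 / (fmax_ratio * f_ghz)  # in V^2
    return 1e3 * vdd_squared / (2.0 * r_load)


if __name__ == "__main__":
    for f_ghz in (1.0, 5.0, 45.0, 60.0):
        print(f"f = {f_ghz:5.1f} GHz -> P_out = {unmatched_pout_mw(f_ghz):7.1f} mW")
```

Under these assumptions the estimate is roughly 830 mW at 1 GHz and about 14 mW at 60 GHz, consistent with the 10-15 mW range quoted above.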


Figure 2.3: (a) Concept of series stacking in CMOS PAs with voltage swings annotated and (b) prior art on stacked PAs at low RF frequencies.

# 2.3 Device Stacking in CMOS and SOI CMOS Power Amplifiers

Series stacking of multiple devices (e.g. Fig. 2.3(a)) is a potential technique that breaks these trade-offs associated with CMOS PA design. Stacking of multiple devices increases the voltage swing at the load, as the increased voltage stress can be shared by the various devices in the stack. Thus, for a stack of n devices, the output voltage swing can be n times higher than that of a single device (provided that design techniques are incorporated to ensure that each individual device sees  $V_{gs}$ ,  $V_{gd}$  and  $V_{ds}$  swings that lie within permissible breakdown limits). Stacking, however, does not alleviate the drain-bulk and source-bulk stress of the individual stacked devices. In particular, the top-most device of the stack sees a drain-bulk swing that is equal to the n-times increased output swing of the stacked PA. Consequently, in bulk CMOS, the junction breakdown voltage limits the maximum number of devices that can be stacked to 3 or 4 devices. However, in SOI CMOS, the presence of an isolated floating body for each device eliminates this limitation. The number of devices that can be stacked in SOI CMOS is only limited by the breakdown of the buried oxide (BOX) below each device. This voltage is, however, higher than 10 V in IBM's 45 nm SOI CMOS process [25], enabling five or more devices to be stacked (assuming 2 V peak RF voltage swing per device for long-term reliability [26]).

Prior works on stacked PAs at RF frequencies (Fig. 2.3(b)) have explored the cases where input power is provided only to the bottom-most device in the stacked configuration [27], [28] as well as to all the devices in the stack through transformer coupling [29]. These works have demonstrated the characteristics of stacking in a variety of technologies such as GaAs MESFET [27], [29] and SOI [28]. More recently, stacking for mmWave PAs has been investigated in nanoscale SOI CMOS [5,25,30,31]. However, while stacking has been predominantly explored in the context of linear/quasi-linear PAs, the contribution of this thesis is the investigation of nonlinear switching-type stacked PAs at mmWave frequencies [1,3,14,32,33].

# 2.4 Loss-aware Design Methodology for Class-E Switching PAs

Switching power amplifiers are extensively utilized at RF frequencies owing to their (ideally) lossless operation. The Class-E PA [34] has been of particular interest because of its relatively simple output network. The design of switching PAs in CMOS at mmWave frequencies [25], [5], [32], [35] is challenging due to the lack of ideal square-wave drives (resulting in soft switching), impracticality of harmonic shaping of voltages and currents, low PAE due to the high input drive levels required to switch the devices and high loss levels in the device/switch. Thus, at mmWave frequencies, one can practically implement a "switch-like" PA at best. In this chapter, we explore stacked "Class-E-like" PAs in SOI CMOS and low-power bulk CMOS technologies.

In order to determine if device stacking overcomes the speed-breakdown voltage trade-off of CMOS technology scaling (quantified earlier using the $f_{max}V_{DD}^2$ product), it is important to determine if stacked Class-E-like PAs are able to increase output power without substantial degradation in efficiency (the PA metric that is significantly impacted by transistor speed). To this end, a loss-aware Class-E design methodology has been developed [1] that revisits the Class-E design principles in the presence of the increased losses that are seen at millimeter-wave frequencies, in stacked PAs in particular. The presence of a finite DC-feed inductance, switch ON-resistance ($R_{ON}$) or passive loss adds significant complexity to the mathematical analysis of the Class-E PA. However, integrated solutions using real electron devices necessitate that these non-idealities be taken into account so as to avoid sub-optimal designs. The different sources of loss can be accounted for in two ways: 1) perturbation analysis, which assumes that losses are small enough so that currents and voltages remain unchanged, or 2) comprehensive circuit analysis with all parameters derived in the presence of loss. The availability of thick upper metal layers in modern fabrication technology can be exploited to implement high quality on-chip inductors (Q>15 in [36]). Consequently, perturbation analysis can be used to estimate passive loss. However, at RF and mmWave frequencies a comprehensive analysis for ON-resistance is essential. In [37], the authors incorporate the finite DC-feed inductance into the analysis, but compute all sources of loss perturbatively. Other works have performed a comprehensive analysis with switch ON-resistance *and* finite DC-feed inductance [38,39], but impose one or both of the "Class-E switching conditions", namely Zero Voltage at Switching (ZVS) and Zero Derivative of Voltage at Switching (ZDVS). It must be emphasized that these conditions are essential for high-efficiency operation only when losses are small. In the presence of appreciable $R_{ON}$, it might be beneficial to sustain some loss from violating ZVS in order to reduce conduction loss. In this work, we analyze the Class-E PA in the presence of a finite DC-feed inductance and finite $R_{ON}$ without the constraints of either ZVS or ZDVS.

Figure 2.4: Class-E PA with finite DC-feed inductance and non-zero switch on-resistance

In addition, we present the first attempt to incorporate input power into the optimization procedure. Traditional design approaches [37, 39] have aimed to optimize for drain efficiency ( $\eta = \frac{P_{out}}{P_{DC}}$ ). However, as mentioned before, a more relevant metric for efficiency of switching PAs at high frequencies is the PAE. DC-feed inductance loss has been included in a perturbative fashion. The improved design equations thus provide preliminary design points suitable for further optimization, thereby minimizing tedious load-pull simulations.

#### 2.4.1 Circuit Model and Assumptions

The circuit diagram of the Class-E CMOS PA is shown in Fig. 2.4. In the absence of  $R_{ON}$  and passive loss, the switch voltage and switch current resemble those depicted in the inset in Fig. 2.4 when Class-E switching conditions are satisfied. For the ensuing derivations, we make the following assumptions:

1. The active device (MOSFET in this case) can be represented by a switch with finite series ON-resistance $R_{ON}$ in parallel with a linear capacitor $C_{out}$.
2. $R_{ON} \ll \frac{1}{\omega_0 C_{out}}$.
3. The loaded quality factor $(Q_L)$ of the series resonant filter in the output network is large.
4. The duty cycle of the switch is 50%, though the analysis can be extended to any arbitrary duty cycle.
5. Filter loss is negligibly small, since filter inductance can be realized using bondwire inductance.

#### 2.4.2 Circuit Analysis

Let us assume that the switch is open ("OFF") for  $0 \le t < \frac{T}{2}$  and closed ("ON") for  $\frac{T}{2} \le t < T$ , where  $T = \frac{2\pi}{\omega_0}$  is the switching period. We use the subscripts "ON" and "OFF" for voltages and currents to indicate the respective half-cycles. Using assumption 3, the load current can be represented as

$$i_{load} = i_0 \cos(\omega_0 t + \phi) . \tag{2.1}$$

During the "ON" half-cycle  $\frac{T}{2} \leq t < T$  , we have the following relations:

$$V_{DD} - V_{S,ON} = L \frac{di_{L,ON}}{dt} \tag{2.2}$$

and

$$V_{S,ON} = \left(i_{L,ON} - i_0 \cos(\omega_0 t + \phi)\right) R_{ON} . \tag{2.3}$$

The current through  $C_{out}$  is neglected in view of assumption 2. Using Eqn. (2.3), we can rewrite Eqn. (2.2) as

$$\frac{dV_{S,ON}}{dt} + \left(\frac{R_{ON}}{L}\right) V_{S,ON} - i_0 \omega_0 R_{ON} \sin(\omega_0 t + \phi) - \left(\frac{V_{DD} R_{ON}}{L}\right) = 0 . \tag{2.4}$$
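The intermediate step can be written out explicitly. The following is a short worked derivation that uses only the symbols already defined: Eqn. (2.3) is solved for the inductor current, differentiated, and substituted into Eqn. (2.2).

$$\begin{aligned}
i_{L,ON} &= \frac{V_{S,ON}}{R_{ON}} + i_0 \cos(\omega_0 t + \phi) , \\
\frac{di_{L,ON}}{dt} &= \frac{1}{R_{ON}} \frac{dV_{S,ON}}{dt} - i_0 \omega_0 \sin(\omega_0 t + \phi) , \\
V_{DD} - V_{S,ON} &= \frac{L}{R_{ON}} \frac{dV_{S,ON}}{dt} - i_0 \omega_0 L \sin(\omega_0 t + \phi) .
\end{aligned}$$

Multiplying the last line by $\frac{R_{ON}}{L}$ and collecting all terms on one side gives Eqn. (2.4).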

The solution to this linear differential equation is of the form

$$V_{S,ON}(t) = V_{DD} + a_1 e^{\beta t} + a_2 \cos(\omega_0 t + \phi) + a_3 \sin(\omega_0 t + \phi) , \qquad (2.5)$$

where

$$a_{1} = e^{-\frac{\beta T}{2}} \left[ V_{S,ON}\left(\tfrac{T}{2}\right) - V_{DD} - \frac{R_{ON} i_{0} \cos(\phi) + \frac{\beta}{\omega_{0}} R_{ON} i_{0} \sin(\phi)}{1 + \frac{\beta^{2}}{\omega_{0}^{2}}} \right] ,$$

$$a_{2} = \frac{-R_{ON}i_{0}}{1 + \frac{\beta^{2}}{\omega_{0}^{2}}}, \quad a_{3} = \frac{-R_{ON}i_{0}\beta}{\omega_{0}\left(1 + \frac{\beta^{2}}{\omega_{0}^{2}}\right)}, \quad \beta = \frac{-R_{ON}}{L}, \tag{2.6}$$

and  $V_{S,ON}(\frac{T}{2})$  is a constant to be evaluated.
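As a sanity check on the form of Eqn. (2.5), the sketch below evaluates the ON-state switch voltage and substitutes it back into Eqn. (2.4) numerically. It is a minimal illustration, not the code used in this work: the function names and all component values are hypothetical, and the constant $a_1$ is obtained directly from the initial condition at the start of the ON interval rather than from a closed-form expression.

```python
import math

def on_state_voltage(t, t0, v0, v_dd, r_on, l_feed, i0, w0, phi):
    """Switch voltage in the ON half-cycle, in the form of Eqn. (2.5).

    t0 is the start of the ON interval and v0 the switch voltage at t0.
    """
    beta = -r_on / l_feed
    denom = 1.0 + (beta / w0) ** 2
    a2 = -r_on * i0 / denom
    a3 = -r_on * i0 * beta / (w0 * denom)
    # Fix a1 from the initial condition V(t0) = v0.
    forced0 = v_dd + a2 * math.cos(w0 * t0 + phi) + a3 * math.sin(w0 * t0 + phi)
    a1 = (v0 - forced0) * math.exp(-beta * t0)
    theta = w0 * t + phi
    return v_dd + a1 * math.exp(beta * t) + a2 * math.cos(theta) + a3 * math.sin(theta)

def ode_residual(t, h, **p):
    """Left-hand side of Eqn. (2.4), with a central-difference derivative."""
    v = on_state_voltage(t, **p)
    dv = (on_state_voltage(t + h, **p) - on_state_voltage(t - h, **p)) / (2.0 * h)
    return (dv + (p["r_on"] / p["l_feed"]) * v
            - p["i0"] * p["w0"] * p["r_on"] * math.sin(p["w0"] * t + p["phi"])
            - p["v_dd"] * p["r_on"] / p["l_feed"])

# Illustrative values only (assumptions, not taken from the text).
f0 = 5e9
params = dict(t0=0.5 / f0, v0=0.5, v_dd=3.0, r_on=2.0, l_feed=1e-9,
              i0=0.1, w0=2.0 * math.pi * f0, phi=0.3)
scale = params["v_dd"] * params["r_on"] / params["l_feed"]
worst = max(abs(ode_residual(params["t0"] + k * 1e-11, 1e-15, **params))
            for k in range(1, 10))
print(f"largest residual relative to V_DD*R_ON/L: {worst / scale:.2e}")
```

A residual that is small compared with the individual terms of Eqn. (2.4) indicates that the exponential and sinusoidal coefficients are mutually consistent.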

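For the "OFF" half-cycle $0 \le t < \frac{T}{2}$, when the switch is open, the inductor relation of Eqn. (2.2) still holds, while Eqn. (2.3) is replaced by the current balance at the switch node, $C_{out} \frac{dV_{S,OFF}}{dt} = i_{L,OFF} - i_0 \cos(\omega_0 t + \phi)$. Combining the two, we arrive at

$$\frac{d^2 V_{S,OFF}}{dt^2} + \frac{V_{S,OFF}}{LC_{out}} - \frac{i_0 \omega_0}{C_{out}} \sin(\omega_0 t + \phi) - \frac{V_{DD}}{LC_{out}} = 0 . \tag{2.7}$$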

The solution to this second order linear differential equation is given by

$$\begin{aligned} V_{S,OFF}(t) ={}& V_{DD} \left[ 1 - \cos(\omega_s t) \right] + V_{S,OFF}(0) \cos(\omega_s t) + \frac{V'_{S,OFF}(0)}{\omega_s} \sin(\omega_s t) \\ &+ \frac{i_0 \omega_0 \sin(\phi)}{C_{out} (\omega_s^2 - \omega_0^2)} \left[ \cos(\omega_0 t) - \cos(\omega_s t) \right] \\ &+ \frac{i_0 \omega_0^2 \cos(\phi)}{C_{out} (\omega_s^2 - \omega_0^2)} \left[ \frac{\sin(\omega_0 t)}{\omega_0} - \frac{\sin(\omega_s t)}{\omega_s} \right] , \end{aligned} \tag{2.8}$$

where  $\omega_s = \frac{1}{\sqrt{LC_{out}}} = n_0 \omega_0$ , while  $V_{S,OFF}(0)$  and  $V'_{S,OFF}(0)$  are constants to be evaluated. The values for  $V_{S,ON}(\frac{T}{2})$ ,  $V_{S,OFF}(0)$  and  $V'_{S,OFF}(0)$  can be arrived at by imposing the following continuity conditions:

$$i_{L,OFF}(0^{+}) = i_{L,ON}(T^{-}), \quad V_{S,OFF}(0^{+}) = V_{S,ON}(T^{-}) \tag{2.9}$$

$$i_{L,OFF}\left(\tfrac{T}{2}^{-}\right) = i_{L,ON}\left(\tfrac{T}{2}^{+}\right) \tag{2.10}$$
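The OFF-state solution can be checked in the same way. The sketch below evaluates the closed form with the homogeneous part written as $V_{DD}\left[1 - \cos(\omega_s t)\right] + V_{S,OFF}(0)\cos(\omega_s t)$ and substitutes it into Eqn. (2.7) using a finite-difference second derivative. The function names and the numerical values (including the initial conditions) are assumptions chosen only for illustration; $C_{out}$ and $n_0$ are loosely based on the 5GHz example discussed later.

```python
import math

def off_state_voltage(t, v0, dv0, v_dd, l_feed, c_out, i0, w0, phi):
    """Switch voltage in the OFF half-cycle, closed-form solution of Eqn. (2.7).

    v0 and dv0 are the switch voltage and its time derivative at t = 0.
    """
    ws = 1.0 / math.sqrt(l_feed * c_out)
    k = i0 * w0 / (c_out * (ws ** 2 - w0 ** 2))
    return (v_dd * (1.0 - math.cos(ws * t))
            + v0 * math.cos(ws * t)
            + (dv0 / ws) * math.sin(ws * t)
            + k * math.sin(phi) * (math.cos(w0 * t) - math.cos(ws * t))
            + k * w0 * math.cos(phi) * (math.sin(w0 * t) / w0 - math.sin(ws * t) / ws))

def ode_residual(t, h, **p):
    """Left-hand side of Eqn. (2.7), with a central-difference second derivative."""
    v = off_state_voltage(t, **p)
    d2v = (off_state_voltage(t + h, **p) - 2.0 * v
           + off_state_voltage(t - h, **p)) / h ** 2
    lc = p["l_feed"] * p["c_out"]
    return (d2v + v / lc
            - p["i0"] * p["w0"] / p["c_out"] * math.sin(p["w0"] * t + p["phi"])
            - p["v_dd"] / lc)

# Illustrative values only (assumptions).
f0 = 5e9
w0 = 2.0 * math.pi * f0
n0 = 1.4                                   # omega_s / omega_0
c_out = 631e-15
l_feed = 1.0 / ((n0 * w0) ** 2 * c_out)    # sets omega_s = n0 * omega_0
params = dict(v0=0.2, dv0=1e9, v_dd=3.34, l_feed=l_feed, c_out=c_out,
              i0=0.1, w0=w0, phi=0.3)
scale = params["v_dd"] / (l_feed * c_out)
worst = max(abs(ode_residual(k * 1e-11, 2e-14, **params)) for k in range(1, 10))
print(f"largest residual relative to V_DD/(L*C_out): {worst / scale:.2e}")
```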

The load impedance  $Z_{load}$  is computed as the ratio of the fundamental component of the switch voltage to that of the load current. Since no constraints have been imposed on either the switch voltage or its derivative at switch turn-on, we need to account for possible capacitive discharge loss. Under the assumption  $R_{ON} \ll \frac{1}{\omega_0 C_{out}}$ , this loss can be estimated as

$$P_{loss,cap} = 0.5 f_0 C_{out} \left[ V_{S,OFF}^2 \left( \frac{T}{2} \right) - V_{S,ON}^2 \left( \frac{T}{2} \right) \right] . \tag{2.11}$$

The loss in the switch is given by

$$P_{loss,switch} = R_{ON} \times \frac{1}{T} \int_{\frac{T}{2}}^{T} \left(\frac{V_{S,ON}}{R_{ON}}\right)^2 dt . \tag{2.12}$$

In order to incorporate input power into the formulation, the input power  $(P_{in})$  is approximated as

$$P_{in} = k f_0 C_{in} V_{on}^2 , \tag{2.13}$$

where $C_{in} = C_{gs} + C_{gd}$ in the triode region, $V_{on}$ is the input drive level in the "ON" half-cycle and $k$ is a fitting parameter determined from schematic simulations [40]. Finite reverse isolation (i.e. $C_{gd} \neq 0$) causes the value of $k$ to vary with the parameter $n_0$ (since output network component values change), but for preliminary analysis, this dependence is ignored. Finally, if $R_{choke} = \frac{\omega_0 L}{Q_{choke}}$ is the series resistance of the DC-feed inductance, its loss can be calculated using perturbation analysis as

$$P_{loss,choke} = R_{choke} \times \frac{1}{T} \left( \int_0^{\frac{T}{2}} i_{L,OFF}^2 dt + \int_{\frac{T}{2}}^{T} i_{L,ON}^2 dt \right) . \tag{2.14}$$

The foregoing leads to a complete expression for PAE:

$$PAE = 1 - \frac{P_{loss}}{P_{DC}} - \frac{P_{in}}{P_{DC}} , \tag{2.15}$$

where

$$P_{loss} = P_{loss,switch} + P_{loss,choke} + P_{loss,cap} \tag{2.16}$$

and

$$P_{DC} = V_{DD} \times I_{DC} \tag{2.17}$$

$$= V_{DD} \times \frac{1}{T} \left( \int_0^{\frac{T}{2}} i_{L,OFF} dt + \int_{\frac{T}{2}}^{T} i_{L,ON} dt \right) . \tag{2.18}$$

#### 2.4.3 Optimization Procedure and Comparison to Prior Works

The circuit may now be optimized for PAE by choosing the appropriate load impedance. This is achieved by means of a MATLAB code which sweeps the magnitude  $i_0$  and phase  $\phi$  of the load current to arrive at a design point with optimal PAE for a given device size, input drive level  $V_{on}$ and the parameter  $n_0$ . A global optimization is performed subsequently by varying  $V_{on}$  and  $n_0$  to select the design point with highest PAE for a fixed device size. If the load impedance is different from 50 $\Omega$ , then a matching network needs to be designed to perform impedance transformation. In prior works, the loss associated with this matching network has not been considered. In this work, subsequent to PAE optimization, the device size (and all other circuit components) are scaled so that  $R_{load} = 50\Omega$  to determine the power that can be delivered to a 50 $\Omega$  load and to eliminate the loss in a matching network.
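The final scaling step can be sketched as follows. This is a minimal illustration under the usual impedance-scaling assumption for a fixed supply voltage and frequency: scaling the device width by a factor $s$ scales all impedances by $1/s$ and all powers by $s$, so the efficiencies are unchanged. The `ClassEDesign` container, its field names and the example numbers are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ClassEDesign:
    width_um: float      # device width
    l_feed_h: float      # DC-feed inductance
    r_load_ohm: float    # real part of the load impedance
    x_load_ohm: float    # imaginary part of the load impedance
    p_out_w: float
    p_dc_w: float
    p_in_w: float

    @property
    def pae(self):
        return (self.p_out_w - self.p_in_w) / self.p_dc_w

def scale_to_load(d, r_target_ohm=50.0):
    """Scale a PAE-optimal design so that its load resistance equals r_target_ohm."""
    s = d.r_load_ohm / r_target_ohm      # width scaling factor
    return replace(
        d,
        width_um=d.width_um * s,         # R_ON scales as 1/s, C_out and C_in as s
        l_feed_h=d.l_feed_h / s,
        r_load_ohm=d.r_load_ohm / s,
        x_load_ohm=d.x_load_ohm / s,
        p_out_w=d.p_out_w * s,
        p_dc_w=d.p_dc_w * s,
        p_in_w=d.p_in_w * s,
    )

# Hypothetical design point with a 25-ohm optimal load resistance.
raw = ClassEDesign(width_um=1000.0, l_feed_h=0.5e-9, r_load_ohm=25.0,
                   x_load_ohm=10.0, p_out_w=0.4, p_dc_w=0.5, p_in_w=0.03)
scaled = scale_to_load(raw)
print(scaled.width_um, scaled.r_load_ohm, scaled.p_out_w, scaled.pae)
```

The design that drives 50 $\Omega$ directly delivers less (or more) power than the unscaled one, but at the same PAE and without the loss of a matching network.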

While this chapter and this thesis in general focus on millimeter-wave PAs and transmitters, in order to validate the developed theory, we use the $0.7\mu$m channel-length thick-oxide devices in IBM's $0.18\mu$m CMOS technology. The maximum instantaneous voltage swing in a cycle across any two device terminals is typically limited to twice the recommended $V_{DD}$ for long-term reliability in PAs [26]. The recommended $V_{DD}$ for these devices is 5V, making them suitable for moderate and high-power applications. The device is characterized by $R_{ON}(\Omega) = 5200/(W \times (V_{on} - V_{th}))$, where $W$ is the device width in microns and $V_{th}=0.5$V, $C_{out}/W=0.9$fF/$\mu$m and $C_{in}/W=2$fF/$\mu$m. Optimal PAE designs for 5GHz Class-E PAs are determined from the MATLAB code as functions of $V_{on}$ and $n_0$. Transient simulations are also performed in Cadence using a switch-based model with appropriate $R_{ON}$ and $C_{out}$ for the various points in Fig. 2.5(a), (b) and (c). These simulations show excellent agreement with theoretical results due to the comprehensive analysis. From the contour plots, it is evident that for a fixed $n_0$, there exists an optimum value for $V_{on}$ which maximizes PAE, since an increase in $V_{on}$ is accompanied by an increase in input power. For a fixed $V_{on}$, PAE reaches a maximum for a certain optimum value of $n_0$. This is in contrast to the analysis of [37], where $\eta$ increases monotonically with $n_0$ when only device loss is present. This is a consequence of incorporating passive loss as well as optimization of PAE, since output power reduces significantly for high values of $n_0$.

Figure 2.5: Contour plots for (a) output power, (b) drain efficiency and (c) PAE as functions of $V_{on}$ and $n_0$ for Class-E PAs based on the loss-aware design methodology using $0.7\mu$m channel-length thick-oxide devices in IBM's $0.18\mu$m CMOS technology. (d) Comparison of drain voltage waveforms for a PAE-optimal 5GHz device-based design vs. theoretical and switch-based simulations.

Table 2.1: Comparison of the generalized loss-aware Class-E analysis with prior techniques for 5GHz Class-E PAs

| Technique              | This work | $R_{on}$ ignored + ZVS + ZDVS [37] | $R_{on}$ + ZVS [39] | $R_{on}$ + ZVS + ZDVS [41] |
|------------------------|-----------|------------------------------------|---------------------|----------------------------|
| $n_0$                  | 1.4       | 1.6                                | 1.4                 | 1.4                        |
| $C_{out}$ (fF)         | 631       | 373                                | 560                 | 437                        |
| $V_{DD}$ (V)           | 3.34      | 3.09                               | 3.1                 | 3.09                       |
| $P_{dc}$ (mW)          | 217       | 157                                | 231                 | 218                        |
| $P_{out}$ (mW)         | 179       | 125                                | 177                 | 161                        |
| $P_{in}$ (mW)          | 15        | 9                                  | 14                  | 11                         |
| $\eta$ (%)             | 83        | 79                                 | 77                  | 74                         |
| PAE (%)                | 75        | 74                                 | 71                  | 69                         |

Table 2.1 compares the theoretical PAE-optimal design point from the presented methodology with those resulting from prior art, all designs being scaled to drive a load with  $R_{load} = 50\Omega$ . Evidently, our approach results in the highest efficiency numbers, and while the work in [37] results in a PAE that is similar, the output power is much higher in our approach. The design procedure is used to find a PAE-optimal design point at 5GHz for the 0.7 $\mu$ m thick-oxide device. Passive losses are included in the design procedure with a  $Q_{choke}$  of 15. The output series filter inductance is assumed to be obtained through the output bondwire. Integrated capacitors at 5GHz typically have negligible loss levels. This PAE-optimal design point is used as the starting point for a realistic design that utilizes PDK device models and is simulated in Spectre RF. Practical design issues, such as non-ideal switching characteristics and finite reverse isolation of the device, require minor modifications to the circuit parameters. A comparison between the drain waveforms of the theoretical PAE-optimal design point, simulations based on an ideal switch-based model and the PDK-model-based design is summarized in Fig. 2.5(d). A very close agreement is seen, proving the usefulness of the methodology as a starting point for realistic yet optimal designs.
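The efficiency rows of Table 2.1 follow from the tabulated powers through $\eta = P_{out}/P_{DC}$ and $PAE = (P_{out} - P_{in})/P_{DC}$. The short sketch below recomputes them; the dictionary layout and function names are illustrative only. Because the tabulated powers are themselves rounded to the nearest milliwatt, the recomputed values are expected to agree with the table only to within about one percentage point.

```python
# (P_dc, P_out, P_in) in mW, copied from Table 2.1.
designs = {
    "This work": (217, 179, 15),
    "[37]": (157, 125, 9),
    "[39]": (231, 177, 14),
    "[41]": (218, 161, 11),
}

def drain_efficiency(p_dc, p_out):
    """Drain efficiency in percent."""
    return 100.0 * p_out / p_dc

def power_added_efficiency(p_dc, p_out, p_in):
    """Power-added efficiency in percent."""
    return 100.0 * (p_out - p_in) / p_dc

results = {
    name: (drain_efficiency(p_dc, p_out), power_added_efficiency(p_dc, p_out, p_in))
    for name, (p_dc, p_out, p_in) in designs.items()
}
for name, (eta, pae) in results.items():
    print(f"{name:10s} eta = {eta:5.1f} %   PAE = {pae:5.1f} %")
```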

# 2.5 Millimeter-wave Stacked CMOS Class-E-like PAs



Figure 2.6: (a) Stacked CMOS Class-E-like PA concept with voltage swings annotated in volts and (b) simplification of the stacked topology for analysis using loss-aware Class-E design methodology.

#### 2.5.1 Concepts

Fig. 2.6(a) depicts the concept of a mmWave stacked CMOS Class-E-like PA. The stacked configuration consists of multiple series devices, which might be of equal or different size. In order to conserve input power and improve PAE at mmWave, only the bottom device is driven by the input signal. The devices higher up in the stack turn on and off due to the swing of the intermediary nodes. The topmost drain is loaded with an output network that is designed based on Class-E principles (referred to as a "Class-E load network" <sup>1</sup>), and consequently sustains a Class-E-like voltage waveform. The intermediary drain nodes must also sustain Class-E-like voltage swings with appropriately scaled amplitudes so that the voltage stress is shared equally among all devices. In the 45 nm SOI and 65 nm CMOS technologies employed, the nominal $V_{DD}$ of the high-speed thin-oxide devices is $\approx 1$ V and for long-term reliability, the maximum swing across any two transistor junctions under large signal operation is limited to $2 \times V_{DD} = 2$ V [26]. Consequently, for a PA with $n$ stacked devices, the peak output swing is $2n$ V as marked on Fig. 2.6(a), and the appropriate intermediary node swings are also noted. Appropriate voltage swing may be induced at the intermediary nodes through techniques such as inductive tuning [26], capacitive charging acceleration [42] and placement of Class-E load networks at intermediary nodes [32] as detailed later in this chapter. For simplicity, the circuitry used to induce appropriate voltage swing at the intermediary nodes is not shown in Fig. 2.6(a). In order to conform to the peak ac swing limit across the gate-source junction in the on half-cycle and the gate-drain junction in the off half-cycle, the gates of the devices in the stack must swing as shown in Fig. 2.6(a). The swing at each gate is induced through capacitive coupling from the corresponding source and drain node via $C_{gs}$ and $C_{gd}$ respectively and is controlled through the gate capacitor $C_n$. The dc biases of all gates are applied through large resistors.

From Fig. 2.6(a), it can be seen that for a 2-stacked switching PA, the gate of the top device is connected to ground via a large capacitor and experiences no signal swing. However, this does not reduce a 2-stacked switching PA to a regular cascode configuration because of the following reasons. The main objective of stacking is to allow operation off a higher supply voltage by distributing the overall voltage stress equally amongst the transistors. This is accomplished by engineering the drain and gate node voltage profiles (as depicted in Fig. 2.6(a)) to ensure that all devices have the same $V_{gs}$, $V_{gd}$ and $V_{ds}$ swings, which results in a linear increase in the supply voltage with the number of devices stacked. In stacked switching PAs, the nature of the voltage swings requires a constant gate bias only for the second device. Conventional cascode PAs can operate off a higher supply voltage as well but a linear scaling in supply voltage cannot be achieved. This is because the gate of the top device is usually connected to the supply voltage to maximize small-signal power gain. The ensuing unequal voltage stress across the devices compromises long-term reliability and can even force operation off a single-device supply voltage [43], [44], [45]. The simulated drain-source and gate-source waveforms for the 2-stacked and 4-stacked Class-E-like PAs (Fig. 2.15 (a) and (b) respectively, presented in Section 2.5.7) demonstrate that voltage swings are indeed equal for the prototypes implemented in this work and serve to distinguish a 2-stacked switch-like PA from a conventional cascode PA. This claim is further validated by comparing the measured performance of the 2-stacked PA implemented in this work with a prior mmWave cascode PA [45] in Section 2.5.8.2.

<sup>1</sup>A "Class-E load network" consists of a DC-feed inductor to the power supply in parallel with a series resonant filter connected to the appropriate Class-E load impedance.

#### 2.5.2 Theoretical Analysis and Fundamental Limits

To facilitate a theoretical analysis, the improved loss-aware Class-E design methodology described in Section 2.4 and in [1] is employed. The devices (taken to be equal in size) in a stacked switching PA are assumed to behave as a single switch with linearly increased breakdown voltage and ON-resistance (Fig. 2.6(b)). As far as theoretical results are concerned, only the total ON-resistance of the stacked configuration and output capacitance of the top device are pertinent. The output capacitance of a stacked configuration should ideally scale down linearly with number of devices stacked. However, wiring parasitics are significant at mmWave frequencies and there will be parasitic capacitance to ground from the intermediate drain/source and gate nodes which will prevent linear scaling of output capacitance with stacking. As a worst-case estimate, the overall output capacitance is taken to be the same as that of a single device ($=C_{gd} + C_{ds}$, where $C_{ds} = \frac{C_{db} \times C_{sb}}{C_{db} + C_{sb}}$ since the high body resistance in SOI technology causes $C_{db}$ and $C_{sb}$ to appear in series). It is indeed this mechanism that prevents efficiency from remaining constant as we stack more devices, as will be shown later in this chapter (graphically in Fig. 2.7(a) and theoretically in Eqn. 2.27). The devices can be of different sizes and there could potentially be some benefit in tapering device sizes as well [5], [4], since the gate capacitors conduct a portion of the device current. Thus, progressive device size reduction up the stack would reduce parasitic capacitances and prevent capacitive discharge loss at intermediate nodes. However, device size tapering has not been pursued in this work.

Table 2.2: Normalized device parameters used in loss-aware Class-E analysis

| Tech. | $\overline{R_{ON}} = R_{ON} \times W$ <sup>\*</sup> | $\overline{C_{out}} = \overline{C_{gd}} + \overline{C_{ds}} = \frac{C_{gd}}{W} + \frac{C_{ds}}{W}$ <sup>#</sup> | $\overline{R_{ON}} \times \overline{C_{out}}$ (switch time constant) | $\overline{C_{in}} = \overline{C_{gs}} + \overline{C_{gd}} = \frac{C_{gs}}{W} + \frac{C_{gd}}{W}$ <sup>+</sup> | $\overline{P_{in}} = \frac{P_{in}}{W}$ |
|-------|------|------|------|------|------|
| 45 nm SOI CMOS | 275 $\Omega$-$\mu$m ($V_{gs}=1$ V) | 0.769 fF/$\mu$m | 211.5 $\Omega$-fF | 0.31 fF/$\mu$m | $4.191 \times f_0 \times \overline{C_{in}} \times (V_{high} - V_{low})^2$ |
| 65 nm low-power bulk CMOS | 820 $\Omega$-$\mu$m ($V_{gs}=1$ V) | 0.48 fF/$\mu$m | 393.6 $\Omega$-fF | 0.28 fF/$\mu$m | $4.341 \times f_0 \times \overline{C_{in}} \times (V_{high} - V_{low})^2$ |

Note: $R_{on}$, $C_{gs}$, $C_{gd}$, $C_{ds} = \frac{C_{db} \times C_{sb}}{C_{db} + C_{sb}}$, $P_{in}$ correspond to ON-resistance, gate-source, gate-drain and drain-source capacitances and input power (to switch a device between the triode and cut-off regions) respectively for a device of width $W$ (including layout parasitics). $f_0$ is the operating frequency. $V_{high}$ and $V_{low}$ refer respectively to the high and low amplitude levels of a 45 GHz square-wave input signal.

<sup>\*</sup> Estimated in triode region of operation.

<sup>#</sup> Estimated in cut-off region of operation.

<sup>+</sup> Estimated as average of capacitance values in cut-off and triode regions of operation.

The time-domain equations and corresponding design procedure for a stacked Class-E PA can be obtained from those derived in Section 2.4 by simply replacing $R_{ON}$ and $V_{DD}$ with $n \times R_{ON}$ and $n \times V_{DD}$ respectively. The reader is directed to [3] for the resulting (detailed) equations. For various levels of stacking ($n$), the design methodology is used to analytically vary device-size and dc-feed inductance to find the design point(s) with optimal PAE under the constraint of a 50 $\Omega$ load impedance to avoid impedance transformation losses. Device ON-resistance, output capacitance and input-drive-power as functions of device size are determined from post-layout device simulations and are validated through device measurements (discussed in Section 2.5.6). The values for these parameters are shown in Table 2.2, where $f_0$ is the operating frequency, $\overline{R_{ON}} = R_{ON} \times W$, $\overline{C_{out}} = \frac{C_{out}}{W}$, $\overline{C_{in}} = \frac{C_{in}}{W}$, and $\overline{P_{in}} = \frac{P_{in}}{W}$ are technology parameters normalized to the device width ($R_{ON}$, $C_{out}$, $C_{in}$ and $P_{in}$ being respectively the ON-resistance, output capacitance, input capacitance and input power corresponding to a device of width $W$). At mmWave frequencies, the constants of proportionality in the input power functions take into account the power lost in the gate resistance, and are consequently frequency dependent. The values of those constants reported in Table 2.2 are based on 45 GHz simulations. In order to incorporate the loss of the dc-feed inductance, a quality factor of 15 is assumed at 45 GHz based on measurements [35]. As an example, for a stack of four devices ($n=4$), we start with an initial device size of 100 $\mu$m and set the tuning parameter $\omega_s = 0.8 \times \omega_0$. The design methodology then determines the optimal load impedance for highest PAE and the corresponding output power. The load impedance is then scaled (along with device size, input and output powers) to have a real part of 50 $\Omega$. The procedure is repeated by changing the tuning parameter $\omega_s$. Finally, amongst all these design points for a stack of four devices driving a 50 $\Omega$ load, the one with the highest PAE is chosen. This yields a device size of 204 $\mu$m for the 4-stacked PA, with theoretical output power and PAE of 145 mW and 48% respectively (as shown in Fig. 2.7). The procedure can similarly be used to determine the corresponding metrics for other levels of stacking.

Figure 2.7: (a) Theoretical and simulated (post-layout) output power and PAE and (b) device size and theoretical device stress for the optimal design as a function of number of devices stacked based on the loss-aware Class-E design methodology at 45 GHz in 45 nm SOI CMOS. Loss in dc-feed inductance is included for theoretical results. Output power and PAE for a switch+capacitor based model for the 4-stacked configuration are also annotated.
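The width-normalized parameters of Table 2.2 convert to absolute element values for a given device size as sketched below for the 45 nm SOI entries and the 204 $\mu$m, 4-stacked example. The function and dictionary names are hypothetical, and the input swing ($V_{high} = 1$ V, $V_{low} = 0$ V) is an assumption made only to produce a number; the text does not specify it here. Only the bottom device is driven, so the input power is computed for a single device.

```python
# Normalized 45 nm SOI CMOS parameters from Table 2.2.
TECH_45NM_SOI = {
    "r_on_ohm_um": 275.0,      # R_ON x W
    "c_in_ff_per_um": 0.31,    # C_in / W
    "k_in": 4.191,             # constant in the input-power expression (45 GHz)
}

def device_on_resistance(width_um, tech):
    """ON-resistance of one device of the given width, in ohms."""
    return tech["r_on_ohm_um"] / width_um

def stack_on_resistance(n, width_um, tech):
    """Total ON-resistance of n equal devices in series, in ohms."""
    return n * device_on_resistance(width_um, tech)

def input_power_w(width_um, f0_hz, v_high, v_low, tech):
    """Input power needed to switch the driven (bottom) device, in watts."""
    c_in = tech["c_in_ff_per_um"] * 1e-15 * width_um
    return tech["k_in"] * f0_hz * c_in * (v_high - v_low) ** 2

width_um, n, f0_hz = 204.0, 4, 45e9
r_dev = device_on_resistance(width_um, TECH_45NM_SOI)
r_stack = stack_on_resistance(n, width_um, TECH_45NM_SOI)
# Assumed 1 V square-wave drive (illustrative only).
p_in = input_power_w(width_um, f0_hz, v_high=1.0, v_low=0.0, tech=TECH_45NM_SOI)
print(f"R_ON per device = {r_dev:.2f} ohm, stack = {r_stack:.2f} ohm, "
      f"P_in = {1e3 * p_in:.1f} mW")
```

In the stacked analysis, the total resistance `r_stack` and the supply $n \times V_{DD}$ take the place of $R_{ON}$ and $V_{DD}$ in the single-device equations.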

Fig. 2.7(a) depicts the optimal output power and PAE for different levels of stacking in 45 nm SOI CMOS at 45 GHz. The optimal size of each stacked device and the associated device stress (defined as the ratio of the average current drawn from the power supply to the device width) are shown in Fig. 2.7(b). It is clear that due to the increasing achievable output voltage swing, stacking in Class-E-like CMOS PAs enables dramatic increases in output power (near-quadratic due to linear increase in output swing). The PAE reduces with increased stacking due to increasing total switch loss. However, the methodology ensures that the PAE degradation is gradual. In order to do this, the design methodology requires the size of each stacked device to increase with $n$ to reduce the individual (and hence overall) ON-resistance. Consequently, careful device layout is required for high levels of stacking as it is challenging to lay out large devices while maintaining a high $f_{max}$.

Another important consideration for device stacking is the current stress for the stacked devices, which increases with the level of stacking. Current stress (or large signal current-density) is the ratio of the average current drawn from the supply under large signal operation ($I_{DC}$) to the device width. Note that $I_{DC}$ is different from the bias current $I_{bias}$ drawn with no input power (i.e. under small signal operation), the latter being used to determine the current density for operating at highest $f_{max}$ in linear PAs. The current drawn under large signal operation is typically 1.5-2 times higher than the small signal bias current in our implementations. This implies that in a practical implementation, the metallization of the source and drain fingers of the MOS devices must be augmented with additional metal layers, if required, so that they can support the required currents while satisfying electromigration rules for the technology. While Fig. 2.7(a) shows an increasing trend for output power till 5 devices, at much higher levels of stacking, the assumption in the theoretical analysis that the switch ON-resistance is much smaller than the impedance of its output capacitance [1] would be violated. Furthermore, there would be diminishing returns in output power owing to increased losses with stacking. In practice, the maximum practical device size, the maximum current stress that can be tolerated as per electromigration requirements and drain-bulk/buried-oxide breakdown mechanisms would determine the maximum number of devices that can be stacked. The post-layout simulated results for output power and PAE for the 2-stacked and the 4-stacked Class-E-like PA prototypes implemented in this work (and described in Section 2.5.7) have been annotated on Fig. 2.7(a) as well and show excellent agreement with the theoretical output power. The post-layout simulated efficiency is lower by $\approx 20\%$ owing to various implementation losses and soft-switching at mmWave as well as power-loss at intermediate nodes which are not accounted for in theory. However, the theoretical and simulated trends in PAE are in agreement. Later in this chapter, a switch+capacitor based model for the device is constructed for simulation-based investigation of power loss at intermediate nodes for a 4-stacked configuration. The resulting output power and PAE (Fig. 2.7(a)) show excellent agreement with post-layout device-based simulations and re-affirm the utility of a simplified theoretical analysis.

#### 2.5.3 Interpretation using Waveform Figures of Merit

An analysis using the unique properties of switching PAs facilitates a better understanding of the underlying phenomena associated with device stacking and an interpretation of the results of the loss-aware Class-E design methodology. An excellent description of the characteristics of switching PAs can be found in [40]. We have

$$PAE = 1 - \frac{P_{loss}}{P_{DC}} - \frac{P_{in}}{P_{DC}} , \tag{2.19}$$

where  $P_{in}$  and  $P_{DC}$  are input power to the PA and the dc power consumption respectively. The loss in the PA  $(P_{loss})$  is given by

$$P_{loss} = P_{loss,switch} + P_{loss,cap} \tag{2.20}$$

$$= I_{RMS,n}^2 \times n \times R_{ON,n} + P_{loss,cap} \tag{2.21}$$

$$= I_{RMS,n}^2 \times n \times \frac{\overline{R_{ON,n}}}{W_n} + P_{loss,cap} , \tag{2.22}$$

where $n$ is the number of devices stacked in series, $R_{ON,n}$ is the ON-resistance of each device in the $n$-stacked PA, $\overline{R_{ON,n}} = R_{ON,n} \times W_n$ is its width-normalized value, $W_n$ is the width of each device/switch and $I_{RMS,n}$ is the RMS value of the current flowing through the stack of $n$ switching devices (excluding the output capacitance). $P_{loss,cap}$ is the switching loss associated with the output capacitance of the PA and is dependent on the topmost drain voltage value at the switching instant. In general, at mmWave frequencies, the capacitive discharge loss is small compared to the conduction loss in the switching device(s). This is evident from Table 2.3, where the conduction loss in the switch and the capacitive discharge loss have been tabulated for the optimal designs at different levels of stacking described in Fig. 2.7.

Table 2.3: Conduction loss and capacitive discharge loss for the optimal designs at different levels of stacking described in Fig. 2.7. Values for the waveform figures of merit $F_I^2$ and $F_C$ for loss-aware and ZVS based designs are also tabulated.

| Devices stacked ($n$) | Device size ($\mu$m) | ON resistance ($\Omega$) | Output cap. $C_{out,n}$ (fF) | Device conduction loss $P_{loss,switch}$ (mW) | Output cap. switching loss $P_{loss,cap}$ (mW) | $F_I^2$ (loss-aware) | $F_C$ (loss-aware) | $F_I^2$ (ZVS) | $F_C$ (ZVS) |
|---|-----|------|--------|-------|------|------|------|------|------|
| 1 | 60  | 4.58 | 35.46  | 1.5   | 0    | 2.06 | 2.06 | 2.36 | 3.61 |
| 2 | 114 | 2.41 | 67.37  | 10.6  | 1.3  | 2.56 | 0.94 | 2.17 | 3.02 |
| 3 | 168 | 1.64 | 99.29  | 32.8  | 5.7  | 2.73 | 0.74 | 2.1  | 2.61 |
| 4 | 204 | 1.35 | 120.56 | 80.2  | 17.8 | 2.62 | 0.68 | 2.07 | 2.29 |
| 5 | 228 | 1.21 | 134.75 | 176.3 | 45.4 | 2.54 | 0.63 | 2.05 | 2.04 |

Indeed, this reinforces our earlier assertion that the conventional ZVS/ZDVS-based Class-E design methodology is not applicable at mmWave frequencies. Therefore, for the purpose of simplifying our analysis, we shall ignore the contribution of the term $P_{loss,cap}$ to the overall loss. In a switching PA, the average current drawn from the supply is always proportional to $I_{RMS,n}$. The proportionality constant depends on the tuning of the load network [40]. Tuning of a Class-E load network is determined by the dc-feed inductance ($L_s$) and the load impedance ($Z_{load}$) in relation to the device output capacitance. Since we are in a regime where conduction loss is significant, the proportionality constant will also depend on the value of the total switch ON-resistance ($n \times R_{ON,n}$) relative to the output capacitance $C_{out,n}$. Since $R_{ON,n} \times C_{out,n}$ is a technology constant, specifying $n$, $C_{out,n}$, $L_s$ and $Z_{load}$ completely characterizes the tuning of the stacked Class-E PA. It is also therefore clear that the optimal tuning is likely to vary for different levels of stacking due to the increasing total switch loss. Ignoring capacitive discharge loss, Eqn. (2.22) becomes

$$P_{loss} \approx I_{RMS,n}^2 \times n \times \frac{\overline{R_{ON,n}}}{W_n}$$

$$= \frac{I_{RMS,n}^2}{I_{DC,n}^2} \times I_{DC,n}^2 \times n \times \frac{\overline{R_{ON,n}}}{W_n}$$

$$= F_{I,n}^2 \times I_{DC,n}^2 \times n \times \frac{\overline{R_{ON,n}}}{W_n}$$
(2.23)

where $F_{I,n} = \frac{I_{RMS,n}}{I_{DC,n}}$ is a waveform figure of merit defined in [40] and $I_{DC,n}$ is the average supply current with *n* devices stacked.

For a stack of n devices, the supply voltage scales linearly with n. On the other hand, in a switching PA, average supply current is proportional to the product of output capacitance, supply voltage and the operating frequency ( $\omega_0$ ), the constant of proportionality being dependent on the tuning of the circuit. The linear dependence on output capacitance is simply an artifact of circuit scaling properties, while the linear scaling with supply voltage arises from the fact that switching PAs are linear with respect to excitations at the drain node (e.g. supply voltage) [40]. Denoting the impedance of the device output capacitance at the fundamental frequency by

$$Z_C = \frac{1}{\omega_0 \times \left(\overline{C_{out}} \times W_n\right)}$$

the waveform figure of merit  $F_C$  is defined in [40] as

$$F_C = \frac{P_{DC}}{\frac{V_{DC}^2}{Z_C}} \tag{2.24}$$

For an n-stacked PA,  $V_{DC} = (n \times V_{DD})$  where  $V_{DD}$  is the supply voltage for a single device PA. Substituting

$$P_{DC} = V_{DC} \times I_{DC,n} = (n \times V_{DD}) \times I_{DC,n}$$

in Eqn. (2.24), we get

$$I_{DC,n} = F_{C,n} \times \omega_0 \times \left(\overline{C_{out}} \times W_n\right) \times (n \times V_{DD})$$
(2.25)
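The intermediate algebra between Eqns. (2.24) and (2.25) can be spelled out as follows. This is only a restatement of the definitions above, with $F_C$ written as $F_{C,n}$ for the n-stacked case:

$$F_{C,n} = \frac{V_{DC} \times I_{DC,n}}{\frac{V_{DC}^2}{Z_C}} = \frac{I_{DC,n} \times Z_C}{V_{DC}} \;\Rightarrow\; I_{DC,n} = F_{C,n} \times \frac{V_{DC}}{Z_C} = F_{C,n} \times \omega_0 \times \left(\overline{C_{out}} \times W_n\right) \times (n \times V_{DD})$$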

Consequently,

$$PAE = 1 - \frac{F_{I,n}^2 \times I_{DC,n}^2 \times n \times \frac{\overline{R_{ON,n}}}{W_n}}{(n \times V_{DD}) \times I_{DC,n}} - \frac{W_n \times \overline{P_{in}}}{(n \times V_{DD}) \times I_{DC,n}} \tag{2.26}$$

$$= 1 - n \times F_{I,n}^{2} \times F_{C,n} \times \omega_{0} \times \overline{C_{out}} \times \overline{R_{ON}} - \frac{k \times \overline{C_{in}} \times (V_{high} - V_{low})^{2}}{n^{2} \times V_{DD}^{2} \times F_{C,n} \times \overline{C_{out}}} \tag{2.27}$$

where k is the technology- and frequency-dependent constant of proportionality that results from the input power functions as discussed in Section 2.4 and shown in Table 2.2.
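To make the structure of Eqn. (2.27) concrete, the following is a minimal Python sketch that evaluates its two loss terms. The function name, argument names and every numerical value in the demo are hypothetical placeholders chosen for illustration; they are not the entries of Table 2.2 or Table 2.3, and the units of `k` are assumed to be whatever makes the drive term dimensionless.

```python
import math


def stacked_class_e_pae(n, f_i, f_c, f0_hz, switch_time_constant_s,
                        k=0.0, c_in_bar=0.0, c_out_bar=1.0,
                        v_swing=0.0, v_dd=1.0):
    """Evaluate Eqn. (2.27) for an n-stacked Class-E-like PA.

    switch_time_constant_s is the product R_ON x C_out (width-normalized),
    v_swing is (V_high - V_low), and k is the technology- and
    frequency-dependent input-power constant.
    """
    omega0 = 2.0 * math.pi * f0_hz
    # Conduction-loss term: n * F_I^2 * F_C * w0 * C_out * R_ON
    conduction_term = n * f_i ** 2 * f_c * omega0 * switch_time_constant_s
    # Input-drive term: k * C_in * (V_high - V_low)^2 / (n^2 * V_DD^2 * F_C * C_out)
    drive_term = (k * c_in_bar * v_swing ** 2) / (
        n ** 2 * v_dd ** 2 * f_c * c_out_bar)
    return 1.0 - conduction_term - drive_term


if __name__ == "__main__":
    # Hypothetical figures of merit and technology constant, for illustration only.
    for n in (1, 2, 4):
        pae = stacked_class_e_pae(n, f_i=1.5, f_c=1.0, f0_hz=45e9,
                                  switch_time_constant_s=100e-15)
        print(f"n={n}: PAE (drive term ignored) = {pae:.3f}")
```

With the figures of merit held fixed, the conduction term grows linearly with n, which is why the text stresses minimizing the $F_{I,n}^2 \times F_{C,n}$ product as stacking increases.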

Table 2.3 lists the values of the waveform figures of merit for designs based on the loss-aware and ZVS methodologies for different levels of stacking. While the waveform metric  $F_I$  is comparable for both, the loss-aware methodology shapes the waveforms to minimize  $F_C$ , thereby yielding optimal designs with highest possible PAE.

The foregoing expression captures the variation in PAE in terms of technology constants and number of devices stacked. The only design-related variables in this expression are the waveform figures of merit. It is well known that $\overline{R_{ON}} \times \overline{C_{out}}$ is the technology constant (which we shall refer to as the Switch Time Constant) that determines the drain efficiency of a Class-E PA [40] for a given operating frequency, and this constant degrades linearly for an n-stacked device, since the ON-resistances add in series while the output capacitance remains that of a single device. However, the loss-aware Class-E design methodology optimizes the output network tuning to ensure that the PAE degradation is gradual as stacking is increased by minimizing the $F_{I,n}^2 \times F_{C,n}$ product. The benefits of the loss-aware Class-E tuning methodology over the ZVS/ZdVS-based tuning methodology can be appreciated in Fig. 2.8, where the $F_{I,n}^2 \times F_{C,n}$ product for the loss-aware design technique can be observed to be lower than that corresponding to the ZVS design methodology by a factor of 2-3, depending on the number of devices stacked.



Figure 2.8: Product of waveform figures of merit  $F_{I,n}^2$  and  $F_{C,n}$  for stacked Class-E-like PAs in 45 nm SOI CMOS at 45 GHz based on the loss-aware and ZVS based design methodologies.

The preceding analysis also highlights the importance of Switch Time Constant as a technology metric that determines the efficiency of switching PAs. For linear-type PAs,  $f_{max}$  is a sufficient metric to gauge the PA efficiency. For switching-type PAs,  $f_{max}$  determines the input power requirements (via the technology- and frequency-dependent constant k) while Switch Time Constant determines the drain efficiency.

As the levels of stacking are increased, Switch Time Constant becomes more significant than  $f_{max}$ . As can be seen in Table 2.2, and later in this chapter, 65 nm low-power bulk CMOS and 45 nm SOI CMOS have similar  $f_{max}$  but significantly different Switch Time Constants. Consequently, it can be expected that switching-type PAs in 45 nm SOI CMOS will achieve higher efficiencies than those in 65 nm low-power bulk CMOS. This is validated by our experimental results in Section 2.5.8.

The loss-aware Class-E analysis takes into account several mmWave non-idealities, and therefore, the results in Fig. 2.7 represent the fundamental limits on achievable performance in stacked CMOS Class-E-like PAs. The main non-ideality that causes deviation from these limits in practice is soft switching of the stacked devices due to the lack of square-wave drives. Nevertheless, the optimal design points predicted by the analysis are excellent starting points for simulation-based optimization.

#### 2.5.4 Stacking vs. Power Combining



Figure 2.9: Comparison of device stacking in Class-E-like PAs (based on loss-aware Class-E design methodology) at 45 GHz in 45 nm SOI CMOS with (a) 2, 4 and 8-way Wilkinson-tree-based power-combining and transformer-based series power-combining (the 2-stacked and 1-stacked Class-E-like PAs obtained from the loss-aware Class-E design methodology are used with both the N-way Wilkinson-tree-based and transformer power-combiners) and (b) impedance transformation at 45 GHz in 45 nm SOI CMOS (the 2-stacked and 1-stacked Class-E-like PAs obtained from the loss-aware Class-E design methodology are scaled to increase output power and a 2-element L-C network is used to transform the 50  $\Omega$  load to the optimal load impedance for the scaled PAs. Quality factors for the inductor and capacitor are assumed to be 15 and 10 respectively at 45 GHz).

In order to appreciate the benefits of device stacking using the loss-aware Class-E design methodology, it is imperative to contrast this approach with conventional impedance transformation and power combining techniques. To evaluate the performance of power combining, a 45 nm SOI 2-stacked Class-E-like PA (resulting from the loss-aware design methodology) with a theoretical output power of 34 mW and a corresponding theoretical PAE of 54% at 45 GHz is chosen, since it has reasonable output power as well as the highest efficiency (Fig. 2.7(a)). Since the 2-stacked PA is designed for an optimal load impedance of 50 $\Omega$, a cascaded tree of 2-way 50 $\Omega$ Wilkinson power-combiners is chosen. A 2-way 50 $\Omega$ Wilkinson power-combiner with 70.7 $\Omega$ $\frac{\lambda}{4}$ transmission lines in 45 nm SOI CMOS technology has an EM-simulated efficiency $\eta = 0.87$ at 45 GHz. An N-way cascaded-Wilkinson-tree power-combiner (where N is a power of two) will therefore have an overall efficiency of $\eta^{\log_2 N}$. Fig. 2.9(a) compares the theoretical PAE, as a function of output power, for different levels of device stacking with that of 2, 4 and 8-way Wilkinson-tree power-combining. For a given output power, stacked Class-E-like PAs implemented using the loss-aware Class-E design methodology offer $\approx 10-20\%$ higher efficiency compared to Wilkinson power-combining (using 2-stacked PAs).

Power-combining using transformers is a better alternative at mmWave frequencies, since ideally transformer-based series power-combining has a constant efficiency with number of elements combined. However, interwinding and self-resonant capacitances introduce asymmetry in transformer power-combiners, degrade efficiency and cause stability problems, usually permitting a maximum of two transformer sections to be combined in series [16]. Ignoring the effect of parasitic capacitances, a 2-section series transformer-combiner is used to power-combine two 2-stacked PAs. The secondary inductance is chosen for maximum efficiency subject to a 50 $\Omega$ load, and the PAs are appropriately scaled to drive the load impedance presented by the primary of the transformer. As shown in Fig. 2.9(a), transformer power-combining utilizing 2-stacked PAs can yield results similar to stacking only under ideal conditions and is fundamentally limited to two-way combining. The corresponding results for Wilkinson and transformer-based power-combining using 1-stacked (single device) Class-E-like PAs obtained from the loss-aware design methodology are also included to emphasize the inefficacy of the traditional design technique of using single-device PAs for high-power amplification.
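The cascaded-Wilkinson efficiency $\eta^{\log_2 N}$ quoted above can be tabulated with a short Python sketch. The function names are hypothetical, and the output-power helper only applies the combiner loss to N ideal copies of the 34 mW unit PA; it is an illustrative simplification, not the procedure used to generate Fig. 2.9(a).

```python
def wilkinson_tree_efficiency(n_way, eta_2way=0.87):
    """Overall efficiency of an N-way tree of 2-way Wilkinson combiners.

    N must be a power of two so that the tree has log2(N) cascaded stages.
    """
    if n_way < 1 or n_way & (n_way - 1) != 0:
        raise ValueError("n_way must be a power of two")
    stages = n_way.bit_length() - 1  # log2(N) for a power of two
    return eta_2way ** stages


def combined_output_power_mw(n_way, p_unit_mw=34.0, eta_2way=0.87):
    """Output power after combining N identical unit PAs through the tree."""
    return n_way * p_unit_mw * wilkinson_tree_efficiency(n_way, eta_2way)


if __name__ == "__main__":
    for n in (2, 4, 8):
        eff = wilkinson_tree_efficiency(n)
        p_out = combined_output_power_mw(n)
        print(f"{n}-way: combiner efficiency {eff:.3f}, output {p_out:.1f} mW")
```

Each doubling of the number of combined PAs costs one more factor of 0.87, so the combiner efficiency falls geometrically while the output power grows by less than a factor of two per stage.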

#### 2.5.5 Stacking vs. Impedance Transformation

The efficiency of the alternative technique of impedance transformation is dependent on the steepness of transformation as well as the topology of the impedance transformation network. The 2-stacked and 1-stacked Class-E-like PAs in 45 nm SOI obtained from the loss-aware Class-E design methodology at 45 GHz are again employed for the purpose of comparison. In order to achieve output power comparable to those obtained from device stacking, the Class-E-like PAs are scaled appropriately while an impedance transformation network is used to transform the 50  $\Omega$  load to the corresponding lower load impedance for the scaled PAs. A 2-element L-C impedance transformation network is designed and used in each case. The quality factors of the inductor and capacitor are assumed to be 15 and 10 respectively at 45 GHz, based on measured characterizations of inductors, capacitors and transmission lines in the 45 nm SOI CMOS technology [35]. A comparison of the PAEs of impedance transformation and device stacking is summarized in Fig. 2.9(b). Device stacking results in designs with  $\approx$ 10-30% higher efficiency for the same output power compared to impedance transformation.
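The dependence of transformation loss on the steepness of the transformation can be illustrated with a first-order textbook estimate for a two-element low-pass L-match (series inductor, shunt capacitor across the 50 $\Omega$ load). This is an assumption made for illustration, not necessarily the calculation behind Fig. 2.9(b): component losses are treated as small perturbations, and the function name and example resistances are hypothetical.

```python
import math


def l_match_efficiency(r_load, r_pa, q_ind=15.0, q_cap=10.0):
    """First-order efficiency of an L-C network stepping r_load down to r_pa.

    The network Q is sqrt(r_load / r_pa - 1). The series inductor dissipates
    a fraction Q/q_ind of the delivered power and the shunt capacitor a
    fraction Q/q_cap, assuming losses do not disturb the match.
    """
    if not 0.0 < r_pa < r_load:
        raise ValueError("expects a downward transformation: 0 < r_pa < r_load")
    q_match = math.sqrt(r_load / r_pa - 1.0)
    return 1.0 / ((1.0 + q_match / q_ind) * (1.0 + q_match / q_cap))


if __name__ == "__main__":
    # Hypothetical PA load resistances; a scaled-up PA needs a lower load.
    for r_pa in (25.0, 12.5, 6.25):
        eta = l_match_efficiency(50.0, r_pa)
        print(f"50 ohm -> {r_pa} ohm: estimated network efficiency {eta:.3f}")
```

Under these assumptions, the efficiency falls as the transformation ratio rises, which is consistent with the qualitative argument in the text that steeper transformations are costlier.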

Once device stacking is exploited to the limit as dictated by secondary breakdown mechanisms (e.g. that of the buried oxide in SOI), it is interesting to consider the combination of device stacking with impedance transformation and/or power combining to achieve watt-class output power levels at mmWave frequencies.

#### 2.5.6 IBM 45 nm SOI and 65 nm CMOS Power Device Modeling

In the 45 nm SOI CMOS, an accurate high frequency model for the device which accounts for Intrinsic Input Resistance (IIR) as well as layout-related wiring resistances, capacitances and inductances of the gate, drain and source fingers and vias is non-existent. The model provided in the design kit is augmented to incorporate the impact of IIR, which models the distributed characteristics of the channel in a MOSFET [46]. While IIR is controlled by two parameters (XRCRG1 and XRCRG2) in BSIMSOI modeling [47] (which default to 0 in the PDK), we have found that transient simulations in SPECTRE fail to converge when these parameters are assigned values based on our device measurements. Consequently, a bias-independent resistance is added in series with the gate to account for IIR. The bias independence of this resistor, along with its location (outside the PDK device model), is a source of inaccuracy in our transient simulations. Wiring resistances and capacitances are extracted using Calibre PEX. High-frequency models for the gate and drain vias are simulated in the IE3D field solver [48].

The layout of a fabricated floating-body power device test structure in 45 nm SOI technology employs a continuous array of gate fingers (40-70) with a finger width of 2.793  $\mu$ m. We make use of a doubly-contacted gate with a symmetric gate via on both sides to reduce gate resistance [46].



Figure 2.10: Close-up of (a) devices and (b) devices with connections to gate capacitors in 4-stacked PA layout implemented in 45 nm SOI CMOS.

Wiring resistances and capacitances are extracted for the entire stacked-device layout configuration and high frequency models for the vias are added to this overall R-C extracted model. The layout for the 4-stacked configuration is shown in Fig. 2.10, while the corresponding high frequency model with parasitics is illustrated in Fig. 2.11. The source and drain fingers of the devices consist of metal layers  $M_1 - M_3$  strapped so as to conform to electromigration requirements. The connection from the source of the bottom device to the ground node supports a large current under large signal operation. Consequently, this connection is augmented with thick metal strips in metal layers  $M_2$ and  $M_3$  strapped together, which also helps minimize the source inductance.

The measured  $f_{max}$  and  $f_T$  of the test structures of the individual power devices used in the implemented 45 nm SOI 2-stacked and 4-stacked PAs (distinct from the custom stacked layout as discussed earlier) are shown in Figs. 2.12(a) and (b) respectively. Device measurements were conducted up to 65 GHz using a pair of coaxial 1.85 mm (dc-65 GHz) ground-signal-ground (GSG) probes, calibrated at the probe tip planes. The industry-standard Open-Short de-embedding was



Figure 2.11: Augmented schematic of 4-stacked power device in 45 nm SOI CMOS with capacitive and inductive layout parasitics.

performed to a reference plane at the top of the gate and drain vias. The measured  $f_{max}$  and  $f_T$  were obtained by extrapolating the measured Mason's Unilateral Power Gain (U) and  $h_{21}$  at 20 dB/decade. The measured U is observed to have 20 dB/decade slope up to 65 GHz and the modeled U exhibits the same slope up to  $f_{max}$ . Peak  $f_{max}$  of  $\approx$ 180 GHz and  $\approx$ 190 GHz are achieved for these power devices.
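The 20 dB/decade extrapolation used to obtain $f_{max}$ and $f_T$ can be sketched as below. The helper name and the sample gain values are hypothetical and are not measured data from this work; the sketch only assumes that the gain (U or $|h_{21}|$, expressed in dB) falls at a constant slope beyond the measurement frequency.

```python
def extrapolate_unity_gain_frequency(freq_ghz, gain_db,
                                     slope_db_per_decade=20.0):
    """Frequency at which a gain falling at a fixed dB/decade slope hits 0 dB.

    freq_ghz is the frequency of the measured point and gain_db is the gain
    at that point (10*log10(U) for f_max, 20*log10(|h21|) for f_T).
    """
    return freq_ghz * 10.0 ** (gain_db / slope_db_per_decade)


if __name__ == "__main__":
    # Hypothetical example: a gain of 10 dB observed at 60 GHz.
    f_unity = extrapolate_unity_gain_frequency(60.0, 10.0)
    print(f"Extrapolated unity-gain frequency: {f_unity:.1f} GHz")
```

In practice the extrapolation is only meaningful over the range where the measured curve actually follows the assumed slope, which is why the text notes that U exhibits a 20 dB/decade roll-off up to 65 GHz.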

It is difficult to achieve  $f_{max}$  for power devices that is similar to that of smaller devices due to layout challenges [49]. For reference, our measurements reveal that a  $\frac{1 \ \mu m \times 10}{40 \ nm}$  device achieves an  $f_{max}$  of  $\approx 250$  GHz in this technology [46]. The use of a compact device layout with a continuous array of large number of gate fingers with large finger width reduces the parasitic capacitance and



Figure 2.12: Measured (extrapolated) (a)  $f_{max}$  and (b)  $f_T$  of  $\frac{2.793 \ \mu m \times 41}{40 \ nm}$  and  $\frac{2.793 \ \mu m \times 73}{40 \ nm}$  power devices in IBM 45 nm SOI CMOS across current density. These devices are used in designing the 2-stacked and 4-stacked PAs respectively. (c) Measured and simulated  $f_{max}$  for a  $\frac{3 \ \mu m \times 50}{60 \ nm}$  65 nm low-power bulk-CMOS power device across current density.

causes the 204 $\mu$m device to have a higher $f_T$ compared to the 115 $\mu$m device. However, the layout also results in an increased gate resistance and lower $f_{max}$ for the larger device. This prevents the devices of the 4-stacked PA from being driven into a hard-switching condition, as will be discussed later in this chapter. Splitting the overall device into several smaller devices (each with reduced finger width and small number of gate fingers) wired appropriately in parallel should further improve the $f_{max}$ to approach 250 GHz and available gain [49]. It should be noted that such a multiplicity-based layout approach might compromise $f_T$ due to increased wiring capacitance. In a switch-like PA, a good balance between $f_T$ and $f_{max}$ must be maintained.

A similar device layout approach and modeling strategy is employed for power devices in IBM's 65 nm low-power bulk-CMOS technology. A key point of difference, however, is the presence of IIR modeling within the PDK, eliminating the need for an external IIR resistance. A $\frac{3 \ \mu m \times 50}{60 \ nm}$ power device test structure is measured using the same approach as mentioned earlier (Fig. 2.12(c)). A peak $f_{max}$ of approximately 180 GHz is observed in measurement. It should, however, be noted that while power devices in 65 nm low-power bulk-CMOS are able to achieve similar $f_{max}$ to power devices in 45 nm SOI CMOS, the width-normalized ON-resistance (quantified as $\overline{R_{ON}}$ in Table 2.2) is almost 3 times higher for the same gate drive level due to the high threshold voltage of the low-power process ($V_{th}$=560 mV at the PA bias point). As will be demonstrated experimentally, this leads to inferior performance for mmWave Class-E-like PAs in 65 nm low-power bulk-CMOS.

#### 2.5.7 Implementation Details



Figure 2.13: Schematics of 45 nm SOI CMOS Q-band Class-E-like PAs with (a) 2 devices stacked and (b) 4 devices stacked.

The schematics in Fig. 2.13(a) and (b) depict the Class-E-like PAs implemented by stacking 2 and 4 floating-body devices in 45 nm SOI CMOS technology. Device sizes and dc-feed inductance values are chosen based on the theoretical analysis, while supply and gate bias voltages and gate capacitor values are selected based on the considerations described earlier. For the first stacked device ($M_2$ in both designs), the gate voltage must be held to a constant bias as discussed previously. This can be accomplished through a large bypass capacitor placed as close as possible to the gate to mitigate stray inductance that can result in oscillations. DGNCAPs (which are device capacitors) are suitable for this purpose since their wiring is in the lowest metal layer and they provide higher capacitance density. The gate capacitors for the higher stacked devices, which are not large in value, are implemented using VNCAPs. For both the designs, the output harmonic filter is eliminated to avoid passive loss with minimal impact on performance.

A key requirement for true Class-E behavior of the stack is for the intermediary drain nodes to sustain Class-E-like voltage swings with appropriately scaled amplitudes. This also ensures that the voltage stress is shared equally among all devices. Since at mmWave frequencies usually only the bottom device is driven in a stacked configuration ([30], [35], [5]), we rely on the voltage swing of the lower device(s) to turn off the device(s) higher up the stack. Once the stacked device turns off, the voltage of the intermediary node ceases to increase as the stacked device no longer conducts current to charge the parasitic capacitance at the intermediary node. This deviation of the intermediate node waveform from the desired voltage profile results in unequal voltage stress across the devices and deteriorates the overall efficiency owing to conduction loss during the initial period of the OFF half-cycle [42].

As was mentioned earlier, a tuning inductor may be placed at intermediary nodes to improve their voltage swing and make them more Class-E-like. Simulation results indicate that the improvement in swing for the 2-stacked PA is offset by an increase in the conduction loss of the top device. This can be explained as follows. The voltage swing at the intermediate node controls the turn-on and turn-off of the top device. As shown in Fig. 2.14(a), in the absence of the tuning inductor, the intermediate node voltage gets clipped to  $V_{g2} - V_{th2}$  once the top device turns off during the OFF half-cycle [26]. The voltage remains unchanged at  $V_{g2} - V_{th2}$  till the end of the OFF half-cycle, when the drain voltage of the top device reduces to  $V_{g2} - V_{th2}$  and the top and



Figure 2.14: Simulated voltage profiles for 2-stacked Class-E-like PA (a) without tuning inductor, and (b) with tuning inductor. (c) Close-up of voltage profiles with (bottom) and without (top) tuning inductor.

bottom node voltages roll off in tandem thereafter. Introducing an inductor at the intermediate node results in a Class-E-like voltage profile, which causes the top device to turn back on earlier during the latter part of the OFF half-cycle, as shown in Fig. 2.14(b). This leads to additional power loss in the top device. Consequently, no tuning inductor is used in designing the 2-stacked PA. For the 4-stacked PA, a tuning inductor at $V_{d2}$ is seen to provide benefit. Intuitively, a 4-stacked configuration can be viewed as a stack of two 2-stacked PAs with the inductor serving as an inter-stage tuning element.

Fig. 2.15(b) shows the drain waveforms for the 4-stacked PA. As is evident, drain-source voltage swings are almost equally shared across all four devices. The lack of a tuning inductor at  $V_{d1}$  results in a relatively flat-topped waveform. This is to be expected, in view of the foregoing discussion for the 2-stacked PA. The situation is somewhat different for node  $V_{d3}$ . Despite the absence of a tuning inductor, we can observe a Class-E-like waveform even when device  $M_4$  is off. This is a consequence



Figure 2.15: Simulated drain-source and gate-source voltage waveforms of the Q-band (a) 2-stacked Class-E-like PA ( $V_{g1}$ =0.4 V,  $V_{g2}$ =1.7 V,  $V_{DD}$ =2.4 V) and (b) 4-stacked Class-E-like PA in 45 nm SOI CMOS ( $V_{g1}$ =0.4 V,  $V_{g2}$ =1.8 V,  $V_{g3}$ =2.8 V,  $V_{g4}$ =4 V,  $V_{DD}$ =4.8 V).

of capacitive coupling through  $C_{gs}$  and  $C_{gd}$  of  $M_4$  (in conjunction with capacitive voltage division due to presence of the 80 fF gate capacitor), which induces voltage swing at  $V_{d3}$  when  $M_4$  is not conducting. This eliminates the need for a tuning inductor at  $V_{d3}$ . A similar voltage coupling does occur for  $V_{d2}$  as well. However, in that case the coupling is through two levels of devices and the resulting series connection of intrinsic capacitances reduces the strength of the voltage coupled to  $V_{d2}$ . Since  $M_1$  and  $M_2$  can be viewed as a 2-stacked PA with the tuning inductor serving as the choke inductance in large signal, a tuning inductor is not required at  $V_{d1}$  (as discussed before).

Another technique for inducing voltage swing at the intermediary nodes in a stacked configuration is through the use of capacitive charging acceleration. The work in [42] describes two methods for accomplishing this. The first is by placing an explicit capacitor between every pair of intermediary nodes and the second is using the inherent drain-bulk capacitance of a device by connecting the bulk and source nodes of stacked devices along with appropriate device sizing. The first method is less desirable at mmWave frequencies owing to the poor quality factor of on-chip capacitors. The second method is applicable only when the body terminal of the device is explicitly available to the designer. Furthermore, the efficacy of such an approach would depend on accurate modeling of the characteristics of the source-bulk junction. For the 45 nm SOI implementations, the body of the floating-body devices is not accessible. Inductors, on the other hand, have better quality factor than capacitors at mmWave frequencies. Therefore, in the implemented PAs, inductive tuning is preferred to the capacitive feed-forward technique.



Figure 2.16: Schematic of the two-stage 45 nm SOI CMOS Q-band Class-E-like PA with a 2-stacked driver stage and a 4-stacked main PA.

The lack of square-wave drive at mmWave frequencies results in soft-switching which increases the input power required to drive the PAs into a hard-switching state. Thus, to ensure that the PAs are driven into saturation, it is imperative to include a driver stage when delivering high output power. A third prototype (Fig. 2.16) was designed in 45 nm SOI CMOS by cascading the 2-stacked and 4-stacked designs discussed previously. The 2-stacked PA thus serves as the driver for the 4-stacked main PA, with an inter-stage matching network transforming the input impedance of the main PA to the optimal 50  $\Omega$  load desired by the driver stage.



Figure 2.17: Schematic of differential 2-stacked Class-E-like PA implemented in 65 nm low-power bulk CMOS.

In order to demonstrate the benefit of scaled SOI technology over bulk CMOS for implementing stacked PAs, a prototype 2-stacked PA was implemented in IBM 65 nm CMOS technology. The



Figure 2.18: Post-layout simulated drain-source voltages and corresponding switch currents for (a) 2-stacked PA and (b) 4-stacked PA in 45 nm SOI CMOS.

schematic of the pseudo-differential 2-stacked Class-E-like PA is shown in Fig. 2.17. The design strategy is similar to that of the single-ended 2-stacked PA in 45 nm SOI CMOS discussed previously. The differential input and output terminals are routed directly to GSSG pads for probing. A pseudo-differential structure was chosen to facilitate an increase in the overall output power.

An important characteristic of switching PAs, which sets them apart from the linear classes,



Figure 2.19: Comparison of post-layout simulated waveforms for device  $M_2$  of the 2-stacked PA in 45 nm SOI CMOS with theory.



Figure 2.20: Comparison of post-layout simulated waveforms for device  $M_4$  of the 4-stacked PA in 45 nm SOI CMOS with theory.

is the non-overlapping nature of switch voltage and switch current waveforms and the high harmonic content of these waveforms compared to linear PAs. In a device-based implementation, it is difficult to isolate the current flowing through the device capacitances from that flowing through the "switch". As a first order approximation, the currents through the external wiring parasitic capacitances  $C_{gs}$ ,  $C_{gd}$ ,  $C_{ds}$  and  $C_{d0}$  are scaled in proportion to the ratio of the intrinsic to external wiring parasitic capacitance, and their sum is subtracted from the total device current to arrive at the switch-current in simulation. Fig. 2.18 shows the  $V_{DS}$  and the corresponding  $I_{switch}$  for the various devices in the 2-stacked and 4-stacked PAs implemented in 45 nm SOI CMOS, from which the non-overlapping characteristic of voltage and current waveforms is clearly evident. Figs. 2.19 and 2.20 compare the switch voltage and switch current waveforms for devices  $M_2$  and  $M_4$  of the 2-stacked and 4-stacked PAs respectively with theory. Aside from the sharp current spikes in the theoretical waveforms at switch turn-on, there is excellent correspondence between theory and simulation. The current spikes arise from the assumption of hard-switching, which is not possible at mmWave. However, the soft-switching in simulation does not compromise the shaping of voltages and currents and their harmonic content for the rest of the switching cycle. These results clearly indicate the feasibility of switching operation at mmWave frequencies.
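The first-order switch-current extraction described above can be sketched as follows. The sketch assumes that each intrinsic capacitance sees the same voltage as the corresponding extracted wiring capacitance, so that its current is the probed wiring-capacitor current scaled by the intrinsic-to-external capacitance ratio. All names and sample values are hypothetical; in practice the waveforms would come from the transient simulator.

```python
def estimate_switch_current(i_device, i_external, c_intrinsic, c_external):
    """Estimate the switch current from simulated waveforms.

    i_device    : list of total device current samples
    i_external  : dict mapping capacitor name -> list of current samples
                  through the external (wiring) capacitance
    c_intrinsic : dict mapping capacitor name -> intrinsic capacitance
    c_external  : dict mapping capacitor name -> external capacitance
    """
    switch_current = []
    for idx, i_total in enumerate(i_device):
        # Scale each probed wiring-capacitor current by C_int / C_ext to
        # approximate the current through the intrinsic capacitance.
        i_caps = sum(
            (c_intrinsic[name] / c_external[name]) * samples[idx]
            for name, samples in i_external.items()
        )
        switch_current.append(i_total - i_caps)
    return switch_current


if __name__ == "__main__":
    # Hypothetical two-sample waveforms (amperes) and capacitances (farads).
    i_dev = [0.050, 0.020]
    i_ext = {"cgd": [0.002, 0.001], "cds": [0.004, 0.003]}
    c_int = {"cgd": 20e-15, "cds": 30e-15}
    c_ext = {"cgd": 10e-15, "cds": 15e-15}
    print(estimate_switch_current(i_dev, i_ext, c_int, c_ext))
```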

As mentioned before, the theoretical loss-aware Class-E design methodology approximates the stacked configuration as a series connection of switches and assumes that appropriate voltage swings are somehow ensured at the intermediate nodes. A more realistic model for the circuit is a stack of switches, each accompanied by the corresponding intrinsic device capacitances ( $C_{gs}$ ,  $C_{gd}$ ,  $C_{ds}$ , and  $C_{d,ground}$ ) and gate capacitors (except for the bottom switch). An 'Elmore network' of RC delays is encountered in stacked linear PAs (owing to the simultaneous presence of capacitances and finite device output resistances) which can cause phase shift in the voltages and currents as one moves up the stack. It is unclear that there would be a similar 'Elmore delay' in stacked switching PAs, since during the OFF cycle, the switch devices have very high OFF resistance. However, the capacitive discharge loss at intermediate nodes might be non-negligible and can have a considerable influence on overall efficient operation of the PA. Ignoring these losses in the theoretical analysis results in PAE higher than what is obtained from actual device-based simulations.

For linear stacked PAs, the impact of 'Elmore delay' can be accounted (and compensated) for theoretically [4] since a small signal model for the devices is used for the preliminary analysis and design procedure. A similar endeavor for stacked switching PAs is challenging owing to non-linear operation of the circuit. Consequently, we adopt a simulation-based approach to investigate this effect for a 4-stacked configuration (without the tuning inductor) using a switch+capacitor based model for the devices. The resulting circuit resembles that in Fig. 2.13(b) (sans the tuning inductor, the input matching network and with a square-wave drive instead of a sinusoidal input), with each device modeled as a switch augmented with intrinsic device capacitances. Layout parasitics (capacitors, resistors and inductances) based on Fig. 2.11 are also incorporated in the circuit to



Figure 2.21: (a) Post-layout simulated device voltages for the 4-stacked PA prototype without tuning inductor in 45 nm SOI CMOS. (b) Simulated switch voltages for the same 4-stacked configuration without tuning inductor, using a switch+capacitor based model for the devices and layout parasitics from Fig. 2.11.

facilitate better correlation with device-based results. The delays in the switch-voltage waveforms exhibit close correspondence with those obtained from device-based simulations as well, as shown in Fig. 2.21. Since the voltage profiles confirm no significant delay, the phenomenon of 'Elmore delay' is not a concern in stacked switching PAs. The resulting output power and PAE are reported in Fig. 2.7(a). The reduction in efficiency ($\approx 20\%$) for the switch+capacitor based model indicates that capacitive discharge loss at intermediate nodes is a more important practical consideration. These results, in conjunction with the comparison presented in Fig. 2.20, indicate that ignoring 'Elmore delay' and the additional loss mechanisms in the theoretical analysis is not a crippling limitation, since it does not significantly alter the waveform characteristics, and hence the switching behavior, of the stacked configuration at mmWave frequencies. One would therefore obtain output power similar to that predicted by theory, but at a lower PAE which follows the theoretical trend (Fig. 2.7(a)). This also demonstrates the efficacy of a simple switch+capacitor based model to predict the performance of a practical implementation.



Figure 2.22: Chip microphotographs of the millimeter-wave stacked Class-E-like PAs with (a) 2 devices stacked in 45 nm SOI CMOS, (b) 4 devices stacked in 45 nm SOI CMOS, (c) a two-stage cascade of a main PA with 4 devices stacked with a 2-stacked driver stage in 45 nm SOI CMOS and (d) 2 devices stacked in 65 nm low-power bulk CMOS.

#### 2.5.8 Experimental Results

The chip microphotographs of the PAs are shown in Fig. 2.22. The PAs are tested in chip-on-board configuration through on-chip probing using two coaxial 1.85 mm (dc-65 GHz) ground-signal-ground (GSG) probes.


Figure 2.23: Small signal S-parameters of 45 nm SOI 2-stacked Class-E-like PA ( $V_{g1}=0.4$  V,  $V_{g2}=1.7$  V,  $V_{DD}=2.4$  V). Power consumption=49 mW.

##### 2.5.8.1 Small Signal Measurements

The small-signal measurement setup is calibrated at the probe tip planes. The small-signal measurements are conducted up to 65 GHz using an Anritsu 37397E Lightning VNA. Figs. 2.23 and 2.24 illustrate the simulated and measured small signal S-parameters of the 2-stacked PA and the



Figure 2.24: Small signal S-parameters of 45 nm SOI 4-stacked Class-E-like PA ( $V_{g1}=0.4$  V,  $V_{g2}=1.8$  V,  $V_{g3}=2.8$  V,  $V_{g4}=4$  V,  $V_{DD}=4.8$  V). Power consumption=206 mW.

4-stacked PA implemented in 45 nm SOI CMOS. The measured peak gain of the 2-stacked PA is 13.5 dB at 46 GHz, with a -3 dB bandwidth extending from 32 GHz to 59 GHz. The -1 dB bandwidth extends from 42 GHz to 52 GHz, making it suitable for wideband applications. The measured peak gain of the 4-stacked PA is 12.3 dB at 48.5 GHz, with a -3 dB bandwidth extending from 37 GHz to 56 GHz. The measured -1 dB bandwidth spans a wide frequency range from 43.5 GHz to 52.5 GHz. A frequency shift of $\approx$3-5 GHz is observed between measured and simulated curves for both PAs in both $S_{11}$ and $S_{21}$. This can probably be attributed to over-estimation of capacitive parasitics at design time. The fact that the PAs have a significant small-signal gain goes against the concept of conventional Class-E PA design, but is simply an outcome of the "Class-E-like" design methodology described in this work. The PA is designed for optimum performance at a Class-E input drive level, at which point the devices can be regarded as hard-switching. However, at the dc bias point, the devices are biased somewhat above the threshold voltage, imparting the circuit with small-signal gain. Of course, this gain is less than the maximum gain available from the stacked devices as the output load is designed based on Class-E principles. A modified version of the 45 nm SOI 4-stacked PA, obtained by laser-trimming the tuning inductor, was also characterized, and its simulated and measured small signal S-parameters are reported in Fig. 2.25. The measured peak gain is 11.6 dB at 45 GHz, with a -3 dB bandwidth extending from 30 GHz to 55 GHz. The -1 dB bandwidth extends from 36 GHz to 50 GHz.

The measured and simulated small-signal S-parameters of the 2-stacked differential PA implemented in 65 nm bulk CMOS are illustrated in Fig. 2.26. The measurement setup and calibration procedure are the same as discussed before, with the exception that coaxial 1.85 mm coplanar wave ground-signal-signal-ground (G-S-S-G) probes are used for the measurements. However, one probe of each differential pair is terminated with 50  $\Omega$ , so in essence single-ended measurements are being performed. This stems from the practical challenges in creating a differential mmWave measurement setup. The measured peak gain is 9.5 dB at 47 GHz with a -1 dB bandwidth extending from 44 GHz to 50 GHz.

#### 2.5.8.2 Large Signal Measurements

The large signal measurement setup for the aforementioned PAs is shown in Fig. 2.27. The large signal characteristics of the 45 nm SOI PAs are shown in Fig. 2.28(a) and Fig. 2.29. Measurement results yield a peak PAE of 34.6% for the 2-stacked PA with a saturated output power of 17.6 dBm at 47 GHz. Compared to the cascode PA in [45] operating at a similar frequency at supply voltages close to the nominal single-device  $V_{DD}$  of the technology, the 2-stacked PA has higher measured saturated output power along with 10-12% higher PAE. The 4-stacked PA has a measured saturated



Figure 2.25: Small signal S-parameters of 45 nm SOI 4-stacked Class-E-like PA with the tuning inductor eliminated using laser trimming ( $V_{g1}$ =0.4 V,  $V_{g2}$ =1.8 V,  $V_{g3}$ =2.8 V,  $V_{g4}$ =4 V,  $V_{DD}$ =4.8 V). Power consumption=206 mW.

output power of 20.3 dBm at 47.5 GHz at a peak PAE of 19.4%. For the trimmed version of the 4-stacked PA without the tuning inductor, a peak PAE of 18.3% was achieved at 42.5 GHz along with a saturated output power of 20.3 dBm. Unlike the 2-stacked PA, the measured performance



Figure 2.26: Small signal S-parameters (single-ended) of the 65 nm differential 2-stacked Class-E-like PA ( $V_{g1}=0.8 \text{ V}, V_{g2}=2 \text{ V}, V_{DD}=2 \text{ V}$ ). Power consumption=89 mW under small signal operation.

metrics of the 4-stacked PAs (particularly efficiency) are somewhat lower than those predicted by simulations. This is an indication of unmodeled active losses, as there is good correspondence between the measured and simulated characteristics of the various passive components [35]. The loss in the active components depends on a proper choice of device layout as well as accurate



Figure 2.27: Large signal Q-band measurement setup for the fabricated PAs.

modeling, as discussed in Section 2.5.6.

Large signal measurements were also conducted for the 2-stacked and 4-stacked PAs across frequency (at the optimal bias point) and for different supply voltages (at a fixed frequency, keeping gate biases constant). The results are depicted in Figs. 2.30 and 2.31. Large signal measurement beyond 48 GHz was limited by the characteristics of the measurement equipment (specifically, the Quinstar PA used to drive the PAs under test, as well as the isolator, dual-directional coupler and the power sensors used in the measurement setup). Unlike the 2-stacked PA, the output power does not increase with increasing supply voltage for the 4-stacked prototype. Once again, this can probably be attributed to the device layout discussed previously that results in lower  $f_{max}$ .

This hypothesis is tested in measurement. As discussed previously, an important characteristic of switching PAs is linearity with respect to supply voltage, which causes the average supply current and the output power to scale linearly and quadratically with supply voltage respectively. This unique feature distinguishes switching PAs from the class of linear PAs. At mmWave frequencies, the various sources of non-idealities result in deviation from ideal Class-E characteristics. Thus, the scaling of supply current and output power with supply voltage can be utilized as a useful metric to determine the extent of switching characteristics of a PA in the mmWave regime. Fig. 2.32 illustrates the measured average supply current and saturated output power of the 2-stacked PA in 45 nm SOI CMOS as a function of  $V_{DD}$  and  $V_{DD}^2$  respectively. The respective linear trends can be clearly observed, thereby corroborating the Class-E-like nature of the design. This also proves



Figure 2.28: Measured gain, drain efficiency and PAE as a function of output power for (a) the 45 nm SOI 2-stacked Class-E-like PA at 47 GHz ( $V_{g1}=0.4$  V,  $V_{g2}=1.7$  V,  $V_{DD}=2.4$  V) and (b) the 65 nm differential 2-stacked Class-E-like PA at 47.5 GHz ( $V_{g1}=0.8$  V,  $V_{g2}=2.1$  V,  $V_{DD}=2.8$  V).



Figure 2.29: Measured gain, drain efficiency and PAE as a function of output power for (a) the 45 nm SOI 4-stacked Class-E-like PA with the tuning inductor eliminated through laser trimming at 42.5 GHz and (b) the 45 nm SOI 4-stacked Class-E-like PA at 47.5 GHz ( $V_{g1}$ =0.4 V,  $V_{g2}$ =1.8 V,  $V_{g3}$ =2.8 V,  $V_{g4}$ =4 V,  $V_{DD}$ =4.8 V for both designs).



Figure 2.30: Measured gain, saturated output power, drain efficiency and PAE (a) across frequency  $(V_{g1}=0.4 \text{ V}, V_{g2}=1.7 \text{ V}, V_{DD}=2.4 \text{ V})$  and (b) across supply voltage at 47 GHz of the 45 nm SOI 2-stacked Class-E-like PA  $(V_{g1}=0.4 \text{ V}, V_{g2}=1.7 \text{ V})$ .

that switch-like PAs can indeed be implemented at mmWave frequencies with appropriate design methodology. The corresponding results for the 4-stacked PA with the tuning inductor are shown

![](_page_81_Figure_1.jpeg)

Figure 2.31: Measured gain, saturated output power, drain efficiency and PAE (a) across frequency  $(V_{g1}=0.4 \text{ V}, V_{g2}=1.8 \text{ V}, V_{g3}=2.8 \text{ V}, V_{g4}=4 \text{ V})$  and (b) across supply voltage at 47 GHz of the 45 nm SOI 4-stacked Class-E-like PA.

in Fig. 2.33. The 4-stacked PA's measured characteristics deviate from expected trends. This indicates that the devices are not being driven to a hard-switching condition, likely due to reduced

![](_page_82_Figure_1.jpeg)

Figure 2.32: Measured and expected (a) average supply current vs  $V_{DD}$  and (b) saturated output power vs  $V_{DD}^2$  for 2-stacked Class-E-like PA in 45 nm SOI CMOS. The profiles display the linearity with respect to supply voltage associated with switching Class-E PAs, thereby establishing the Class-E-like characteristics of the PA even at mmWave frequencies.

device  $f_{max}$ . It should be noted that [30] and [5] have realized large power devices at mmWave with high  $f_{max}$  and hence the foregoing results for the 4-stacked PA should not be taken to mean that Class-E operation is not possible for high levels of stacking.
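The supply-scaling test used above (average supply current linear in $V_{DD}$, saturated output power linear in $V_{DD}^2$) can be quantified with a simple least-squares fit. The following is a minimal sketch, not the procedure used for Figs. 2.32 and 2.33: the helper name `linear_fit_r2` is hypothetical, and the data arrays are synthetic placeholders generated from an ideal switching law rather than measured values.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Fit y ~ a*x + b by least squares and return (a, b, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)
    residual = y - (a * x + b)
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic placeholder data (NOT measurements from this work).
vdd = np.array([1.2, 1.6, 2.0, 2.4])   # supply voltage, V
idc_ma = 30.0 * vdd                    # ideal switch: I_DC proportional to V_DD
pout_mw = 10.0 * vdd ** 2              # ideal switch: P_out proportional to V_DD^2

# Current is regressed against V_DD, power against V_DD^2.
_, _, r2_current = linear_fit_r2(vdd, idc_ma)
_, _, r2_power = linear_fit_r2(vdd ** 2, pout_mw)
print(f"R^2 (I_DC vs V_DD)    = {r2_current:.4f}")
print(f"R^2 (P_out vs V_DD^2) = {r2_power:.4f}")
```

With measured data substituted for the placeholders, an $R^2$ close to unity for both fits would be consistent with switch-like behavior, while a clearly lower value would indicate deviation of the kind reported for the 4-stacked PA.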

The small-signal S-parameters and large-signal performance metrics of the two-stage PA implemented in 45 nm SOI CMOS are summarized in Fig. 2.34. The measured peak gain is 24.9 dB at 51 GHz while a peak PAE of 15.4% was achieved at 47 GHz along with a saturated output power of 20.1 dBm.

Large-signal measurements of the 2-stacked differential PA implemented in 65 nm bulk CMOS yield a peak PAE of 28.3% with a saturated output power of 15.2 dBm at 47.5 GHz (Fig. 2.28(b)), implying a saturated differential output power of 18.2 dBm. The lower efficiency of this PA compared to the 2-stacked 45 nm SOI PA stems from the higher ON-resistance of the 65 nm devices.
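The step from the single-ended to the differential output power is a doubling of power, i.e. an increase of about 3 dB. A short sketch of the arithmetic follows; it assumes that both halves of the differential PA deliver equal power, and the helper names are illustrative only.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw):
    """Convert power from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


single_ended_dbm = 15.2  # measured single-ended saturated output power
# Assumption: the two differential halves deliver equal power.
differential_dbm = mw_to_dbm(2.0 * dbm_to_mw(single_ended_dbm))
print(f"Differential output power = {differential_dbm:.1f} dBm")
```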

![](_page_83_Figure_1.jpeg)

Figure 2.33: Measured and expected (a) average supply current vs  $V_{DD}$  and (b) saturated output power vs  $V_{DD}^2$  for 4-stacked Class-E-like PA in 45 nm SOI CMOS. The profiles do not display the linearity with respect to supply voltage characteristic of switching Class-E PAs, owing to layout-induced increased gate resistance which prevents hard-switching at mmWave frequencies.

## 2.6 Challenges Associated with Device Stacking in Class-E PAs

As was discussed in Section 2.5.7, in order to achieve true Class-E behavior and equal sharing of voltage stress in a stacked configuration, all the intermediary drain nodes should sustain Class-E-like voltage swings with appropriately scaled amplitudes. The two most popular techniques for addressing this issue are (i) the inductive tuning technique, namely placing a shunt inductor  $(L_{mid})$  at the intermediary node(s) [26] and (ii) the charging acceleration technique (CAT) [42] which utilizes feed forward capacitive coupling. The inductive tuning approach suffers from several shortcomings at mmWave frequencies. Firstly, a series DC blocking capacitor is required, which will contribute loss owing to poor quality factor of on-chip capacitors at mmWave frequencies. The tuning inductor can consume considerable die area, unless special design techniques such as transformer-based charging-acceleration [50] are utilized. In addition, the finite quality factor of the tuning inductor would contribute to power loss. Furthermore, the circuit is quite sensitive to the choice of the tuning inductor as discussed in [4], where laser trimming was employed to optimize its value. The alternative charging acceleration technique works well at low RF frequencies, but

![](_page_84_Figure_1.jpeg)

Figure 2.34: (a) Measured small signal S-parameters and (b) measured gain, drain efficiency and PAE as a function of output power for the two-stage 45 nm SOI PA comprising a 4-stacked main PA and a 2-stacked driver stage at 47 GHz ( $V_{g1}=0.4$  V,  $V_{g2}=1.7$  V,  $V_{g3}=2.8$  V,  $V_{g4}=4$  V,  $V_{DD,1}=4.8$  V,  $V_{g5}=0.4$  V,  $V_{g6}=1.6$  V,  $V_{DD,2}=2.4$  V). Power consumption=255 mW under small signal operation.

the poor quality and self-resonance of on-chip MIM/interdigitated capacitors used to implement the feed-forward capacitor would degrade efficiency at mmWave frequencies. The following section describes a new means of achieving appropriate voltage swing(s) at the intermediary node(s) for Class-E PAs employing device stacking. A "Class-E load network" is connected at each intermediary node, resulting in a topology referred to as the Multi-output Stacked Class-E PA that amounts to stacking multiple single-device Class-E PAs while retaining their individual characteristics (Fig. 2.35(a)). Theoretical discussions regarding the unique features of this topology along with measurement results of two Q-band prototypes in IBM's 45nm SOI CMOS technology demonstrating the concept are presented as well.

## 2.7 Multi-output Stacked Class-E PA

![](_page_85_Figure_3.jpeg)

Figure 2.35: (a) Multi-output Stacked Class-E PA and (b) corresponding simplified switch-based schematic with drain voltage swings for lossless operation.

#### 2.7.1 Principle of Operation

The proposed Multi-output Stacked Class-E PA topology is based on the key observation that the drain voltage profile in a Class-E PA is facilitated by the presence of the "Class-E load network". Extending this idea to the case where several devices are stacked, it is evident that incorporating an appropriately tuned Class-E load network at each intermediary node would result in Class-E-like voltage swings for all devices (Fig. 2.35(a)). Each device thus behaves as an independent Class-E entity. Another important characteristic of the proposed topology is that output power is available from each intermediary node, which had formerly been used only to turn off devices higher up in the stack. The multiple output nodes can be power-combined *internally* to drive a single load with increased power or can be used to drive other circuit blocks, making the proposed topology useful as an active power splitter.

In order to facilitate theoretical analysis, we resort to the simplified schematic in Fig. 2.35(b) with the drain voltage swings for lossless operation annotated. The devices are represented by switches with output capacitance  $C_{out,i}$  and corresponding ON-resistance  $R_{ON,i}$  (i = 1, 2, ...n), each driven by a square wave input with 50% duty-cycle. The calculation of output capacitance  $C_{out,i}$  at the i<sup>th</sup> intermediary node is not straightforward owing to the complex capacitive network formed by the device capacitances. However, one might use an approximate expression as follows:

$$C_{out,i} \approx C_{d0,i} + C_{gd,i} + C_{gs,i+1}, \quad i \in [1, n-1] \tag{2.28}$$

and

$$C_{out,n} \approx C_{d0,n} + C_{gd,n} \tag{2.29}$$

where  $C_{d0,i}$ ,  $C_{gd,i}$ ,  $C_{gs,i}$  are respectively the drain-to-ground, gate-drain and gate-source capacitances for the i<sup>th</sup> device. The above approximation is based on the observation that at the drain terminal of the i<sup>th</sup> device, in addition to  $C_{d0,i}$  (which is the drain-to-ground capacitance), the capacitance seen looking up the stack is  $C_{gs,i+1}$  in the worst case (assuming that the externally added gate capacitor  $C_{g,i}$  is relatively large). Similarly, the capacitance seen looking down the stack results in a worst-case value of  $C_{gd,i}$ .
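The approximations in eqns. (2.28) and (2.29) can be tabulated for an arbitrary stack as sketched below. The function name and the capacitance values are hypothetical and serve only to show the indexing; they are not extracted device parameters.

```python
def intermediate_node_cout(c_d0, c_gd, c_gs):
    """Approximate C_out,i per Eqs. (2.28)-(2.29).

    Each argument is a list ordered from the bottom device (i = 1) to the
    top device (i = n). Units are arbitrary but must be consistent.
    """
    n = len(c_d0)
    if len(c_gd) != n or len(c_gs) != n:
        raise ValueError("all capacitance lists must have the same length")
    c_out = []
    for i in range(n):
        total = c_d0[i] + c_gd[i]      # own drain-to-ground and gate-drain terms
        if i < n - 1:
            total += c_gs[i + 1]       # gate-source capacitance of the device above
        c_out.append(total)
    return c_out


# Hypothetical values in fF for a 2-device stack (illustration only).
c_out = intermediate_node_cout(c_d0=[20.0, 10.0], c_gd=[30.0, 15.0], c_gs=[60.0, 30.0])
print(c_out)
```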

The ideal operation of the Multi-output Class-E topology can be understood by analyzing the ON and OFF states in the absence of losses i.e.  $R_{ON,i}=0$  (i = 1, 2, ...n). As shown in Fig. 2.36(a), in the ON state, the drain terminal of each switch is pulled down to ground, so that in effect, we have *n* independent Class-E PAs, each operating in its ON state. It should be noted that in this ON

![](_page_87_Figure_1.jpeg)

Figure 2.36: (a) ON state operation and (b) OFF state operation of lossless Multi-output Stacked Class-E PA.

state, the switches that are lower in the stack must support the currents of the Class-E PAs higher in the stack, potentially increasing their conduction loss when finite conduction loss is considered. Similarly, during the OFF state (Fig. 2.36(b)), each switch is "open", so that once again we have n independent Class-E PAs operating in their respective OFF states. Consequently, we have true Class-E behavior for the overall stacked topology. The individual Class-E load networks can thus be designed for global waveform shaping to optimize for output power and/or efficiency of the stacked configuration, as will be discussed subsequently. In a practical implementation, finite switch loss will introduce interaction between the stacked devices in the ON state resulting in deviation from independent Class-E behavior. However, assuming low-loss operation and designing the Class-E load networks accordingly provides an excellent starting point for subsequent simulation-based optimization. Depending on the "tuning" of the Class-E load network [40] of each stacked device, the corresponding supply voltage should be chosen so that the maximum instantaneous drain-source voltage swing of each device is  $V_{reliability}$  [26]. As discussed in [40], the waveform figure of merit

$$F_V = V_{peak} / V_{DD} \tag{2.30}$$

relates the peak drain voltage swing  $V_{peak}$  to the DC supply voltage  $V_{DD}$  of a single-device Class-E PA. The peak drain voltage swings in a stacked configuration will increase linearly with the number of devices stacked so that the total voltage stress is evenly distributed across the stack. The supply voltage  $V_{DD,i}$  for the  $i^{th}$  stacked device must be chosen accordingly i.e.  $V_{DD,i} = n \times \frac{V_{reliability}}{F_{V,i}}$ .

#### 2.7.2 Efficiency Analysis

The presence of device conduction loss, modeled by the corresponding switch ON-resistance  $R_{ON}$, results in deviation from ideal operation of the proposed Multi-output Class-E topology. A comprehensive analysis of optimal tuning for single-device Class-E PAs in the presence of significant conduction loss was discussed in [1]. Extension of this analysis to the Multi-output Class-E PA is possible, but the equations are too complex to provide practical design guidelines due to the interaction between the switches in the presence of significant conduction loss (the switches lower in the stack support the current of those above). A simplified analysis is performed here to gain intuitive understanding of important factors affecting overall efficiency. Referring to Fig. 2.35(b) we use the notations  $I_{L,k}$  and  $i_k \cos(\omega_0 t + \phi_k)$  to denote the DC-feed inductor current and the load network current respectively for the  $k^{th}$  switch in the stack. Let

$$I_k = I_{L,k} - i_k \cos(\omega_0 t + \phi_k), \quad k = 1, 2, \dots, n \tag{2.31}$$

The current through the  $k^{th}$  switch during the ON half-cycle is then given by

$$i_{s,k} = \sum_{m=k}^{n} I_m, \quad k = 1, 2, \dots, n \tag{2.32}$$

Eqn. (2.32) leads to some important observations. Firstly, a switch supports the load and DC-feed inductor currents of all the switches higher up in the stack, in addition to its own load network current. Thus, the bottom device supports the largest current and it is imperative to minimize its ON-resistance to maximize efficiency. However, too large a device size would increase input power and degrade PAE, thereby resulting in a trade-off in device size. Secondly, the current flowing through the switches decreases up the stack so it is possible to taper the device size progressively. Finally, the different Class-E load networks (and consequently their currents) can be potentially tuned to shape the switch currents to further minimize conduction loss.

The conduction loss of the  $k^{th}$  switch is given by

$$P_{loss,k} = i_{s,k,RMS}^{2} \times R_{ON,k} = \frac{1}{T_{s}} \int_{0}^{T_{s}/2} i_{s,k}^{2} \, dt \times R_{ON,k} \tag{2.33}$$

$$P_{loss,k} = \frac{1}{T_{s}} \int_{0}^{T_{s}/2} \left(\sum_{m=k}^{n} I_{m}\right)^{2} dt \times R_{ON,k} \tag{2.34}$$

where  $i_{s,k,RMS}$  is the RMS current flowing through the k<sup>th</sup> switch.

The drain efficiency can therefore be expressed as

$$\eta = 1 - \frac{\sum_{k=1}^{n} P_{loss,k}}{\sum_{k=1}^{n} P_{DC,k}} = 1 - \frac{\sum_{k=1}^{n} i_{s,k,RMS}^{2} \times R_{ON,k}}{\sum_{k=1}^{n} V_{DD,k} \times I_{DC,k}} \tag{2.35}$$

where  $T_s$  is the switching period, the ON half-cycle is assumed to be from t=0 to t= $T_s/2$  and  $I_{DC,k}$  is the steady-state DC current drawn by the k<sup>th</sup> switch from its supply voltage  $V_{DD,k}$ .
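Eqns. (2.31) to (2.35) lend themselves to direct numerical evaluation. The sketch below samples the ON half-cycle, forms the switch currents as a reverse cumulative sum of the branch currents, and integrates the squared current. The function names, the sample count and the example currents are assumptions made for illustration; the example uses DC-only branch currents so that the result can be checked by hand, and does not represent a tuned Class-E design.

```python
import numpy as np


def switch_conduction_losses(i_dc_feed, i_load_amp, phase_rad, r_on, samples=4096):
    """Numerically evaluate Eqs. (2.31)-(2.34) for an n-device stack.

    All inputs are length-n sequences ordered from the bottom device (k = 1)
    to the top device (k = n). Returns the conduction loss of each switch.
    """
    i_dc_feed = np.asarray(i_dc_feed, dtype=float)
    i_load_amp = np.asarray(i_load_amp, dtype=float)
    phase_rad = np.asarray(phase_rad, dtype=float)
    r_on = np.asarray(r_on, dtype=float)

    # theta = omega_0 * t; the ON half-cycle t in [0, T_s/2) maps to [0, pi).
    theta = (np.arange(samples) + 0.5) * np.pi / samples  # midpoint samples

    # Eq. (2.31): I_k = I_L,k - i_k * cos(omega_0 t + phi_k), shape (n, samples)
    branch = i_dc_feed[:, None] - i_load_amp[:, None] * np.cos(
        theta[None, :] + phase_rad[:, None]
    )

    # Eq. (2.32): i_s,k = sum of I_m for m = k..n (reverse cumulative sum)
    switch = np.cumsum(branch[::-1], axis=0)[::-1]

    # Eqs. (2.33)-(2.34): (1/T_s) * integral over half a period equals
    # 0.5 * (mean of i_s,k^2 over the ON half-cycle).
    mean_square = 0.5 * np.mean(switch ** 2, axis=1)
    return mean_square * r_on


def drain_efficiency(p_loss, v_dd, i_dc):
    """Eq. (2.35): 1 - (total conduction loss) / (total DC power)."""
    p_dc = np.sum(np.asarray(v_dd, dtype=float) * np.asarray(i_dc, dtype=float))
    return 1.0 - float(np.sum(p_loss)) / float(p_dc)


# Hypothetical two-device example with DC-only branch currents.
losses = switch_conduction_losses(
    i_dc_feed=[1.0, 1.0], i_load_amp=[0.0, 0.0], phase_rad=[0.0, 0.0], r_on=[1.0, 1.0]
)
print(losses)
```

In this example the bottom switch carries twice the current of the top switch and therefore dissipates four times the power for the same ON-resistance, which illustrates why the bottom device benefits most from a larger width.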

In order to gain better insight into the design trade-offs, we resort to the waveform figures of merit as discussed in Section 2.5.3:

$$F_{I,k} = \frac{i_{s,k,RMS}}{I_{DC,k}} \tag{2.36}$$

$$Z_{C,k} = \frac{1}{\omega_0 \times C_{out,k}} \tag{2.37}$$

and

$$F_{C,k} = \frac{P_{DC,k}}{\frac{V_{DD,k}^2}{Z_{C,k}}} \tag{2.38}$$

where  $\omega_0$  is the angular operating frequency in rad/s. It should be noted that the metric  $F_{I,k}$  (related to the shape of the current waveform) depends on the tuning of all devices above the k<sup>th</sup> device (unlike a single-device PA or a stacked PA with no intermediary tuning networks) while  $F_{C,k}$  depends on the tuning of the k<sup>th</sup> device. Using these, eqn. 2.35 can be re-written as

$$\eta = 1 - \frac{\sum_{k=1}^{n} \left( F_{I,k} \times I_{DC,k} \right)^2 \times R_{ON,k}}{\sum_{k=1}^{n} F_{C,k} \times \frac{V_{DD,k}^2}{Z_{C,k}}} \tag{2.39}$$

Since  $P_{DC,k} = V_{DD,k} \times I_{DC,k}$ , we can rewrite eqn. 2.38 as

$$I_{DC,k} = F_{C,k} \times \frac{V_{DD,k}}{Z_{C,k}} \tag{2.40}$$

Substituting in eqn. 2.39, we get

$$\eta = 1 - \frac{\sum_{k=1}^{n} \left( F_{I,k} \times F_{C,k} \times \frac{V_{DD,k}}{Z_{C,k}} \right)^2 \times R_{ON,k}}{\sum_{k=1}^{n} F_{C,k} \times \frac{V_{DD,k}^2}{Z_{C,k}}} \tag{2.41}$$

From the foregoing expression, it is clear that the overall efficiency is determined by the relative tunings of the Class-E load networks (represented by  $F_{I,k}$  and  $F_{C,k}$  values) and by the device-size tapering (represented by  $Z_{C,k}$  values). Consequently, there are multiple optimization variables which can be chosen to tailor the output powers from the individual load networks while ensuring the best possible efficiency. This possibility of global waveform engineering is in contrast to a single-device Class-E PA where simply minimizing  $F_I$  and  $F_C$  is desirable.
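Eqn. (2.41) is straightforward to evaluate once the figures of merit are known. The sketch below is illustrative only: the function name is hypothetical and the numerical values are placeholders, not figures of merit extracted from the designs in this work. For a single device the expression reduces to $\eta = 1 - F_I^2 F_C R_{ON}/Z_C$, which provides a convenient sanity check.

```python
def efficiency_from_foms(f_i, f_c, v_dd, z_c, r_on):
    """Evaluate Eq. (2.41) for lists ordered from bottom (k = 1) to top (k = n)."""
    loss = sum(
        (fi * fc * v / z) ** 2 * r
        for fi, fc, v, z, r in zip(f_i, f_c, v_dd, z_c, r_on)
    )
    p_dc = sum(fc * v ** 2 / z for fc, v, z in zip(f_c, v_dd, z_c))
    return 1.0 - loss / p_dc


# Placeholder single-device values (illustration only).
eta_single = efficiency_from_foms(
    f_i=[1.5], f_c=[0.5], v_dd=[1.0], z_c=[25.0], r_on=[1.0]
)
print(f"eta = {eta_single:.3f}")
```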

For the remainder of the chapter, we shall focus on a special case of the Multi-output Class-E PA with 2 devices stacked referred to as the Dual-Output Class-E PA. Switch-based simulations were conducted at 45GHz based on theoretical results to observe the impact of relative device sizing and tuning of Class-E load networks for the Dual-Output Class-E PA. The width of the top device denoted by  $W_2$  was fixed at 100 $\mu$ m, while that of the bottom device ( $W_1$ ) was varied along with the tunings of the respective Class-E load networks (given by  $n = \frac{1}{\omega_0 \times \sqrt{L_s C_{out}}}$ ). The tuning-dependent load impedances for each Class-E PA in the stack were determined based on the theoretically optimal load impedance that ensures Zero Voltage Switching (ZVS) and Zero Derivative of Voltage at Switching (ZdVS) under lossless operation [51]. The following parameters, obtained from device characterization in IBM 45nm SOI CMOS using body-contacted devices, were used for switch-based simulations:

$$C_{out,1} = (0.59\,fF/\mu m) \times W_1 + (0.28\,fF/\mu m) \times W_2 \tag{2.42}$$

$$C_{out,2} = (0.59\,fF/\mu m) \times W_2 \tag{2.43}$$

$$R_{ON_{1,2}} = (347.8\,\Omega \cdot \mu m)/W_{1,2} \tag{2.44}$$

The respective supply voltages were adjusted to ensure that the overall voltage stress is evenly distributed between the devices. Furthermore, ideal internal power-combining was assumed in these simulations.
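The switch parameters of eqns. (2.42) to (2.44), together with the tuning definition $n = 1/(\omega_0 \sqrt{L_s C_{out}})$, can be collected in a small helper for parameter sweeps. This is a sketch under stated assumptions: the function names are hypothetical, and the widths and tuning value in the example are chosen for illustration rather than taken from a specific design point.

```python
import math

C_SELF_FF_PER_UM = 0.59    # Eqs. (2.42), (2.43)
C_CROSS_FF_PER_UM = 0.28   # Eq. (2.42), contribution of the top device
R_ON_OHM_UM = 347.8        # Eq. (2.44)


def dual_output_switch_params(w1_um, w2_um):
    """Return (C_out1, C_out2, R_on1, R_on2) in farads and ohms."""
    c_out1 = (C_SELF_FF_PER_UM * w1_um + C_CROSS_FF_PER_UM * w2_um) * 1e-15
    c_out2 = C_SELF_FF_PER_UM * w2_um * 1e-15
    return c_out1, c_out2, R_ON_OHM_UM / w1_um, R_ON_OHM_UM / w2_um


def series_inductance(n_tuning, c_out, freq_hz):
    """Solve n = 1/(omega_0*sqrt(L_s*C_out)) for L_s."""
    omega0 = 2.0 * math.pi * freq_hz
    return 1.0 / ((n_tuning * omega0) ** 2 * c_out)


# Illustrative operating point: W1 = 200 um, W2 = 100 um, 45 GHz.
c1, c2, r1, r2 = dual_output_switch_params(200.0, 100.0)
n_example = 1.4  # assumed tuning value for illustration
l_s1 = series_inductance(n_example, c1, 45e9)
print(f"C_out1 = {c1 * 1e15:.1f} fF, C_out2 = {c2 * 1e15:.1f} fF")
print(f"R_on1 = {r1:.3f} ohm, R_on2 = {r2:.3f} ohm, L_s1 = {l_s1 * 1e12:.1f} pH")
```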

![](_page_91_Figure_1.jpeg)

Figure 2.37: (a) Drain efficiency as a function of device sizes  $W_1$  and  $W_2$  and load network tuning  $(L_s \times C_{out})$  for Dual-Output Class-E PA using switch-based simulations at 45GHz, (b) efficiencies from device-based simulations and (c) comparison of output powers from switch-based and device-based simulations as a function of device size ratio  $W_1/W_2$  with  $W_2=100\mu$ m. (d) Variation of real and imaginary parts of load impedances for the top and bottom devices of the Dual-Output Class-E PA as a function of device size ratio  $W_1/W_2$ , with  $W_2=100\mu$ m, obtained from theoretical results at 45GHz using body-contacted device parameters in IBM 45nm SOI CMOS [1] (Note: Load impedance for the top device shows no variation since  $W_2$  remains unchanged).

From Fig. 2.37(a) it is evident that for a given ratio of device sizes, there exists an optimal tuning ratio for the respective Class-E load networks that maximizes drain efficiency. As expected, drain efficiency keeps improving with increasing size of the bottom device due to reduction in conduction loss, though the incremental benefits diminish when  $W_1/W_2 \ge 4$ . However, PAE is a more relevant metric at mmWave frequencies, so device-based simulations are used to evaluate the impact of these trade-offs on PAE.

Device-based simulations were conducted (with lossless passives) using body-contacted devices at 45GHz in 45nm SOI CMOS as a function of device size ratio  $W_1/W_2$  with  $W_2=100\mu$ m for the optimal tuning (i.e. tuning for highest PAE) in each case. Lossless power-combining is assumed as before and the load impedances for simulations are determined as in [51]. The drain efficiency and PAE for device-based simulations are shown in Fig. 2.37(b). The absolute value of drain efficiency differs from Fig. 2.37(a) owing to various non-idealities that are not accounted for in switch-based simulations. Fig. 2.37(b) shows that the PAE is maximum for a device size ratio ranging from 1:1 to 2:1. This is because input power increases for larger ratios. Fig. 2.37(c) compares the output powers generated by the top and bottom devices for both switch and device-based simulations. A good agreement is observed. Although PAE is practically the same (and maximum) for a device size ratio of 1:1 and 2:1, a sizing of 2:1 was chosen for the prototypes implemented in this work since the simulated output power was about 1.5 times higher. Fig. 2.37(d) depicts the load impedances for the top and bottom devices as a function of device size ratio. Device size ratios higher than 2:1 were thus avoided owing to lower PAE (due to high input power requirements) as well as steep impedance transformation requirements that would further degrade the overall PAE.
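The sensitivity of PAE to input power can be made explicit through the standard identity $PAE = (P_{out} - P_{in})/P_{DC} = \eta_D (1 - 1/G)$, where $G$ is the linear power gain. The sketch below uses placeholder numbers, not simulated values from Fig. 2.37, to show how a drop in gain (i.e. higher input power for the same output power) lowers PAE even when drain efficiency improves.

```python
def pae_from_drain_efficiency(drain_eff, gain_db):
    """PAE = eta_D * (1 - 1/G), with G the power gain as a linear ratio."""
    gain = 10.0 ** (gain_db / 10.0)
    return drain_eff * (1.0 - 1.0 / gain)


# Placeholder comparison (illustration only): a larger bottom device may raise
# drain efficiency while lowering gain because of the higher input power.
pae_small_device = pae_from_drain_efficiency(drain_eff=0.50, gain_db=10.0)
pae_large_device = pae_from_drain_efficiency(drain_eff=0.55, gain_db=6.0)
print(f"PAE (smaller device) = {pae_small_device:.3f}")
print(f"PAE (larger device)  = {pae_large_device:.3f}")
```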

The foregoing results provide design guidelines for a desired output power and the associated impedance transformation considerations. In the foregoing analysis, the loss in the passive components was not taken into account. Incorporating passive losses, even in a perturbative fashion, would make the theoretical analysis intractable and is best left to the simulation-based design/optimization stage. Nevertheless, the theoretical results provide a good starting point for simulations.

#### 2.7.3 Internal Power-combining

As mentioned earlier, the Multi-output topology can serve as a high-power high-efficiency active power-splitter with unequal division ratios that can be incorporated into the design procedure described earlier. As an alternative, the output powers available from the different intermediary nodes can be power-combined *internally* to drive a single load. In this work, we investigate internal

![](_page_93_Figure_1.jpeg)

Figure 2.38: Illustration of internal power-combining for Dual-Output Class-E PA (biasing details omitted). (a) Optimized load networks for the top and bottom devices. (b) The phase-shifts  $\phi_1$  and  $\phi_2$  introduced by the impedance transformation networks  $M_1$  and  $M_2$  respectively should ensure phase alignment at the transformed impedances  $R_A$  and  $R_B$  to ensure constructive power-combining at the single output node. (c) Single load =  $R_A \parallel R_B$  driven by output powers from the top and bottom devices. The single load is split between the individual load networks depending on the power levels prior to internal power-combining such that equal voltage amplitude  $V_1$  is produced across  $R_A$  and  $R_B$ .

power-combining and the ensuing design challenges and trade-offs for the Dual-Output Class-E PA. The concept of internal power-combining is illustrated in Fig. 2.38 and can be understood by traversing the figure from the left to the right. At each drain node, there is an optimal load network and corresponding output powers  $P_{out,1}$  and  $P_{out,2}$  for the bottom and top devices respectively (Fig. 2.38(a)). The single load (chosen to be 50 $\Omega$  here) is split into two parts  $R_A$  and  $R_B$  for the bottom and top devices respectively in the inverse ratio of the respective output powers i.e.

$$\frac{P_{out,1}}{P_{out,2}} = \frac{R_B}{R_A} \tag{2.45}$$

and

$$R_A \parallel R_B = 50\,\Omega \tag{2.46}$$

Impedance transformation networks  $M_1$  and  $M_2$  are then used to transform the optimal load impedances  $R_1 + jX_1$  and  $R_2 + jX_2$  to  $R_A$  and  $R_B$  respectively (Fig. 2.38(b)). If matching network loss is ignored, the amplitudes across  $R_A$  and  $R_B$  will be the same due to the choice of load resistances that are inversely related to the output powers. Equal phases can be ensured by choosing matching networks with similar topology and number of passive components. Another degree of freedom that helps in ensuring equal phases is the fact that the transformed impedances (earlier  $R_A$  and  $R_B$ ) can have parallel reactive parts so long as they cancel out on connecting the output nodes (or more precisely, add up to the pad capacitance). The relative output powers from the different Class-E load networks are another design degree of freedom that can be used to optimize efficiency and ease the design of the matching networks.
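Solving eqns. (2.45) and (2.46) simultaneously gives $R_A = R_L (P_{out,1} + P_{out,2})/P_{out,1}$ and $R_B = R_L (P_{out,1} + P_{out,2})/P_{out,2}$, where $R_L$ is the single load. The short sketch below implements this split; the function name is hypothetical, and lossless matching networks are assumed, as in the text.

```python
def split_load(p_out1, p_out2, r_load=50.0):
    """Split a single load into R_A (bottom device) and R_B (top device).

    Satisfies P_out1/P_out2 = R_B/R_A and R_A || R_B = r_load.
    Powers may be in any common unit since only their ratio matters.
    """
    total = p_out1 + p_out2
    r_a = r_load * total / p_out1
    r_b = r_load * total / p_out2
    return r_a, r_b


# Equal output powers split a 50-ohm load into two 100-ohm loads.
r_a_equal, r_b_equal = split_load(1.0, 1.0)
# A 2:1 power ratio gives the lower resistance to the higher-power device.
r_a_2to1, r_b_2to1 = split_load(2.0, 1.0)
print(r_a_equal, r_b_equal, r_a_2to1, r_b_2to1)
```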

![](_page_94_Figure_2.jpeg)

Figure 2.39: (a) Feedback loop resulting from internal power-combining in the Dual-Output Class-E PA and (b) cascode PA where common-gate device mitigates feedback through  $C_{gd}$  and improves reverse isolation.

#### 2.7.4 Stability of Dual Output Class-E PA vs. Cascode PA

Internally power-combining the different output nodes of the Multi-output topology results in several closed loops with active devices, for which stability must be ensured at the frequency of operation and at other frequencies. As shown in Fig. 2.39(a) for the Dual-Output Class-E PA, the matching networks  $M_A$  and  $M_B$ , together with the device  $M_2$  form a closed loop which can give rise to oscillatory behavior if the loop gain satisfies the Barkhausen criterion. This is unlike a cascode PA (Fig. 2.39(b)) where the common-gate device helps to improve reverse isolation.

![](_page_95_Figure_1.jpeg)

Figure 2.40: Stability analysis for the Dual-Output Class-E PA. (a) PA without input stimulus, (b) simplified circuit for small-signal analysis, with the input device replaced by its output capacitance  $C_{out,1}$  and output resistance  $R_{out,1}$  and the top device modeled by its transconductance  $(g_m)$ , output capacitance  $C_{out,2}$  and output resistance  $R_{out,2}$  and (c) equivalent circuit for calculation of loop gain.

A small-signal analysis can be used to arrive at an expression for the gain of the loop resulting from internal power-combining in the Dual-Output PA. As shown in Fig. 2.40(a), the input source is removed and the bottom device is assumed to behave as a current source represented by its output capacitance  $C_{out,1}$  and output resistance  $R_{out,1}$ . The top device is modeled by its transconductance  $(g_m)$ , output capacitance  $C_{out,2}$  and output resistance  $R_{out,2}$  (Fig. 2.40(b)). The load resistance  $R_L$  has been ignored to determine the stability in the case of an open-circuit load, since the presence of a load generally improves the stability due to the loss it introduces. This results in the equivalent circuit shown in Fig. 2.40(c). For this closed-loop system, one can derive

$$\text{Loop Gain} = \frac{g_m Z_1 Z_3}{(Z_1 + Z_2)(1 + g_m Z_3) + Z_3} \tag{2.47}$$

To ensure a stable design, the matching networks  $M_A$  and  $M_B$  should be chosen such that the following oscillation conditions are avoided:  $|\text{Loop Gain}| \geq 1$  and  $\angle(\text{Loop Gain}) = 0^{\circ}$. It should be noted that the resultant circuit is a Colpitts/Hartley-like oscillator, and the startup condition is harder to meet when compared with a cross-coupled oscillator due to the voltage division involved in the feedback loop. In addition, the low available gain at mmWave frequencies and the biasing of the PA devices in weak inversion for Class-E operation (resulting in low  $g_m$ ) help ease the stability problem of the Dual-Output PA to a large extent. Nevertheless, choice of the matching networks is critical not only for constructive power-combining, but also to ensure unconditional stability. We do not observe any signs of instability in our prototypes, as shown later in Section 2.7.6. Nevertheless, if potential instability is noticed at lower frequencies where devices have higher gain, frequency-selective loss networks (such as graded capacitor-resistor pairs) can be employed at the gate bias lines and also at the drain terminals (serving as supply bypass) [52].
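The loop gain of eqn. (2.47) and the oscillation conditions above can be checked numerically for candidate matching networks. The sketch below is only an illustration: the impedances $Z_1$, $Z_2$ and $Z_3$ are those defined in Fig. 2.40(c) and must be supplied by the designer at each frequency of interest, the example values are placeholders, and the phase tolerance used to approximate the zero-phase condition is an assumption.

```python
import cmath
import math


def loop_gain(g_m, z1, z2, z3):
    """Eq. (2.47); z1, z2, z3 are complex impedances from Fig. 2.40(c)."""
    return g_m * z1 * z3 / ((z1 + z2) * (1.0 + g_m * z3) + z3)


def meets_oscillation_condition(lg, phase_tol_deg=5.0):
    """True if |LG| >= 1 and the phase of LG is within phase_tol_deg of 0 degrees.

    phase_tol_deg is an assumed numerical tolerance, not a value from the text.
    """
    phase_deg = math.degrees(cmath.phase(lg))
    return abs(lg) >= 1.0 and abs(phase_deg) <= phase_tol_deg


# Placeholder, purely resistive values (illustration only).
lg_example = loop_gain(g_m=0.02, z1=50.0, z2=50.0, z3=100.0)
print(f"|LG| = {abs(lg_example):.2f}, at risk: {meets_oscillation_condition(lg_example)}")
```

In practice the check would be repeated over frequency, including frequencies well below the operating band where the devices have higher gain.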

![](_page_96_Figure_2.jpeg)

Figure 2.41: (a) Dual-Output Class-E PA unit cell schematic (left) and impedance transformation networks used for internally power-combining the output power available from top and bottom devices (right). Impedance levels at pertinent nodes are annotated. (b) Drain-source voltage and current waveforms for top and bottom devices exhibiting non-overlapping characteristics confirming Class-E-like operation ( $V_{gate,bot}=0.6V, V_{gate,top}=1.8V, V_{DD,bot}=1.3V, V_{DD,top}=2.8V$ ).

#### 2.7.5 Millimeter-wave Dual-Output Class-E-like PA Implementation

This section explores the design of a Dual-Output Class-E PA unit cell and a power-combined PA employing two such unit cells. The unit cell PA, shown in Fig. 2.41(a), was designed for a saturated output power of $\approx 15$dBm (i.e. $\approx 30$mW). The load impedance was split equally between the load networks of the top and bottom devices in the PA implementations. Since the output amplitude is the same, both devices deliver equal output power ($\approx 12$dBm) to the individual 100$\Omega$ loads. In order to account for soft-switching at mmWave and the poor quality factor of the passive components employed in the impedance transformation networks, we assume a 3dB design margin in output power.
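The power budget described above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (15dBm target, equal split between two devices, 3dB margin); the helper names are illustrative.

```python
import math

def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

def mw_to_dbm(p_mw):
    """Convert a power level in milliwatts to dBm."""
    return 10 * math.log10(p_mw)

target_dbm = 15.0   # saturated output power target of the unit cell
margin_db = 3.0     # design margin for soft-switching and passive loss

total_mw = dbm_to_mw(target_dbm)            # total power at the output node
per_device_dbm = mw_to_dbm(total_mw / 2)    # equal split between the two devices
design_per_device_dbm = per_device_dbm + margin_db  # per-device design target

print(f"total: {total_mw:.1f} mW")
print(f"per device: {per_device_dbm:.2f} dBm")
print(f"per-device design target: {design_per_device_dbm:.2f} dBm")
```

The per-device design target comes out at approximately 15dBm, which matches the level to which each device is scaled in the design procedure.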

Utilizing the design methodology described in Sections 2.7.2 and 2.7.3, for each pairing of device sizes, the tuning of the individual Class-E load networks was varied to arrive at a global optimum for PAE, while ensuring that the top and bottom devices deliver equal output powers to their respective load networks. Finally, all the device sizes and associated component values were scaled so that each device delivers $\approx 15$dBm to its load network. It was found that PAE is optimized when the bottom device is twice as large as the top device and both the load networks have the same tuning $n = \frac{1}{\omega_0 \times \sqrt{L_{s,1}C_{out,1}}} = \frac{1}{\omega_0 \times \sqrt{L_{s,2}C_{out,2}}} = 1.412$. This corresponds to real load impedances of 76$\Omega$ and 27$\Omega$ for the top and bottom devices respectively. Intuitively, we would expect the bottom device to be larger than the top device and drive a smaller load impedance to deliver the same power with half the voltage swing. Fig. 2.41(a) illustrates the networks used for the top and bottom devices which transform the respective optimal load impedances of 76$\Omega$ and 27$\Omega$ to 100$\Omega$ for power-combining, while ensuring optimal phase and amplitude alignment at the final output node. In addition, the topology chosen for the impedance transformation networks can conveniently absorb the pad capacitance as well. The shunt transmission line used in the input matching network provides ESD protection without any performance penalty.

A second PA prototype was implemented by current-combining two Dual-Output unit cells (with larger device sizes, Fig. 2.42(a)) to further enhance the output power to $\approx$20dBm on-chip. The impedances at pertinent nodes are marked on the circuit diagram, while the impedance transformation networks used for internal power-combining are illustrated in Fig. 2.42(b). Since the 50$\Omega$ load is equally split between the two current-combined unit cells, the optimal load impedance for each device in the unit cell is transformed to 200$\Omega$ for internal power-combining. The increase in load impedance along with device sizes (by a factor of almost two compared to the unit cell) results in an impedance transformation that is four times steeper. Consequently, one can expect higher losses in the matching networks and hence lower efficiency from the power-combined PA. Alternative techniques such as transformer-based power-combining can be exploited to boost the output power without sacrificing efficiency.
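The factor of four can be checked with a short calculation. The sketch below assumes that doubling the device size exactly halves the optimal load resistance (the text states a factor of "almost two"), and it reports the loaded Q of a single-section L-match purely as a rough indicator of matching-network loss; the networks actually used in Figs. 2.41(a) and 2.42(b) are not L-matches of this simple form.

```python
import math

def transformation_ratio(r_device, r_target):
    """Ratio between the larger and the smaller of the two resistances."""
    return max(r_device, r_target) / min(r_device, r_target)

def l_match_q(r_device, r_target):
    """Loaded Q of a single-section L-match (assumed topology, for illustration)."""
    return math.sqrt(transformation_ratio(r_device, r_target) - 1)

# Unit cell: optimal loads of 76 and 27 ohms are transformed to 100 ohms.
unit_cell = {"top": (76.0, 100.0), "bottom": (27.0, 100.0)}
# Combined PA, assuming doubled devices need half the load resistance,
# with each path now transformed to 200 ohms.
combined = {name: (r / 2, 200.0) for name, (r, _) in unit_cell.items()}

for name in unit_cell:
    ratio_unit = transformation_ratio(*unit_cell[name])
    ratio_comb = transformation_ratio(*combined[name])
    print(f"{name}: ratio {ratio_unit:.2f} -> {ratio_comb:.2f} "
          f"({ratio_comb / ratio_unit:.1f}x), "
          f"L-match Q {l_match_q(*unit_cell[name]):.2f} -> "
          f"{l_match_q(*combined[name]):.2f}")
```

Under these assumptions both paths see a transformation ratio four times larger than in the unit cell, and the higher loaded Q is consistent with the higher matching-network loss anticipated above.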

Following the approach described in Section 2.5.7, the simulated drain-source voltage and switch-current waveforms for the Dual-Output PA unit cell are shown in Fig. 2.41(b). The non-overlapping nature of the voltage and current waveforms, along with their high harmonic content, confirms switch-mode Class-E operation.

![](_page_98_Figure_1.jpeg)

Figure 2.42: (a) Current-combined Dual-Output Class-E PA schematic. (b) Impedance transformation networks used for internally power-combining the output power available from top and bottom devices. Impedance levels at pertinent nodes are annotated.

#### 2.7.6 Experimental Results

![](_page_98_Figure_5.jpeg)

Figure 2.43: Chip microphotograph of (a) Dual-Output Stacked Class-E PA unit cell and (b) two-way current-combined Dual-Output Class-E PA.

The chip microphotographs of the two PAs are shown in Fig. 2.43. The Dual-Output Class-E PA unit cell and the power-combined PA occupy  $0.8 \text{mm} \times 0.6 \text{mm}$  and  $1.06 \text{mm} \times 0.6 \text{mm}$  of die area (without pads) respectively.

The layout and modeling of body-contacted power devices follows the strategy discussed in Section 2.5.6. Two power device test structures with dimensions  $\frac{1.5\mu m \times 100}{56nm}$  and  $\frac{3\mu m \times 100}{56nm}$  (used in the power-combined version of the Dual-Output Class-E PA fabricated in 45nm SOI CMOS) were measured to have peak  $f_{max}$  of 135GHz and 105GHz respectively [32]. Usage of the available 40nm floating-body devices and splitting the overall device into several smaller devices wired appropriately in parallel should improve the  $f_{max}$  and hence the gain available from the device [49] and the performance of our prototypes.

Fig. 2.44(a,b) and 2.44(c,d) illustrate the simulated and measured small signal S-parameters of the Dual-Output stacked Class-E PA unit cell and the two-way current-combined PA implemented in 45nm SOI CMOS. The measured peak gain of the Dual-Output unit cell PA is 9.8dB at 46GHz, with a -3dB bandwidth extending from 41GHz to 57GHz. The -1dB bandwidth extends from 43GHz to 51GHz, making it suitable for wideband applications. The measured peak gain of the power-combined PA is 8.2dB at 51GHz, with a -3dB bandwidth extending from 45GHz to 57GHz. The measured -1dB bandwidth spans 48GHz to 54GHz. As discussed previously in Section 2.5.8.1, the PAs possess small-signal gain (despite being designed for Class-E operation under large input drives) since at the DC bias point the devices are biased in weak inversion. The $\mu$ stability factors for the prototypes, calculated using the measured small-signal S-parameters, are depicted in Fig. 2.45(a). Since the $\mu$ factor is always >1 throughout the measured frequency range, the PAs are unconditionally stable.
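For reference, the $\mu$ factor can be computed from two-port S-parameters using the standard single-parameter (Edwards-Sinsky) definition, $\mu = (1 - |S_{11}|^2) / (|S_{22} - \Delta S_{11}^{*}| + |S_{12}S_{21}|)$ with $\Delta = S_{11}S_{22} - S_{12}S_{21}$. The sketch below applies it at a single frequency point; the S-parameter values are hypothetical and are not the measured data of Fig. 2.44.

```python
def mu_factor(s11, s12, s21, s22):
    """Edwards-Sinsky mu stability factor of a two-port at one frequency."""
    delta = s11 * s22 - s12 * s21
    return (1 - abs(s11) ** 2) / (
        abs(s22 - delta * s11.conjugate()) + abs(s12 * s21)
    )

# Hypothetical S-parameters at a single frequency (illustration only).
s11 = complex(0.3, 0.1)
s12 = complex(0.02, 0.01)
s21 = complex(2.5, -1.5)
s22 = complex(0.4, -0.2)

mu = mu_factor(s11, s12, s21, s22)
print(f"mu = {mu:.2f}, unconditionally stable: {mu > 1}")
```

A measured data set would be processed by evaluating `mu_factor` at every frequency point and confirming that the minimum value stays above 1.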

The large signal performances of the fabricated prototypes are shown in Fig. 2.45(b) and (c) respectively. The large signal performance of both the unit cell and the power-combined PA was measured at 47.5GHz, despite the fact that the small signal gain of the latter peaks at $\approx$50GHz (Fig. 2.44(d)). Large signal measurement beyond 47.5GHz was limited by the characteristics of the measurement equipment (specifically, a Quinstar PA used to drive the PAs under test). Measurement results yield a peak PAE of 25.5% for the Dual-Output PA unit cell with a saturated output power of 17.9dBm at 47.5GHz, and a peak PAE of 16% for the power-combined PA with a saturated output power of 19.1dBm at 47.5GHz. Excellent agreement is observed between measurement and simulation as a consequence of the active and passive device modeling efforts. The current-combined PA achieves lower efficiency at 47.5GHz when compared with the unit cell due to its steeper impedance transformations, larger power devices with lower $f_{max}$, and 50GHz center frequency. It is worth mentioning that even though supply voltages greater than 1.25V (which is the maximum recommended $V_{DD}$ in this technology) have been used in the prototypes, the actual $V_{ds}$ across the devices in DC is always $\approx 1.1$V owing to the voltage drop across the interconnect resistances. Furthermore, under large signal operation, bias values and input power are chosen to ensure that the maximum voltage difference across any pair of terminals never exceeds $2V_{DD,max}=2\times1.25$V for long-term reliable operation.

![](_page_100_Figure_1.jpeg)

Figure 2.44: (a) and (b) Small signal S-parameters of Dual Output Class-E PA unit cell with $V_{gate,bot}=0.52V, V_{gate,top}=1.6V, V_{DD,bot}=1.2V$ and $V_{DD,top}=2.4V$. (c) and (d) Small signal S-parameters of current-combined Dual-Output Class-E PA with $V_{gate,bot}=0.5V, V_{gate,top}=1.7V, V_{DD,bot}=1.1V$ and $V_{DD,top}=2.7V$.

![](_page_101_Figure_1.jpeg)

Figure 2.45: (a) Small signal  $\mu$  stability factors of Dual-Output Class-E PA unit cell and the current-combined prototype, calculated using measured small signal S-parameters shown in Figs. 2.44(a), 2.44(b), 2.44(c) and 2.44(d) respectively. The stability factor is >1 over the measured frequency range, indicating unconditional stability. (b) Large signal performance of the Dual-Output stacked Class-E PA unit cell at 47.5GHz ( $V_{gate,bot}=0.6V, V_{gate,top}=1.8V, V_{DD,bot}=1.3V, V_{DD,top}=2.8V$ ). (c) Large signal performance of current-combined Dual-Output Class-E PA at 47.5GHz ( $V_{gate,bot}=0.5V, V_{gate,top}=1.9V, V_{DD,bot}=1.4V, V_{DD,top}=2.9V$ ).

# 2.8 Millimeter-wave Watt-class Stacked Class-E-like Switching PA Array in CMOS

As discussed previously in this chapter, recent works involving series stacking of multiple devices in PAs have demonstrated moderate output powers (around 17-20 dBm) with high efficiency (20-35%) in fine-line CMOS at mmWave frequencies. However, watt-level output power is yet to be achieved at these frequencies.

This section describes a pathway to realize watt-level output power on-chip at mmWave from a single CMOS IC. The use of a lumped quarter-wave combiner [14] that enables one-step, large-scale, low-loss, on-chip power combining at mmWave frequencies, in conjunction with stacked Q-band Class-E-like SOI CMOS PAs discussed in Section 2.5 results in watt-class operation ( $P_{sat} > 0.5$  W) from a 45 nm SOI CMOS PA array. Detailed discussions regarding the lumped quarter-wave combiner can be found in [14] and only the highlights of the pertinent design considerations and measurement results are presented here.

![](_page_102_Figure_1.jpeg)

Figure 2.46: An *n*-way spiral-based lumped quarter-wave combiner with design equations.

#### 2.8.1 Proposed Non-isolating Lumped Quarter-wave Combiner

The work in [14] proposes a quarter-wave combiner (shown in Fig. 2.46), which is essentially an $n$-way Wilkinson combiner without the isolation resistors. In order to enable 8-way combining in a single structure, a lumped $\pi$-section equivalent of a quarter-wave transmission-line is employed in each path [53], with the $\pi$-section realized as a single spiral inductor (Fig. 2.47). To achieve the desired $Z_0 = 50\sqrt{n}$ ohms and quarter-wavelength, the spiral must achieve an inductance of $L = 50\sqrt{n}/\omega_o$ (=500pH for n=8) and a parasitic capacitance of $C = 1/(50\sqrt{n}\omega_o)$ (=25fF for n=8) on either side. Therefore, the number of elements that can be combined in a single step will be limited by the achievable self-resonant frequency (SRF) of spirals in the BEOL. We found that up to 12 elements may be combined based on achievable SRF, but pursued 8-way combining in this work due to floor-planning considerations. The key insight is that larger-scale one-step power combining is achieved compared to Wilkinson combining because spirals are able to achieve higher $Z_0$ (greater inductance for a given parasitic capacitance) than transmission lines via their self magnetic coupling. Loss is also reduced as spirals are able to use wider line widths than thin high-$Z_0$ transmission lines. The combiner presents a 50$\Omega$ load to each PA. It is implemented in the top-most metal layer which has a thickness of $2.225\mu m$ and is $9.5\mu m$ above the substrate. Measured breakouts show a spiral Q of 25 and 8-way combining efficiency of 75% at 45GHz, and excellent agreement with EM simulations (78% at 43GHz). For comparison, 8-way-combining via a 3-layer cascade of 2:1 Wilkinsons has a simulated efficiency of only 63%. A comparison of the lumped quarter-wave combiner's performance with conventional power combiners (three-level cascade of 2:1 Wilkinsons and a zero-degree combiner) can be found in [14].
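The element values quoted above follow directly from the design equations. The sketch below evaluates them for $n=8$, assuming a design frequency of 45GHz; this frequency is an assumption (it is where the combining efficiency is reported and it reproduces the quoted 500pH and 25fF), not a value stated explicitly in the text. The per-level figure for the Wilkinson cascade additionally assumes that the simulated 63% is split equally across the three levels.

```python
import math

def combiner_elements(n, f0_hz, r0=50.0):
    """Return (Z0, L, C) for one path of the n-way lumped quarter-wave combiner."""
    z0 = r0 * math.sqrt(n)       # characteristic impedance of each path, ohms
    w0 = 2 * math.pi * f0_hz     # angular design frequency, rad/s
    inductance = z0 / w0         # series spiral inductance, henries
    capacitance = 1 / (z0 * w0)  # shunt capacitance on either side, farads
    return z0, inductance, capacitance

F0_HZ = 45e9  # assumed design frequency
z0, l_spiral, c_shunt = combiner_elements(8, F0_HZ)
print(f"Z0 = {z0:.1f} ohm, L = {l_spiral * 1e12:.0f} pH, C = {c_shunt * 1e15:.1f} fF")

# Equal-loss-per-level assumption for the three-level 2:1 Wilkinson cascade.
per_level_efficiency = 0.63 ** (1 / 3)
print(f"Wilkinson efficiency per level = {per_level_efficiency:.3f}")
```

Because $L$ and $C$ both scale with $1/\omega_o$, targeting a higher design frequency shrinks both elements, while increasing $n$ raises $L$ and lowers the tolerable parasitic $C$, which is why the achievable spiral SRF bounds the number of elements that can be combined in one step.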

#### 2.8.2 Implementation Details

![](_page_103_Figure_3.jpeg)

Figure 2.47: Schematic of the 33-46 GHz watt-class PA array prototype.

In order to demonstrate the utility of the lumped quarter-wave combiner as a highly efficient, large-scale power-combiner, a 33-46 GHz watt-class PA array prototype (shown in Fig. 2.47) is fabricated in a 45nm SOI CMOS process. Eight stacked-FET PA unit-cells are combined using the eight-way lumped quarter-wave combiner. The input power is delivered by means of an eight-way input splitter whose details are discussed in [14]. The PA unit-cell used in the watt-class PA array prototype is based on a two-stage design, where the driver is a two-stacked Class-E-like PA while the output stage is a four-stacked Class-E-like PA. A breakout of the two-stage PA unit cell was discussed in Section 2.5.7.

![](_page_104_Picture_1.jpeg)

Figure 2.48: Chip microphotograph of the 33-46 GHz watt-class PA array prototype. Chip dimensions are 3.2mm×1.3mm without pads.

#### 2.8.3 Measurement Results

The 3.2 mm $\times$ 1.3 mm watt-class PA array prototype (Fig. 2.48) is probed in a chip-on-board configuration. The simulated and measured small-signal S-parameters are shown in Fig. 2.49(a). A peak $S_{21}$ of 19 dB is measured in small-signal at 50 GHz. The measured efficiency and saturated output power across frequency are shown in Fig. 2.49(b), while the large signal performance vs output power at three frequencies is summarized in Fig. 2.49(c). The PA achieves a peak $P_{sat}$ of 27.2dBm at 35GHz with a peak PAE of 10.7%. The prototype maintains 1 dB-flatness in saturated output power (26-27 dBm) from 33-46 GHz while the measured PAE varies between 8.8% and 10.7% in this range. It is interesting to note that the PA array achieves wider-bandwidth performance than the PA unit-cells themselves (when they are loaded with a 50 $\Omega$ load impedance [3]) since the eight-way lumped quarter-wave combiner's input impedance tracks the optimal load impedance required by the PA unit-cells over nearly the entire Q-band. Measurement below 33 GHz is limited by the experimental setup, but it is expected that the PA's bandwidth extends significantly below Q-band. A preliminary probed RF stress test is performed where the watt-class PA array is operated at the $P_{sat}$ drive level for approximately 12 hours at 44 GHz. The observed variations in output power, drain efficiency and PAE are small (<0.5 dB, 1.4% and 1.4% respectively, Fig. 2.49(d)). Drift in the setup (Quinstar driver PA, power sensors etc.), minor probe movements and DUT self heating due to imperfect conduction of heat away from the IC may be contributing factors. The main purpose of the test is to show the benefits of device stacking in distributing the voltage swing among the stacked devices and the absence of immediate breakdown effects despite the high supply voltages used.
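Two of the headline numbers above can be restated in other units with a short calculation: the peak $P_{sat}$ in watts and the fractional bandwidth of the 33-46 GHz range. The sketch below defines fractional bandwidth relative to the arithmetic center frequency, which is an assumption about the convention used.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000

def fractional_bandwidth(f_low, f_high):
    """Bandwidth divided by the arithmetic center frequency."""
    return (f_high - f_low) / ((f_high + f_low) / 2)

psat_w = dbm_to_watts(27.2)             # peak saturated output power
fbw = fractional_bandwidth(33.0, 46.0)  # 1 dB-flat P_sat range, in GHz

print(f"P_sat = {psat_w:.3f} W")
print(f"fractional bandwidth = {100 * fbw:.1f} %")
```

The result is a little over 0.5 W and roughly 33%, consistent with the watt-class ($P_{sat} > 0.5$ W) designation used for the array.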

![](_page_105_Figure_2.jpeg)

Figure 2.49: (a) Simulated and measured small-signal S-parameters of the watt-class power-combined PA array. Measured results showing (b) large-signal saturated output power, peak PAE and drain efficiency at peak PAE across frequency. (c) Gain, PAE and drain efficiency across output power levels for three frequencies. (d) Results of a preliminary probed stress test performed on the PA for 12 hours.

# 2.9 Comparison with State of the Art

| Reference | Technology | Power Device $f_{max}$ | Freq. (GHz) | $V_{DD}$ (V) | $P_{sat}$ (dBm) | η (%) | Peak PAE (%) | Gain (dB) | ITRS FoM<sup>\*\*</sup> | FoM<sub>1</sub><sup>a</sup> | Class of Operation | Power Combining | Fully Integrated? |
|-----------|------------|------------------------|-------------|--------------|-----------------|-------|--------------|-----------|------------|---------|--------------------|-----------------|-------------------|
| This work | 45nm SOI CMOS | 190 GHz (Measured) | 47 | 2.4 | 17.6 | 42.4 | 34.6 | 13 | 59.43 | 13.85 | Class E, 2-stacked | None | Yes |
| This work | 45nm SOI CMOS | 180 GHz (Measured) | 47.5 | 4.8 | 20.3 | 23 | 19.4 | 12.8 | 59.04 | 13.46 | Class E, 4-stacked | None | Yes |
| This work | 45nm SOI CMOS | 190 GHz (Measured) | 47 | 2.4 (driver), 4.8 (PA) | 20.1 | 15.6 | 15.4 | 24.9 | 70.32 | 24.74 | Class E, 4-stacked, two-stage cascade | None | Yes |
| This work | 65nm CMOS | 180 GHz (Measured) | 47.5 | 2.8 | 18.2 | 35.8 | 28.3 | 11.2 | 57.45 | 12.34 | Class E, 2-stacked | Diff. with diff. output<sup>8</sup> | Yes |
| This work | 45nm SOI CMOS | 138 GHz (Simulated) | 47.5 | 2.8 | 17.9 | 33.8 | 25.5 | 9.8 | 55.3 | 12.5 | Dual-Output Class-E, 2-stacked | None | Yes |
| This work | 45nm SOI CMOS | 130 GHz (Measured) | 47.5 | 2.9 | 19.1 | 24.5 | 16 | 8.2 | 52.88 | 10.6 | Dual-Output Class-E, 2-stacked | Two-way current-combined | Yes |
| This work | 45nm SOI CMOS | 190 GHz (Measured) | 35.1 | 2.4 (driver), 4.8 (PA) | 27.2 | 11.7 | 10.7 | 19.4 | 67.8 | 22.22 | Class E, 4-stacked, two-stage cascade | 8-way lumped-element $\lambda/4$ power-combiner | Yes |
| [4] | 45nm SOI CMOS | 190 GHz (Simulated) | 45 | 2.7 | 18.6 | N/A | 34 | 9.5 | 56.48 | 10.9 | Class AB, 2-stacked | None | Yes |
| [54] | 32nm SOI CMOS | N/A | 60 | 0.9 | 12.5 | N/A | 30 | 10 | 52.83 | N/A | Class-E | Diff. with diff. output<sup>8</sup> | Yes |
| [55] | 40nm CMOS | N/A | 60 | 1 | 17.4 | 35.9 | 29.3 | 21.2 | 68.83 | N/A | Class AB | 2-way diff. transformer combined | Yes |
| [56] | 45nm SOI CMOS | N/A | 45 | 5.5 | 28 | N/A | 14 | N/R | N/R | N/R | Class AB, 4-stacked | 4-way diff. spatial power comb | No<sup>++</sup> |
| [57] | 40nm CMOS | N/A | 60 | 1 | 15.6 | N/A | 25 | N/A | N/A | N/A | N/A | 2-way diff. transformer-combined (outphasing) | Yes |
| [45] | 65nm SOI CMOS | N/A | 60 | 1.8 | 14.5 | N/A | 25 | 16 | 60.04 | N/A | Class AB, cascode | None | Yes |
| [30] | 45nm SOI CMOS | 240 GHz (Simulated) | 45 | 4 | 18.2 | N/A | 23 | 8 | 52.88 | 5.27 | Class AB, 3-stacked | None | Yes |
| [58] | 65nm CMOS | N/A | 79 | 1 | 19.3 | N/A | 19.2 | 24.2 | 74.29 | N/A | N/A | 8-way transformer and t-line combiner | Yes |
| [59] | 65nm CMOS | N/A | 60 | 1 | 18.6 | N/A | 15.1 | 20.3 | 66.25 | N/A | N/A | 4-way transformer combined | Yes |
| [5] | 45nm SOI CMOS | 250 GHz (Measured) | 45 | 5.1 | 24.3 | 21.3 | 14.6 | >18 | 67 | 19.04 | Class B/AB, 4-stacked | Diff. with diff. output<sup>8</sup> | No<sup>+</sup> |
| [60] | 90nm CMOS | N/A | 60 | 1.2 | 19.9 | N/A | 14.2 | 20.6 | 67.59 | N/A | N/A | 4-way Wilkinson-tree combiner | Yes |
| [16] | 65nm CMOS | N/A | 60 | 1 | 17.9 | N/A | 11.7 | 19.2 | 63.34 | N/A | Class A | 4-way transformer and current combiner, diff. with diff. output<sup>8</sup> | Yes |
| [61] | 0.13µm SiGe BiCMOS | 240 GHz (Measured) | 58 | 1.2 | 11.7 | N/A | 20.9 | 4.2 | 44.37 | -3.23 | Class E | None | Yes |
| [62] | 0.13µm SiGe BiCMOS | 240 GHz (Measured) | 42 | 2.4 | 19.4 | N/A | 14.4 | 6 | 49.45 | 1.84 | Class E | 2-way Wilkinson combiner | Yes |
| [63] | 0.13µm SiGe BiCMOS | 240 GHz (Measured) | 40 | 2.4 | 14.75 | N/A | 30.8 | 8.8 | 50.5 | 2.9 | Class B | None | Yes |
| [64] | 0.13µm SiGe | N/A | 41 | 4 | 23.6 | N/A | 31 | 12.5 | 63.3 | N/A | Class E, 2-stacked | None | Yes |
| [65] | 0.13µm SiGe BiCMOS | N/A | 45 | 2.5 | 21.7 | 25 | 22 | 9.3 | 57.5 | N/A | Class E | 2-way Wilkinson combiner | Yes |
| [17] | 0.13µm SiGe BiCMOS | N/A | 42 | 2.4 (driver), 4 (PA) | 28.4 | N/A | 28.4 | 18.5 | 73.9 | N/A | Class E | 16-way power combined | Yes |
| [66] | 0.25µm GaAs pHEMT | N/A | 40 | 6 | 27 | N/A | 26.6 | 16 | 69.29 | N/A | N/A | None | No |
| [67] | 0.12µm GaAs pHEMT | N/A | 43.5 | 6 | 35.4 | N/A | 21 | 23 | 84.4 | N/A | N/A | 16-way power combined | Yes |

Table 2.4: Comparison of Fabricated PAs with State-of-the-art CMOS & SiGe mmWave PAs

\*\* Defined as $P_{sat}(\mathrm{dBm}) + \mathrm{Gain}(\mathrm{dB}) + 20\mathrm{log}_{10}(\mathrm{Freq.(GHz)}) + 10\mathrm{log}_{10}(\mathrm{PAE})$.

<sup>a</sup> Defined as $P_{sat}(dBm) + Gain(dB) + 20log_{10}(Freq./f_{max}) + 10log_{10}(PAE)$. The measured/simulated $f_{max}$ of power devices is used for the calculations.

<sup>8</sup> Ideal external lossless output balun assumed.

<sup>+</sup> Uses off-chip bias-T for providing power supply.

\*\*\* Uses bondwire inductor in output matching network.

<sup>++</sup> Uses an on-PCB input matching network.

Table 2.4 depicts a comparison of the implemented stacked mmWave Class-E-like PAs to contemporary state-of-the-art mmWave CMOS and SiGe PAs. The 65 nm PA is comparable to state-of-the-art implementations in efficiency, despite the poor ON-resistance characteristics of the technology. This is a direct consequence of the loss-aware Class-E design methodology. On the other hand, the 2-stacked PA in 45 nm SOI CMOS exhibits the highest PAE reported for a CMOS mmWave PA. The PA reported in [31] exhibits similar PAE and output power, and also employs device stacking in 45 nm SOI CMOS, but in the context of Class AB operation. The 4-stacked PA in 45 nm SOI CMOS exhibits the highest output power achieved from a fully-integrated CMOS mmWave PA. The work in [5] uses an off-chip bias-T to provide the supply voltage, and consequently does not integrate the mmWave dc-feed inductor. Furthermore, it is a differential implementation with a differential output, and assumes ideal 3 dB external differential-to-single-ended conversion. It is important to study the output power delivered to a single-ended output pad when comparing works. An on-chip dc-feed inductor is seen in simulation to introduce approximately 1 dB output-side loss based on the quality-factor achievable in this technology. When these are factored in, the work in [5] achieves comparable output power to our 4-stacked PA, with an associated PAE that is lower than our 4-stack and comparable to our cascade PA. Other prior fully-integrated CMOS mmWave PAs with comparable output power [58], [60], [32] rely on power combining. Since most of the works reported in Table 2.4 operate at higher frequencies, it is important to use a figure-of-merit (FoM) to ensure fair comparison. The ITRS FoM, defined as

$$\text{ITRS FoM} = P_{sat}(\mathrm{dBm}) + \mathrm{Gain}(\mathrm{dB}) + 10\log_{10}(\mathrm{PAE}) + 20\log_{10}(f_0) \tag{2.48}$$

where $f_0$ is the operating frequency in GHz, takes into account four important performance metrics of a PA. In order to incorporate technology limitations, the maximum oscillation frequency $f_{max}$ of the technology can be included as part of a modified FoM given by [68]:

$$FoM_1 = P_{sat}(\mathrm{dBm}) + \mathrm{Gain}(\mathrm{dB}) + 10\log_{10}(\mathrm{PAE}) + 20\log_{10}(f_0/f_{max}) \tag{2.49}$$

The implemented single-stage prototypes achieve ITRS FOM and  $FoM_1$  comparable to current state-of-the-art mmWave CMOS PAs and the highest amongst fully-integrated PAs which do not employ power-combining. In particular, the two-stage cascade PA in 45 nm SOI CMOS achieves the highest ITRS FOM amongst PAs which do not employ power-combining, and second highest overall.
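As a rough numerical illustration of Eqs. (2.48) and (2.49), the two figures of merit can be evaluated as sketched below. The PA numbers are hypothetical and are not taken from Table 2.4, and PAE is entered as a fraction, which is an assumption since the convention (fraction versus percent) is not stated here.

```python
import math


def itrs_fom(psat_dbm, gain_db, pae_fraction, f0_ghz):
    """Eq. (2.48): Psat(dBm) + Gain(dB) + 10log10(PAE) + 20log10(f0 in GHz)."""
    return (psat_dbm + gain_db
            + 10 * math.log10(pae_fraction)   # assumption: PAE given as a fraction
            + 20 * math.log10(f0_ghz))


def fom1(psat_dbm, gain_db, pae_fraction, f0_ghz, fmax_ghz):
    """Eq. (2.49): same terms as Eq. (2.48), with f0 normalized to fmax."""
    return (psat_dbm + gain_db
            + 10 * math.log10(pae_fraction)
            + 20 * math.log10(f0_ghz / fmax_ghz))


# Hypothetical PA used only to exercise the formulas.
example = dict(psat_dbm=20.0, gain_db=10.0, pae_fraction=0.25, f0_ghz=45.0)
FMAX_GHZ = 200.0  # hypothetical device fmax

itrs = itrs_fom(**example)
modified = fom1(fmax_ghz=FMAX_GHZ, **example)
print(f"ITRS FoM = {itrs:.2f}, FoM1 = {modified:.2f}")
```

The two metrics differ only by $20\log_{10}(f_{max})$, so for a fixed measured performance a technology with lower $f_{max}$ scores a higher $FoM_1$.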

The Dual Output PAs achieve competitive performance in both ITRS FOM and FoM<sub>1</sub>, which points to the efficacy of the Multi-output Class-E design methodology given the relatively low  $f_{max}$
of the power devices in our prototypes (as a consequence of the usage of 56-nm body-contacted devices and a continuous array of gate fingers). The use of the 40nm floating-body devices along with a better multiplicity-based device layout is thus expected to improve absolute performance. Design of minimum-loss matching networks that optimally distribute the output power at the various intermediary nodes, along with potential applications of the multiple outputs for power distribution in an integrated application constitute interesting topics for future investigation.

The eight-way power-combined PA achieves an output power of 27.2 dBm which is approximately 5 dB (3×) higher than any other CMOS mmWave PA and a very high ITRS figure-of-merit. The implemented PA array also achieves the highest fractional bandwidth (33%). When compared with the state-of-the-art SiGe mmWave PA [17], we see that aggressive device stacking in conjunction with power-combining has enabled comparable performance despite the higher supply voltage of 0.13  $\mu$ m SiGe. GaAs mmWave PAs [67] achieve higher output power but device stacking and large-scale lumped-element power-combining have narrowed the gap.
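The decibel figures quoted above can be checked with a short conversion sketch. The helper names are illustrative only.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000.0


def db_to_ratio(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10 ** (db / 10)


p_out_w = dbm_to_watts(27.2)   # output power of the eight-way combined PA
advantage = db_to_ratio(5.0)   # the quoted 5 dB margin as a linear ratio
print(f"27.2 dBm = {p_out_w:.3f} W, 5 dB = {advantage:.2f}x")
```

This gives roughly 0.52 W and a ratio of about 3.2, consistent with the "approximately 5 dB (3×)" statement.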

### 2.10 Conclusion

The discussions and experimental results presented in this chapter indicate that aggressive stacking in switch-like mmWave CMOS PAs in conjunction with low-loss, large-scale, lumped-element power-combining enables the realization of efficient PAs in CMOS with watt-level output power at mmWave frequencies for the first time. A state-of-the-art ultra-wideband watt-level CMOS PA at mmWave frequencies has been demonstrated to verify the claims. This is a step towards enabling large-scale deployment of low-cost, long-distance CMOS Q-band communication links.

## Chapter 3

# Stacked Millimeter-wave Power DACs

The previous chapter outlined a pathway to realize energy-efficient watt-class mmWave CMOS PAs: aggressive stacking in non-linear "switch-like" mmWave PAs to implement unit cell PAs with high efficiency and moderate output power ( $\approx 20$ dBm) followed by large-scale power-combining of several such unit cells using a low-loss combiner to achieve watt-level output power on-chip. The focus of this chapter is to discuss the challenges associated with realizing transmitters that can efficiently support complex digital modulations and propose architectural solutions to address these issues.

A discussion of the quintessential efficiency vs. linearity trade-off and efficiency under back-off challenges in PAs is presented in Section 3.1, followed by an introduction to the concept of an RF power DAC-based approach for realizing transmitters with high back-off efficiency in Section 3.2. Prior work on mmWave power DACs is summarized in Section 3.3. Section 3.4 details a mmWave power DAC-based PA architecture (first introduced in [69] and expanded in [14]) that simultaneously enables high saturated output power ( $P_{sat}$ ) through large-scale power combining, digital amplitude modulation and high efficiency under back-off through supply-switching, and linearity through dynamic load modulation.

Section 3.4.2 describes the architectural overview, design considerations as well as implementation details and measurement results for high-power 1-bit mmWave DAC unit cells [33] used in [14]. The implementation and measurement results of the overall 3-bit mmWave power DAC ([14]) employing the unit cells are discussed in Section 3.4.3.

Section 3.5 describes supply modulation for stacked mmWave Class-E-like PAs as a pathway to realize high-resolution power DACs while retaining high output power as well as high peak and average efficiencies. A novel supply-adaptive biasing scheme is proposed for stacked Class-E-like power DACs to retain the benefits of stacking (equal voltage sharing) and Class-E switching characteristics under supply modulation. Section 3.5.3 describes switched-capacitor supply-modulated stacked mmWave power DACs, while a Class-G supply modulator-based "hybrid" mmWave 4-stacked power DAC is presented in Section 3.5.4 for a convenient practical implementation of high-resolution mmWave power DACs with high peak and back-off efficiencies.

### 3.1 Linearity vs Efficiency Trade-off and Efficiency Under Back-off Challenges for Conventional PAs



Figure 3.1: (a) Linearity vs. efficiency trade-off in conventional PAs and (b) efficiency under back-off profile of conventional transmitters when supporting complex modulations.

As discussed in Chapter 2, there has been significant progress of late towards the implementation of high-power, high-efficiency mmWave PAs in scaled CMOS through techniques such as switch-mode operation and series stacking of devices [3–5] as well as power combining [14, 19, 56, 58]. Furthermore, a 0.5W PA was demonstrated in the Q-band using a single-step, 8-way, low-loss lumped quarter-wave power combiner and stacked Class-E-like PA unit cells [14].

A second major challenge in the realization of energy-efficient transmitters arises from the trade-off between efficiency and linearity in conventional PAs. Quasi-linear PA classes (like class-A, class-AB, class-B) are typically less efficient than their nonlinear counterparts (like class-E, class-D<sup>-1</sup>, etc.), as shown in Fig. 3.1(a). Furthermore, it is not sufficient to achieve high efficiency operation only at  $P_{sat}$ . To efficiently utilize the spectrum and achieve high data-rates, PAs are operated in highly backed-off regions to handle high-order modulations in a linear manner. In addition, the high peak-to-average power ratios (PAPR) of high-order modulations imply that the PA would spend an even larger fraction of the time operating at a backed-off power level. This causes the average transmitter efficiency to plummet since the quintessential PA is most efficient near  $P_{sat}$  and exhibits poor efficiency under back-off (Fig. 3.1(b)). PAs employing architectures such as outphasing [70,71] and Doherty [72] are extensively used at RF frequencies to achieve high efficiency under back-off. However, the load modulation effect in outphasing is quite weak at mmWave frequencies and does not provide significant benefits in efficiency under back-off [57]. The Doherty architecture offers considerable improvement in efficiency under back-off but requires extensive linearization, which can be challenging at the high data rates typically employed at mmWave frequencies [73].

### 3.2 RF DAC: A New Design Paradigm in Fine-line CMOS

The incredible advancements in mainstream CMOS technology suggest a paradigm shift in the design philosophy of RF circuits, based on a fundamental observation:

"In a deep-submicron CMOS process, time-domain resolution of a digital signal edge transition is superior to voltage resolution of analog signals" [74].

In other words, encoding information in the edge transitions of signals (time-domain) is preferable to encoding in the amplitude levels (voltage domain) as in a conventional analog design approach. This suggests an analog/RF design strategy exploiting fast switching characteristics of MOS transistors with fine control of timing transitions, while minimizing the reliance on voltage resolution which is adversely affected by a continuous reduction in supply voltage and increasing noise and interferer levels. Additionally, such "digital-intensive" architectures would be highly reconfigurable thereby facilitating in-field performance optimization and reconfigurability across bands and standards. An additional benefit would be portability from one process node to the next with minimal modifications. The aforementioned benefits have impelled research efforts towards the realization of digitally intensive/all-digital RF transmitters that simultaneously achieve high output power, linearity (potentially through digital pre-distortion) and efficiency under back-off while being highly reconfigurable and portable [75–79]. The use of an RF Power Digital-to-Amplitude Converter (DAC) or RF Power DAC provides a pathway to realize such "digital" transmitters and the next section discusses the extension of the principles to realize the same at mmWave frequencies.

### 3.3 Prior Work on mmWave Power DACs



Figure 3.2: Direct modulator topologies for mmWave power DACs [2]: (a) Double-balanced baseband-DAC topology and (b) double-balanced RF-DAC topology.

There have been some recent works on the implementation of high-power mmWave DACs [2,80,81] that directly translate the design approaches for DACs implemented at RF frequencies to mmWave. The work in [2] provides an overview of possible RF-DAC or direct-modulator topologies. The technique builds on the idea of using Gilbert-cell upconversion mixers (that commutate an RF/mmWave signal in the Gilbert-cell mixing pair/mixing quad) and segmenting either the transconductor or the mixing quad devices with independent control signals to implement a modulator with multi-bit amplitude resolution. A key difference is that unlike a conventional mixer, both the transconductor and the mixing quad operate in switching mode in a modulator. Despite the higher energy-efficiency of a single-balanced topology, the strong data-dependent output impedance variations result in significant AM-AM and AM-PM distortion. Consequently, a double-balanced modulator topology is preferred for implementing direct modulators [2]. As illustrated in Fig. 3.2(a) and (b), two double-balanced modulator configurations are possible [2]: 1) baseband-DAC, where the data are first applied at the gates of the bottom differential pair and are then up-converted by the LO (mmWave) signal driving the gates of the Gilbert-cell mixing quad [80] and 2) envelope-DAC or RF-DAC (Fig. 3.2(b)), where the LO signal is applied at the gates of the differential transconductor pair and its envelope is modulated by the data applied at the gates of the mixing quad [82].



Figure 3.3: (a) Schematic of the mmWave 9-bit Power DAC presented in [2]. Simulated (b) output power and output amplitude vs digital amplitude word and (c) drain efficiency and PAE vs. output power of the 9-bit DAC at 90GHz in IBM 45nm SOI CMOS.

Although in both cases the modulated carrier (mmWave) signal appears at the drain terminals of the mixing quad, there are topology-specific nuances that must be considered for mmWave purposes. As discussed in [2], the baseband-DAC arrangement allows the data inputs to be driven with low-voltage standard CMOS logic at the expense of higher capacitive loading for the mmWave (LO) and modulated output signals, since each phase of the LO and modulated output is loaded by two devices. Moreover, the LO signal experiences the power gain of a single device, which can be small at mmWave frequencies. The RF-DAC topology on the other hand facilitates higher gain for the LO signal since it is amplified by a cascode stage and the gain can be further increased by series device stacking [3], [4]. Additionally, the capacitive loading on the LO path is lower. However, the high dc bias voltage at the gates of the cascode data devices necessitates CMOS drivers realized with thick-oxide MOSFETs.

Fig. 3.3(a) shows the schematic of the stacked Gilbert-cell power-DAC proposed in [2]. Since the large signal performance metrics as a function of the amplitude control word were not reported in [2], the schematic in Fig. 3.3(a) was simulated in IBM 45nm SOI CMOS at 90GHz using the circuit parameters reported in the work (without layout parasitics). The close correspondence between the simulated peak output power, drain efficiency and PAE (Fig. 3.3(b) and (c)) and those reported in [2] indicates that these simulations are reasonably representative of the characteristics of the topology. As is evident from Fig. 3.3(c), despite incorporating multi-bit resolution into the stacked mmWave power DAC, the technique suffers from poor back-off efficiency ($\frac{\eta_{-6dB}}{\eta_{max}} = 23\%$ and $\frac{PAE_{-6dB}}{PAE_{max}} = 13.5\%$). This is expected, since the RF-DAC topology maintains a constant dc power consumption irrespective of the amplitude control word, resulting in a linear degradation in efficiency as output power is backed off. The small signal transfer characteristics reported in [2] suggest reasonable linearity with digital code. The corresponding large signal simulations indicate a compressive output characteristic with amplitude word (if the DAC is viewed to have 7-bit resolution considering amplitude word 128 to 256 or 0 to 128) which would necessitate Digital Pre-Distortion (DPD) [83] for linearization. Alternatively, we can view it as a 5-bit DAC with a linear output profile (control word 128 to 160 or 96 to 128).

In contrast, the work in [80] uses the baseband-DAC topology to realize an 8-bit I/Q power DAC, where the dc power consumption scales with the amplitude control word. This facilitates a more graceful Class-B-like back-off in efficiency ($\frac{\eta_{-6dB}}{\eta_{max}} = 50\%$ and $\frac{PAE_{-6dB}}{PAE_{max}} = 59\%$). However, this improvement in back-off efficiency comes at the expense of increased AM-AM and AM-PM nonlinearity (compared to the RF-DAC configuration) owing to the data-dependent output impedance variations. It should be noted that a double-balanced implementation of the baseband-DAC topology does not reduce these non-linearities. The aforementioned results from contemporary power DAC implementations suggest that we need more efficient schemes for realizing multi-bit mmWave power DACs without sacrificing peak and average efficiencies as well as linearity.
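The two back-off behaviors discussed above can be compared against idealized first-order models, sketched below. In the first model dc power is constant, so efficiency falls linearly with output power; in the second, dc power scales with output amplitude (Class-B-like), so efficiency falls with the square root of output power. Both models are simplifying assumptions and ignore driver power and device losses.

```python
def relative_efficiency(backoff_db, dc_scaling):
    """Efficiency at back-off, normalized to the peak efficiency.

    dc_scaling = "constant":  dc power independent of the amplitude word.
    dc_scaling = "amplitude": dc power proportional to output amplitude.
    """
    p_rel = 10 ** (-backoff_db / 10)   # output power relative to peak
    if dc_scaling == "constant":
        dc_rel = 1.0
    elif dc_scaling == "amplitude":
        dc_rel = p_rel ** 0.5          # amplitude is the square root of power
    else:
        raise ValueError("dc_scaling must be 'constant' or 'amplitude'")
    return p_rel / dc_rel


eta_const = relative_efficiency(6.0, "constant")
eta_class_b = relative_efficiency(6.0, "amplitude")
print(f"6 dB back-off: constant-dc {eta_const:.1%}, Class-B-like {eta_class_b:.1%}")
```

The idealized values at 6 dB back-off are about 25% and 50%, close to the 23% and 50% drain-efficiency ratios quoted for the RF-DAC and baseband-DAC implementations respectively.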

### 3.4 mmWave Power DAC Based on Supply Switching, Power Combining and Load Modulation





Figure 3.4: Digitally-controlled load modulated power DAC architecture.

Recently, high-power high-efficiency 1-bit mmWave DAC cells have been proposed [5] which either operate in saturated output power mode or are turned off to maximize average efficiency. Several such mmWave power DAC unit cells can be combined in a large-scale, on-chip/free-space power-combining architecture (Fig. 3.4) to realize a power DAC with multi-bit resolution and high efficiency under back-off. Depending on the nature of the combiner (isolating versus non-isolating), load modulation effects may be observed which can be exploited to linearize the DAC [14].

A digitally controlled, supply-switched and load modulated switching PA architecture based on Fig. 3.4 was proposed in  $[14]^1$ . It is the first linearizing PA architecture at mmWave frequencies that simultaneously employs large-scale power combining to enable high output power, PA supply switching and load modulation for high efficiency under back-off, and load-modulation for linearization of switching PAs. The architecture employs several (n) switching-class mmWave PA unit-cells which can be individually turned ON or OFF by means of a digital control bit. These are combined using a non-isolating power combiner to make an overall linear mmWave DAC with high back-off efficiency through the load modulation of the combiner and the elimination of DC power consumption in OFF PAs. Details regarding the implementation of highly linear direct digital-to-mmWave DACs using this architecture can be found in [14] and only design considerations for the realization of the 1-bit mmWave power DAC cells used in the work are presented in this chapter.



Figure 3.5: Schematic and chip microphotograph of the implemented 1-bit mmWave 47GHz Class-E-like SOI CMOS power DAC.

<sup>1</sup>In collaboration with Ritesh Bhat, Columbia University.

#### 3.4.2 Millimeter-wave 1-bit Stacked Power DAC

Fig. 3.5 depicts the architecture of the proposed 1-bit mmWave CMOS power DAC used as a unit cell in [14]. The mmWave path (Fig. 3.6(a)) in this mixed-signal architecture consists of a driver stage followed by the main amplifier, both of which are Class-E-like PAs with 2 devices stacked (as described in [3] and Chapter 2). Both stages are designed using the Class-E-like mmWave design principles described in [3]. The PA unit cell is augmented with digitally controlled switches at strategic locations in order to facilitate high-speed 1-bit ASK (OOK) modulation (Fig. 3.6(b)): (a) pMOS supply-switch  $M_8$  for the main PA, (b) input-match switch  $M_1$  for the driver, (c) gate-bias switches  $M_2$  and  $M_5$, and (d) bias-control switches  $M_9$  and  $M_{10}$  for the driver and main PAs respectively. When the PA is ON (Fig. 3.7),  $M_8$  is turned ON, allowing the PA to draw its dc current. Simultaneously,  $M_2$  and  $M_5$  are turned OFF while  $M_9$  and  $M_{10}$  are turned ON, so that the driver and the main PA receive their nominal operating gate biases.  $M_1$  is turned OFF as well, since the input impedance of the driver is matched to 50 $\Omega$ .

Figure 3.6: Schematics of the proposed 1-bit mmWave power DAC with (a) mmWave path and (b) digital path highlighted.



Figure 3.7: Simplified schematic illustrating ON state operation of the proposed 1-bit mmWave power DAC.

When the PA is turned OFF (Fig. 3.8),  $M_8$  is turned OFF ensuring that there is no wasteful dc current drawn by the main PA. A different strategy is adopted for de-activating the driver PA.  $M_2$  shorts the gate of the driver's input device ( $M_3$ ) to ground ensuring that an OFF driver draws no dc current. The combination of these two techniques helps conserve power.  $M_1$  is also turned ON, which, in conjunction with the series  $44\Omega$  resistor, preserves input-match for the OFF driver. The DAC cell is utilized in a large-scale power-combining architecture (Figs. 3.4, 3.22) which requires that an OFF PA present a short-circuit impedance to the combiner in order to facilitate load modulation under back-off, resulting in a linear output amplitude with number of PAs ON [14]. This is accomplished by means of gate-bias switch  $M_5$  for the main PA, which applies a high gate bias to  $M_6$  when the PA is in the OFF state. The switches  $M_9$  and  $M_{10}$  are turned OFF as well. A single control bit  $b_n$  is used for all the switches, with appropriate logic inversion. As shown in Fig. 3.6(b), the control bit is fed to two separate inverter chains: one driving the supply switch and the other the input-match and gate-bias switches. The inverters in each path (in particular, the chain driving  $M_8$ ) are sized up progressively to drive their respective load capacitances.

Figure 3.8: Simplified schematic illustrating OFF state operation of the proposed 1-bit mmWave power DAC.
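The ON and OFF switch configurations described above can be summarized as a small truth table, sketched below. The dictionary keys are illustrative labels rather than names from the design, and `True` denotes a conducting switch, not a gate voltage level (the supply switch is a pMOS device, so its gate drive is inverted).

```python
def switch_states(pa_on):
    """Conduction state of each switch in the 1-bit DAC cell for control bit b_n.

    True means the switch is conducting (turned ON in the text's sense).
    """
    return {
        "M8_supply_switch": pa_on,         # passes dc current to the main PA when ON
        "M9_M10_bias_control": pa_on,      # apply nominal gate biases when ON
        "M1_input_match": not pa_on,       # with the 44-ohm resistor, keeps input match when OFF
        "M2_driver_gate_bias": not pa_on,  # shorts the gate of M3 to ground when OFF
        "M5_main_gate_bias": not pa_on,    # applies a high gate bias to M6 when OFF
    }


on_state = switch_states(True)
off_state = switch_states(False)
for name in on_state:
    print(f"{name:22s} ON-state: {on_state[name]!s:5s} OFF-state: {off_state[name]}")
```

Every switch simply follows or inverts the single control bit, which is why one bit with appropriate logic inversion suffices.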

In 45nm SOI technology, the DC  $V_{DS,max}$  is 1.2V, and the peak RF swing across any two device junctions must be kept below  $2 \times V_{DS,max}$  for the 40nm floating-body (FB) devices for long term reliable operation. A thick-oxide pMOS device is used as the supply switch with a DC  $V_{DS,max}$  of 2.4V. Therefore, at most two FB devices can be stacked in the main PA to prevent breakdown of the pMOS supply switch in both ON and OFF states. To increase output power, the main PA is designed for an optimal load of 25 $\Omega$ , which is transformed to 50 $\Omega$  using transmission lines.

#### 3.4.2.1 Design Considerations and Trade-offs



Figure 3.9: Considerations for bias-path RC time-constant to support Gbps modulation in the 1-bit Power DAC cell.

This section presents a detailed discussion of factors affecting modulation speed, dynamic power dissipation, impact of digital path delays and supply/ground bounce.

**3.4.2.1.1 Modulation Speed:** The modulation speed of the DAC is essentially limited by the bias-path RC time constants associated with nodes whose bias voltages are changed during turn ON/OFF. In particular, bias resistors connected to the gates of  $M_3$  and  $M_6$  present an important trade-off: a large bias resistor will have less impact on mmWave static performance but will slow down the rise/fall time of the PA when it is turned ON/OFF (Fig. 3.9). Based on the total capacitance at the gate nodes, bias resistors of  $1\,\mathrm{k}\Omega$  for the driver and  $500\,\Omega$  for the main PA were found to be optimal for supporting Gbps modulation. Startup circuits can be used to accelerate the charging and discharging process and resolve the trade-off between speed and mmWave performance.



Figure 3.10: Incorporation of large decoupling capacitors into input and output impedance transformation networks to minimize settling time of DAC cell.



Figure 3.11: Impact of bias-path RC time constant on settling time of 1-bit Power DAC cell for (a) 500 $\Omega$  and (b) 1.5 k$\Omega$  biasing resistor for input device M<sub>6</sub> of output stage (simulated using a 100MHz 50% duty-cycle clock).

Referring to the power-combined DAC architecture in Fig. 3.4, it is evident that decoupling capacitors are necessary in order to isolate the gate and drain biases of ON and OFF PAs. Consequently, series decoupling capacitors were incorporated into the input and output impedance transformation networks (Fig. 3.10). The value of the series decoupling capacitor needs to be carefully chosen: a large series decoupling capacitor will increase the rise/fall time under modulation due to increase in bias-path RC time constants but will have less impact on mmWave static performance due to its lower series loss.

Fig. 3.11 underscores the importance of RC time constant on output settling time. In Fig. 3.11(a) a 140fF input decoupling capacitor and optimal biasing resistor are used for the main PA, while Fig. 3.11(b) illustrates the situation where a 100pF input decoupling capacitor is used along with a large biasing resistor. In the latter case, the output voltage fails to settle to its steady-state value, which is slightly below 2V.
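A first-order estimate helps explain the two cases in Fig. 3.11. The sketch below treats the bias path as a single-pole RC network formed by the bias resistor and the input decoupling capacitor only, which is a simplifying assumption (device and routing capacitances are ignored), and computes how far the node settles within one half-period of the 100MHz clock.

```python
import math


def settled_fraction(r_ohm, c_farad, clock_hz):
    """Return (tau, fraction of the final value reached in one half-period)."""
    tau = r_ohm * c_farad
    half_period = 0.5 / clock_hz        # 50% duty-cycle clock
    return tau, 1.0 - math.exp(-half_period / tau)


# Case (a): 500-ohm bias resistor with the 140 fF input decoupling capacitor.
tau_a, frac_a = settled_fraction(500.0, 140e-15, 100e6)
# Case (b): 1.5-kohm bias resistor with a 100 pF input decoupling capacitor.
tau_b, frac_b = settled_fraction(1.5e3, 100e-12, 100e6)

print(f"(a) tau = {tau_a * 1e12:.0f} ps, settled {frac_a:.1%}")
print(f"(b) tau = {tau_b * 1e9:.0f} ns, settled {frac_b:.1%}")
```

Under these assumptions the time constant grows from about 70 ps to about 150 ns, far longer than the 5 ns half-period, which is consistent with the output failing to settle in Fig. 3.11(b).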



Figure 3.12: Average DC power consumption and average drain efficiency (DE) of the supply-switched DAC cell as a function of total supply bypass capacitance in the main PA ( $C_{bypass,1}$ ) for a 500MHz clock input with 50% duty-cycle.

**3.4.2.1.2 Dynamic Power Dissipation:** The charging and discharging of circuit nodes under modulation is also associated with capacitive discharge loss (Fig. 3.12(a)), given by  $P_{loss,cap} = kf_0C(\Delta V)^2$, where k is the activity factor<sup>2</sup> of the modulation signal,  $f_0$  is the frequency of modulation, C is the capacitance at that node, and  $\Delta V$  is the change in the node's bias voltage upon turn ON/OFF. The loss is particularly relevant for nodes associated with large capacitance, such as the drains of  $M_8$  (connected to a large supply switch and large bypass capacitors) and  $M_7$  (connected to large decoupling capacitors). The pMOS supply switch must be large to have a small on-resistance to not degrade static drain efficiency. This comes at the cost of an increased parasitic capacitance, which exacerbates the switching losses as the PA is turned ON/OFF, resulting in a trade-off in its size. The input and output decoupling capacitors need to be optimized as well to minimize switching loss. The input decoupling capacitor can be conveniently absorbed in the input matching network thereby reducing its loss. A similar technique could not be utilized on the output side, and the decoupling capacitance (which has a poor quality factor at mmWave frequencies) was chosen carefully so as to minimize the degradation in static performance while not increasing dynamic power dissipation too severely. The simulated degradation in average efficiency is shown in Fig. 3.12(b) for a 500MHz clock input with 50% duty-cycle as the supply bypass of the main PA is increased from the nominal design point, keeping the driver's supply bypass and all other capacitors unchanged. Evidently, there exists an optimal value for the supply bypass which should be used to maximize PAE. It should be noted that a PRBS sequence has an activity factor of  $\frac{1}{4}$ , while that for a 50% duty-cycle clock is 1. Thus, a 500MHz clock input corresponds to a 2Gbps PRBS (assuming settling time of the circuit is $\ll$ 500 ps so it can support the modulation).
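The equivalence between a 500MHz clock and a 2Gbps PRBS follows directly from the loss expression, as the sketch below illustrates. The node capacitance and voltage step are hypothetical placeholder values chosen only to exercise the formula; they are not design values.

```python
def cap_switching_loss(activity_factor, f_mod_hz, c_farad, delta_v):
    """P_loss,cap = k * f0 * C * (delta V)^2, in watts."""
    return activity_factor * f_mod_hz * c_farad * delta_v ** 2


C_NODE = 1e-12   # hypothetical node capacitance (1 pF)
DELTA_V = 1.0    # hypothetical bias-voltage change on turn ON/OFF (1 V)

# 50% duty-cycle clock: one rising/falling edge pair per period, so k = 1.
clock_loss = cap_switching_loss(1.0, 500e6, C_NODE, DELTA_V)
# PRBS: on average one rising/falling edge pair every four bits, so k = 1/4.
prbs_loss = cap_switching_loss(0.25, 2e9, C_NODE, DELTA_V)

print(f"clock: {clock_loss * 1e3:.2f} mW, PRBS: {prbs_loss * 1e3:.2f} mW")
```

Both cases give the same product of activity factor and modulation frequency, so a clock at one quarter of the PRBS bit rate produces the same capacitive switching loss.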

**3.4.2.1.3 Digital Path Delays and Rise/Fall Times:** While digital path timing is critical in all high-resolution DACs, its importance within a mmWave power DAC unit cell from average efficiency and reliability points of view is described here. The delays and rise/fall times are carefully tailored so that, as the PA is turning off,  $M_8$  turns OFF before  $M_5$  turns ON. Otherwise, there would be a period of time when both  $M_8$  and  $M_5$  are ON, thereby applying a high gate bias to  $M_6$ . The resulting increase in current would degrade average efficiency and is likely to affect long term reliability as well. This phenomenon is illustrated in Fig. 3.13 for a 500MHz clock input with 50% duty-cycle, where slow rise time of the supply-switch control ( $V_{ctrl}$ ) results in a large drain current  $I_{DS}$  being drawn by the main PA before turning off. The slow rise time is created by introducing additional interconnect between the final inverter and the supply switch. This issue can be better averted by using retiming to align the various digital signals.

<sup>2</sup>Defined as the average number of rising-edge falling-edge pairs per clock period.

Figure 3.13: (a) Slow rise time of the supply-switch control  $(V_{ctrl})$  in the presence of extra digital path interconnect, (b) the resulting spike in drain current  $I_{DS}$  of the main PA in the DAC cell, and (c) circuit illustration of the mechanism.

**3.4.2.1.4 Ground/Supply Bounce:** Finally, the presence of supply and ground wirebond inductances can cause the on-chip supply and ground nodes to "bounce" [84]. Fig. 3.14 illustrates simulated supply and ground bounce using ground wirebond inductances of 300pH and supply wirebond inductances of 500pH. Evidently, the supplies of the output stage (main PA) and the driver, as well as the on-chip RF and digital grounds sustain higher bounce in the absence of bypass capacitance than when 20pF bypass capacitances are included. This is a particular challenge in RF DACs because there is a large change in dc current between the ON and OFF states (Fig. 3.15(a)). Ground/supply bounce can cause changes in the instantaneous  $V_{GS}$  of devices, turning them ON/OFF unpredictably, which compromises desired functionality and can pose reliability concerns as well. Furthermore, the ringing in the waveforms caused by ground/supply bounce increases the settling time of the PA, thereby limiting modulation speed (Fig. 3.15(b)). Since the high frequency signal path conducts a large current and would exhibit the largest "bounce", the impact on the digital components can be mitigated by separating the on-chip grounds for the digital and mmWave paths. In addition, by deriving all bias voltages from the on-chip power supply and ground, and using sufficient on-chip supply bypass, ground and supply bounce can be equalized so that device voltages become immune to them. Using a large number of ground pads is also recommended so that the on-chip and off-chip ground potentials are comparable. Interfacing to off-chip signals differentially, e.g. using LVDS for digital signals and differential RF input and output ports, also improves resilience to ground and supply bounce.

Figure 3.14: Simulations illustrating bounce in the supplies of (a) the output stage (main PA) and (b) driver stage of the 1-bit mmWave power DAC with 20pF bypass capacitances and without them. Simulations comparing bounce in the on-chip (c) RF ground and (d) digital ground of the power DAC in the presence and absence of 20pF bypass capacitances. Supply and ground wirebond inductances of 500pH and 300pH respectively were used.
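The order of magnitude of the bounce can be estimated from $V = L\,\frac{dI}{dt}$, and the ringing frequency from the resonance of the wirebond inductance with the on-chip bypass capacitance. The sketch below uses the wirebond inductances and the 20pF bypass value quoted above, while the current step and its transition time are hypothetical values chosen for illustration; the lossless LC model is also a simplifying assumption.

```python
import math


def bounce_voltage(l_henry, delta_i_amp, delta_t_s):
    """Voltage developed across an inductance for a linear current ramp."""
    return l_henry * delta_i_amp / delta_t_s


def ringing_frequency(l_henry, c_farad):
    """Resonant frequency of an ideal (lossless) LC network."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henry * c_farad))


# Hypothetical switching event: 100 mA change in 200 ps through the ground wirebond.
v_gnd = bounce_voltage(300e-12, 0.1, 200e-12)
# Supply wirebond inductance resonating with the 20 pF bypass capacitance.
f_ring = ringing_frequency(500e-12, 20e-12)

print(f"ground bounce ~ {v_gnd * 1e3:.0f} mV, ringing ~ {f_ring / 1e9:.2f} GHz")
```

Under these assumptions the bounce is on the order of 150 mV, and the ringing period (about 0.6 ns) is comparable to the bit period at Gbps rates, which illustrates why bounce affects settling time and modulation speed.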



Figure 3.15: (a) Large change in current at the switching instant of the 1-bit power DAC. (b) The resulting ground bounce (arising from ground wirebond inductance) causes ringing in the waveforms affecting the settling time and hence modulation speed of the PA.



Figure 3.16: (a) Simulation setup for input-side modulation of the mmWave 1-bit power DAC cell. Simulated average large-signal performance at 45GHz with (b) supply-switching and (c) input-side modulation, both using a clock input with 50% duty-cycle.

#### 3.4.2.2 Comparison with Conventional Input-side Modulation

An alternative means of accomplishing OOK is modulation exercised from the input of the PA. The supply-switched PA can be tailored to support input-side modulation by inserting a switch (controlled by the modulating signal) in series at the input of the PA (Fig. 3.16(a)). The digital path is deactivated, except for the control to the input-match switch  $M_1$ , which is retained in order to present the driving source with a 50 $\Omega$  load when the PA is not driven. Figs. 3.16(b) and (c) compare the simulated performance metrics of the supply-switched architecture with input-side modulation. Evidently, amplitude modulation via supply-switching yields higher average output power as well as average efficiency (since DC power consumption is eliminated when the PA is turned OFF) and can therefore support modulation speeds of  $\approx$  700Mbps with average drain efficiency >10%.

#### 3.4.2.3 Experimental Results

The 1-bit power DAC (Fig. 3.5) is tested in chip-on-board configuration through on-chip probing. A comparison of measured and simulated small-signal S-parameters is shown in Fig. 3.17.

A peak S21 of 20dB is measured in small signal at 51GHz. The setup for conducting modulation measurements is illustrated in Fig. 3.18. An R&S SMY01 signal generator serves as the input clock for an Anritsu MP1763B PPG, which provides a PRBS as the ASK control bit to the PA. The modulated waveform is displayed on an Agilent 86100B oscilloscope, which is triggered by a "Pattern Sync" signal from the PPG. Average output power is measured using an Agilent N1914A power meter, while the modulated spectrum is observed on an Agilent E4448A PSA. A comparison of measured and simulated performances under continuous-wave operation is shown in Fig. 3.19(a).

Figure 3.17: Small-signal S-parameter measurements of the 1-bit digital-to-mmWave Power DAC prototype.

A saturated output power of 18.2dBm with a peak PAE of 15.3% is measured at 47GHz. OOK modulation using a $2^{7}-1$ PRBS was applied at modulation speeds ranging from 100Mbps to 1Gbps along with a 47GHz carrier input at the Class-E drive level. Modulation rates beyond 1Gbps could not be applied owing to the limitation of the clock generator. Fig. 3.19(b) summarizes the measured large-signal average performance metrics for different modulation speeds. At 400Mbps, an average output power of 15.7dBm was measured, and an average drain efficiency of $\approx 10\%$ is maintained. In Fig. 3.20, screenshots of the time-domain waveform and the modulated output spectrum are shown for a modulation rate of 1Gbps. The zoomed-in views of the time-domain output are shown in Fig. 3.21, from which the rise time and fall time are calculated to be $\approx 213$ps and $\approx 225$ps respectively. No bit errors were observed at 1Gbps, indicating that the 1-bit power DAC can be used beyond 1Gbps. The measured amplitude modulation depth (or extinction ratio) is 32dB.
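As a rough cross-check of the OOK figures quoted above, the following Python sketch estimates the average output power of an ideally switched OOK stream from the measured saturated power (18.2dBm) and extinction ratio (32dB), and expresses the measured edge times as fractions of the bit period. The function names, the assumption of instantaneous ON/OFF transitions and the equal probability of ones and zeros are illustrative assumptions, not part of the measurement procedure.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def mw_to_dbm(p_mw):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(p_mw)


def ideal_ook_average_dbm(p_on_dbm, extinction_db, ones_fraction=0.5):
    """Average power of an OOK stream with instantaneous transitions.

    Assumption: the OFF-state power sits extinction_db below the ON-state
    power and a fraction ones_fraction of the bits are ones.
    """
    p_on = dbm_to_mw(p_on_dbm)
    p_off = dbm_to_mw(p_on_dbm - extinction_db)
    return mw_to_dbm(ones_fraction * p_on + (1 - ones_fraction) * p_off)


def edge_fraction_of_bit(edge_time_ps, bit_rate_bps):
    """Fraction of one bit period taken up by a single rise or fall edge."""
    bit_period_ps = 1e12 / bit_rate_bps
    return edge_time_ps / bit_period_ps


if __name__ == "__main__":
    print(f"ideal OOK average: {ideal_ook_average_dbm(18.2, 32.0):.2f} dBm")
    print(f"rise edge at 1 Gbps: {edge_fraction_of_bit(213, 1e9):.1%} of a bit")
    print(f"fall edge at 1 Gbps: {edge_fraction_of_bit(225, 1e9):.1%} of a bit")
```

Under these assumptions the estimate is about 15.2dBm, roughly 0.5dB away from the 15.7dBm measured at 400Mbps, and each edge occupies a little over 20% of the bit period at 1Gbps.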

#### 3.4.3 Linearized Multi-bit mmWave Power DAC using Supply-switching and Digitally-controlled Load Modulation

#### 3.4.3.1 Architecture

Figure 3.18: Setup used for large-signal measurements of the 1-bit digital-to-mmWave Power DAC prototype.

Figure 3.19: (a) Measured and simulated large-signal continuous-wave performance of the mmWave 1-bit power DAC cell at 47GHz, and (b) measured average large-signal metrics at 47GHz with 2<sup>7</sup>-1 PRBS OOK at different speeds.

Figure 3.20: (a) Measured DAC cell time-domain output with 1Gbps 2<sup>7</sup>-1 PRBS OOK input and 47GHz carrier. (b) Measured DAC cell output spectrum with the 1Gbps OOK modulation.

Figure 3.21: Zoomed-in view of the measured DAC cell time-domain output showing (a) rise time and (b) fall time with 1Gbps 2<sup>7</sup>-1 PRBS OOK input and 47GHz carrier. Setup losses have not been de-embedded.

Figure 3.22: Schematic of the 42.5GHz digitally-controlled quarter-wave-load-modulated switching PA array. The state of the array for $n = 8, m = 5, R_l = 25\Omega$ is shown for illustration. The red digital paths are for the OFF PAs (shaded in grey) while the green digital paths are for the ON PAs.

The 1-bit mmWave power DAC is used in a digitally-controlled quarter-wave load-modulated switching PA array architecture (Fig. 3.22). In this architecture, introduced in [69], several ($n$) Class-E-like mmWave 1-bit power DACs (described previously) are combined using the lumped-element quarter-wave combiner (similar to the one described in Chapter 2). A key feature of the combiner is its load-modulation behavior as PAs are turned OFF. Assume that $n-m$ PAs are turned OFF, and $m$ PAs are kept ON. Each PA is designed to present a short-circuit output impedance to the combiner when turned OFF. The $\lambda/4$ branch transforms this short-circuit impedance to an open circuit at the combining point. Consequently, the impedance seen by each of the $m$ ON PAs is $Z_0^2/(50m)$ (=200/$m$ ohms in the implementation described here, as $Z_0 = 100\Omega$, L=353pH and C=35.4fF). Switching-class PAs are essentially voltage-source-like PAs which produce an output power that is inversely proportional to load resistance. Consequently, the output power of each PA is given by $P_{unit} \propto V_{DD}^2/(200/m) \propto m$ and the total output power is given by $P_{out} \propto m^2$. Thus, the load modulation makes the output amplitude linear with $m$. Various design considerations pertaining to this architecture are thoroughly discussed and corroborated with theoretical analyses, simulations and measurements in [14], and only the highlights of the experimental results are discussed in this chapter.
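The load-modulation argument can be checked numerically with a short Python sketch. It assumes ideal voltage-source-like unit cells, a lossless combiner and the values quoted in the text ($Z_0 = 100\Omega$, a 50$\Omega$ load, $n = 8$); the function names are illustrative.

```python
def load_seen_by_on_pa(m, z0=100.0, r_load=50.0):
    """Impedance presented to each ON PA when m PAs are ON: Z0^2 / (R_load * m)."""
    return z0 ** 2 / (r_load * m)


def relative_output_amplitude(m, n=8, z0=100.0, r_load=50.0):
    """Output amplitude with m PAs ON, normalised to all n PAs ON.

    Assumption: each ON PA behaves as an ideal voltage source, so its output
    power is proportional to 1 / (load resistance) at a fixed supply.
    """
    if m == 0:
        return 0.0

    def total_power(k):
        # k unit cells, each delivering power proportional to 1/R_seen
        return k / load_seen_by_on_pa(k, z0, r_load)

    return (total_power(m) / total_power(n)) ** 0.5


if __name__ == "__main__":
    for m in range(9):
        r_seen = load_seen_by_on_pa(m) if m else float("inf")
        print(m, r_seen, round(relative_output_amplitude(m), 3))
```

With these idealisations the normalised amplitude evaluates to $m/n$ for every code, which is the linear profile the architecture relies on.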

Three possible usage scenarios may be envisioned for this architecture, namely: (i) as a mmWave power DAC, where the input is maintained at a Class-E drive level and output amplitude is controlled digitally by turning PAs ON and OFF, (ii) as a power amplifier with the digital control purely exercised as a means of efficient static output power control (i.e. to support low-power modes with high efficiency), and finally (iii) as a power amplifier where the output modulation is constructed by means of a combination of input modulation and digital control for efficiency under back-off. Options (ii) and (iii) are enabled by the fact that the mmWave Class-E-like PAs do possess linearity and small-signal gain due to soft-switching at mmWave frequencies [3]. The third option bears some resemblance to a multi-step Doherty architecture [72], but is distinct in the nature of the output combiner and the load-modulation mechanism.

#### 3.4.3.2 A Linearized 3-bit 42.5GHz Power DAC Implementation

A three-bit mmWave power DAC operating at 42.5GHz is implemented based on the architecture proposed in Section 3.4.3.1 (Fig. 3.22). Eight 1-bit mmWave power DAC unit cells (described previously) are power combined using a lumped quarter-wave combiner (described in Section 2.8.2). Since the power DAC unit cells have an optimal load impedance of 25$\Omega$, the eight-way combiner used in the power-combined array is designed to transform a 50$\Omega$ load at the output pad to a 25$\Omega$ load at each of its inputs. This corresponds to $Z_0 = 100\ \Omega$, requiring L=353 pH and $C_p=35.4$ fF for the lumped-element model at 45 GHz. The combiner has a peak efficiency of 65% and the spiral used has $Q_L = 12$ and $Q_C = 50$ from EM simulations. The lower peak efficiency is due to the higher transformation ratio and the lower $Q_L$ of the spirals. Eight digital control lines ($b_1 - b_8$) determine the ON/OFF state of the unit cells and thereby determine the PA array's output modulation. The lengths of these digital control lines are equalized to minimize skew in the digital control word input to the PA array during modulation. Eight-way power-combining of the 1-bit DAC cells in the linearizing architecture is expected to yield an effective three-bit amplitude resolution along with an output power of 25dBm, accounting for the combiner's loss.
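The combiner values quoted above can be reproduced with a small Python sketch. It uses the standard lumped equivalent of a quarter-wave line ($L = Z_0/\omega_0$, $C = 1/(Z_0\omega_0)$), which the text does not state explicitly but which matches the quoted component values at 45GHz. The power budget reuses the 18.2dBm unit-cell saturated power measured at 47GHz as an assumed per-cell output, so it is only a first-order estimate; the function names are illustrative.

```python
import math


def branch_impedance(r_pa, r_load, n):
    """Z0 of each quarter-wave branch of an n-way combiner.

    Each branch sees n * r_load at the combining point and must present
    r_pa to its PA, so Z0 = sqrt(r_pa * n * r_load).
    """
    return math.sqrt(r_pa * n * r_load)


def lumped_quarter_wave(z0, f0_hz):
    """Series L and shunt C of the lumped equivalent of a quarter-wave line."""
    w0 = 2 * math.pi * f0_hz
    return z0 / w0, 1 / (z0 * w0)


def combined_power_dbm(p_unit_dbm, n, combiner_efficiency):
    """Output power of n ideally combined unit cells after combiner loss."""
    return p_unit_dbm + 10 * math.log10(n) + 10 * math.log10(combiner_efficiency)


if __name__ == "__main__":
    z0 = branch_impedance(25.0, 50.0, 8)
    ind, cap = lumped_quarter_wave(z0, 45e9)
    print(f"Z0 = {z0:.0f} ohm, L = {ind * 1e12:.1f} pH, C = {cap * 1e15:.2f} fF")
    print(f"estimated output power: {combined_power_dbm(18.2, 8, 0.65):.1f} dBm")
```

The estimate of roughly 25.4dBm is in line with the expected output power of 25dBm.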

#### 3.4.3.3 Experimental Results

Figure 3.23: Chip photograph of the Q-band three-bit digital to mmWave PA array prototype.

The fabricated three-bit digital to mmWave PA array prototype is shown in Fig. 3.23 and has an active area of 3.2 mm×1.3 mm. Small-signal S-parameter measurements shown in Fig. 3.24 indicate that input and output match are maintained across the digital control settings. The large-signal $P_{out}$ vs. $P_{in}$ profile of the PA array measured at 42.5 GHz can be seen in Fig. 3.25(a) across $m$. A $P_{sat}$ of 23.4 dBm is achieved when all PAs are ON. Fig. 3.26 shows drain efficiency and PAE as a function of output power at 42.5 GHz across digital control settings. The optimal drain efficiency and PAE contours depict how the digital control should be exercised in conjunction with input modulation for maximum average efficiency for usage scenario (iii). Our measurements show a 2.25× improvement in drain efficiency and a 1.75× improvement in PAE at 6dB back-off over the baseline case where all PAs are always kept ON. The peak PAE (6.7%) and output power (23.4 dBm) are lower than simulated (14% and 24.5 dBm) due to lower PAE in the unit cell in measurement vis-a-vis simulation and frequency mismatch between the PAs and the combiner. The PAE and gain under back-off (lower values of $m$) are also expected to be higher in an SoC transmitter implementation, either through elimination of the input 50 $\Omega$ terminations presented by OFF PAs (which degrade PAE and gain under back-off) through co-design with the preceding driver stage or through the addition of another driver stage within each supply-switched unit cell.

Figure 3.24: Measured small-signal S-parameters vs. digital control setting of the three-bit digital to mmWave PA array prototype.

Usage scenario (i) is a subset of scenario (iii) where the input power is kept constant at the peak value and the PA behaves as a 3-bit quantizer. The efficiency vs. $P_{out}$ curve would therefore be a set of discrete points for a constant $P_{in}$ across $m$ settings in Fig. 3.26. The measured saturated output voltage (Fig. 3.25(b)) displays the expected linear profile with $m$, demonstrating its utility as a three-bit mmWave power DAC. The DNL and INL of the prototype are shown in Fig. 3.27. The DNL never exceeds 0.5 LSB and the INL is always within 1 LSB. The simulated phase-shift as a function of digital control word is also shown in Fig. 3.25(b) and exhibits very small AM-PM nonlinearity. Since a phase modulator was not implemented in this work, we resort to simulations to determine the required phase modulator resolution to meet spectral mask requirements in a complete mmWave transmitter implementation. Assuming that EIRP requirements in the 45GHz band for SATCOM applications are similar to those in the 60GHz regime, the IEEE 802.11ad spectral mask can be used to illustrate the impact of phase modulator resolution on the output spectrum of the linearized DAC as shown in Fig. 3.28. It can be seen that a phase resolution of 4 bits or higher in conjunction with this 3-bit power DAC can satisfy spectral emission requirements.
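For reference, the DNL and INL quoted in this section can be computed from a list of output amplitudes (one per digital code) as sketched below in Python. The end-point definition of the LSB is one common convention and is an assumption here, since the text does not state which definition was used; the example amplitudes are hypothetical placeholders, not the measured data of Fig. 3.25(b).

```python
def dnl_inl(levels):
    """End-point DNL and INL (in LSB) of a DAC transfer characteristic.

    levels[k] is the output amplitude for code k. The LSB is taken as the
    average step between the first and last codes (end-point fit).
    """
    n_steps = len(levels) - 1
    lsb = (levels[-1] - levels[0]) / n_steps
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1 for k in range(n_steps)]
    inl = [(levels[k] - levels[0]) / lsb - k for k in range(len(levels))]
    return dnl, inl


if __name__ == "__main__":
    # Hypothetical normalised amplitudes for m = 0..8 PAs ON (illustration only).
    example = [0.0, 1.0, 2.1, 2.9, 4.0, 5.0, 6.2, 7.0, 8.0]
    dnl, inl = dnl_inl(example)
    print("max |DNL| =", round(max(abs(x) for x in dnl), 3), "LSB")
    print("max |INL| =", round(max(abs(x) for x in inl), 3), "LSB")
```

A perfectly linear characteristic returns zero for every entry, and the limits reported for the prototype correspond to `max |DNL| <= 0.5` and `max |INL| <= 1` in these units.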

#### 3.4.4 Comparison with State of the Art

Table 3.1 compares this work with state-of-the-art CMOS mmWave PAs, some of which employ efficiency-enhancing architectures. This PA achieves one of the lowest degradations in PAE under 6 dB back-off while having the highest saturated output power among PAs using such architectures.
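The ITRS figure of merit used in Table 3.1 is defined in the table footnote as $P_{sat}$(dBm) + Gain(dB) + 20 log<sub>10</sub>(Freq.(GHz)) + 10 log<sub>10</sub>(PAE). A minimal Python sketch of that definition follows; entering PAE as a fraction rather than a percentage is an assumption, chosen because it reproduces the tabulated values for the referenced designs.

```python
import math


def itrs_fom(psat_dbm, gain_db, freq_ghz, pae_percent):
    """ITRS PA figure of merit as defined in the footnote of Table 3.1.

    Assumption: PAE enters the logarithm as a fraction (pae_percent / 100).
    """
    return (psat_dbm + gain_db
            + 20 * math.log10(freq_ghz)
            + 10 * math.log10(pae_percent / 100))


if __name__ == "__main__":
    # Rows taken from Table 3.1: (Psat, Gain, Freq, PAE_max)
    rows = {
        "[85]": (20.9, 18.1, 80, 22.3),
        "[86]": (22.6, 29, 60, 7),
        "[87]": (21.8, 15, 46, 18.5),
    }
    for ref, args in rows.items():
        print(ref, round(itrs_fom(*args), 2))
```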



Figure 3.25: (a) Measured output power versus input power and (b) measured saturated output voltage and simulated phase shift for different digital control settings (i.e., different number of PAs on) at 42.5 GHz for the three-bit digital to mmWave PA array prototype.



Figure 3.26: (a) Drain Efficiency vs.  $P_{out}$  and (b) PAE vs.  $P_{out}$  for different digital control settings (i.e., different ms) at 42.5 GHz for the three-bit digital to mmWave PA array prototype. \*Curves are slightly offset for clarity.

### 3.5 Supply-modulated mmWave Class-E-like Power DACs

The mmWave power DAC proposed in [14] demonstrated a pathway to realize mmWave power DACs with high output power that can be turned ON and OFF at high speeds for supporting complex modulations with high efficiency under back-off. However, the power DAC achieves only 3 bits of amplitude resolution with 8-way power-combining. Despite the higher back-off efficiency, increasing the amplitude resolution in [14] beyond 4 bits (using 16-way power-combining) is challenging. Consequently, alternative means of modulation are desirable. This section discusses techniques to realize high-resolution mmWave power DACs that can support complex modulations while still retaining high output power as well as high peak and average efficiencies.

Figure 3.27: Measured DNL and INL of the three-bit digital to mmWave PA array prototype at 42.5 GHz.

Figure 3.28: Phase modulator resolution required to satisfy IEEE 802.11ad spectral mask when the proposed mmWave 3-bit power DAC prototype is incorporated into a transmitter.

| Ref. | Technology | Freq. (GHz) | $V_{DD}$ (V) | $P_{out,sat}$ (dBm) | $\eta$ (%) | $\frac{\eta_{-6dB}}{\eta_{max}}$ (%) | $PAE_{max}$ (%) | $\frac{PAE_{-6dB}}{PAE_{max}}$ (%) | $Gain_{max}$ (dB) | $\Delta f/f_0$ (%) | ITRS FoM<sup>4</sup> | Area (mm$^2$) | Architecture | Fully Integrated? | Integration Complexity |
|------|------------|-------------|--------------|---------------------|------------|------|------|------|------|------|------|------|------|------|------|
| This work | 45nm SOI CMOS | 42.5 | 2.6 | 23.4 | 8.2 | 70.7 | 6.7 | 67.7 | 15 | 32 | 58.8 | 4.16 | Proposed 3-bit mmWave Power DAC | Yes | Direct digital to mmWave |
| [85] | 40nm CMOS | 80 | 0.9 | 20.9 | N/R | N/R | 22.3 | 32.7 | 18.1 | 19.5 | 70.5 | 0.19 | 4-way diff. power combined | Yes | PA only |
| [4] | 45nm SOI CMOS | 41 | 5 | 21.6 | N/R | N/R | 25.1 | 60 | 8.9 | 22 | 56.8 | 0.3 | 4-stacked PA | Yes | PA only |
| [86] | 40nm CMOS | 60 | 1.2 | 22.6 | N/R | N/R | 7 | 35.7 | 29 | 11.6 | 75.6 | 2.16 | 8-way diff. power combined | Yes | PA only |
| [56] | 45nm SOI CMOS | 45 | 5.5 | 28 | N/R | N/R | 14 | 35.7 | N/R | N/R | N/A | 11.25 | 4-way diff. spatial power comb. | No<sup>7</sup> | PA only |
| [57] | 40nm CMOS | 60 | 1 | 15.6 | 23<sup>6</sup> | 28.5<sup>6</sup> | N/A | N/A | N/A | 11.6 | N/A | 0.33 | Out-phasing | Yes | modulator + PA |
| [73] | 45nm SOI CMOS | 42 | 2.5 | 18 | 33 | 72.7 | 23 | 73.9 | 7 | N/R<sup>1</sup> | 51.1 | 0.64 | Doherty 2-stacked PAs | Yes | PA only |
| [80] | 45nm SOI CMOS | 45 | 4 | 21.3 | 24 | 52.1 | 16 | 56.2 | 7.4 | 13.3 | 53.8 | 1.15 | 4-stacked 8 bit I/Q power-DAC | Yes | modulator + PA |
| [5] | 45nm SOI CMOS | 45 | 5.1 | 24.3 | 21.3 | N/R | 14.6 | N/R | >18 | 15 | 67 | 7.67 | 4-stacked 2 bit power-DAC | No<sup>3,5</sup> | Direct digital to mmWave |
| [87] | 0.13$\mu$m SiGe BiCMOS | 46 | 5 | 21.8 | 21 | 60 | 18.5 | 59.5 | 15 | N/R | 62.73 | 1.6 | 2-stacked 1 bit Power DAC | Yes | Direct digital to mmWave |
| [88] | 0.13$\mu$m SiGe BiCMOS | 46 | 5 | 28.9 | 20 | 60 | 18.5 | 59.5 | 15 | N/R | 69.83 | 13.65 | 3-bit mmWave Power DAC | Yes | Direct digital to mmWave |

Table 3.1: Comparison with State-of-the-Art mmWave PAs with $P_{sat} > 20$dBm or employing efficiency enhancing architectures

<sup>1</sup>Large-signal performance across frequency is not reported. <sup>2</sup>Measurement below 33GHz is limited by equipment. <sup>3</sup>Assumes 3dB external differential-to-single-ended converter. <sup>4</sup>Defined as $P_{sat}(dBm)$ + Gain(dB) + 20 log<sub>10</sub>(Freq.(GHz)) + 10 log<sub>10</sub>(PAE). <sup>5</sup>Does not have an on-chip choke inductor (biased using external bias-Ts). <sup>6</sup>Entire system metrics (for a fair comparison) inferred from supplied graph assuming power consumption of I/Q mixer remains constant. <sup>7</sup>Uses an on-PCB input matching network.

Figure 3.29: Simulated (a) output amplitude and supply impedance (defined as $\frac{V_{DD}}{I_{dc}}$, where $I_{dc}$ is the supply current drawn under large signal operation) and (b) drain efficiency and PAE of the 2-stacked Class-E-like PA in IBM 45nm SOI CMOS (Fig. 2.13(a)) as the supply voltage is varied.

Switching PAs (and switches in general) are inherently linear with respect to excitations from the drain side [40]. Thus, multi-level supply modulation for switching PAs can facilitate a linear transfer characteristic from the digital word to the output amplitude. However, high efficiency under back-off is also a key requirement when contemplating supply modulation for switching PAs. We conduct a simulation-based experiment to investigate the back-off characteristics of the 2-stacked Class-E-like PA in IBM 45nm SOI CMOS described in Chapter 2 under ideal supply modulation. Fig. 3.29 shows the (post-layout) large signal characteristics as the supply voltage for the PA is varied in simulation. The output voltage follows the linear profile with $V_{DD}$ as expected (Fig. 3.29(a)) while, from Fig. 3.29(b), $\frac{\eta_{-6dB}}{\eta_{max}}$=84%, which is significantly higher than for Class-B back-off ($\frac{\eta_{-6dB}}{\eta_{max}}$=50%). Of course, for a practical implementation, the efficiency of the supply modulator has to be factored into the overall efficiency profile of the PA. Interestingly, the supply impedance (shown in Fig. 3.29(a) and defined as $\frac{V_{DD}}{I_{dc}}$, where $I_{dc}$ is the supply current drawn under large signal operation) varies as the supply voltage is changed. This contradicts the characteristics of Class-E PAs, since from Eqn. 2.25 in Section 2.5.3, the supply impedance of Class-E-like PAs should remain constant with $V_{DD}$. This suggests that the stacked configuration deviates from desired stacked Class-E operation as the supply voltage is varied (while keeping DC biases of gates fixed), potentially resulting in higher efficiency degradation under back-off (with respect to 84% as shown in Fig. 3.29(b)) and unequal voltage stress across stacked devices (which can jeopardize long term reliable operation).

Figure 3.30: (a) Schematic of the 4-stacked Class-E-like PA in IBM 45nm SOI CMOS discussed in Chapter 2. Simulated (b) drain efficiency and PAE and (c) drain-gate stress of $M_3$ and $M_4$ as the supply voltage is varied while keeping gate biases fixed at $V_{g1}=0.45$V, $V_{g2}=1.65$V, $V_{g3}=2.85$V, $V_{g4}=4.05$V.

Figure 3.31: (a) Schematic of the 4-stacked Class-E-like PA in IBM 45nm SOI CMOS discussed in Chapter 2 with resistive divider DC biasing. Simulated (b) output amplitude and supply impedance, (c) drain efficiency and PAE as the supply voltage is varied. Nominally, $V_{g1}=0.45$V, $V_{g2}=1.65$V, $V_{g3}=2.85$V, $V_{g4}=4.05$V for $V_{DD}=4.8$V. (d) Comparison of desired (equal device stress) and actual drain-source voltages of $M_1$ and $M_4$ for $V_{DD}=1.2$V.

These issues can be better appreciated by repeating the exercise for the 4-stacked Class-E-like PA reported in Chapter 2 by varying the supply voltage while the DC biases of the gates are kept fixed at their nominal values. Simulations indicate that $\frac{\eta_{-6dB}}{\eta_{max}}$=59% (Fig. 3.30(b)), which is close to Class-B. More importantly, as shown in Fig. 3.30(c), as the supply voltage is reduced without altering the DC biases of the gates, the device junction stress increases beyond the breakdown voltage, which can result in catastrophic failure. One might consider using a resistive-divider-based DC biasing scheme for the devices in a stacked configuration, to scale the DC biases of the gates along with the supply, as in [2] and shown in Fig. 3.31(a). The impact on relevant metrics under ideal supply modulation is shown in Fig. 3.31(b) and (c), which indicate that despite significant improvement in back-off efficiency ($\frac{\eta_{-6dB}}{\eta_{max}}$=80%), the variation in supply impedance persists, especially for low $V_{DD}$ values. Additionally, the output amplitude exhibits non-linearity with supply modulation. For the case $V_{DD}=1.2$V, Fig. 3.31(d) compares the simulated drain-source voltages of devices $M_1$ and $M_4$ using resistive divider bias with the (desired) waveforms that would be obtained by scaling down the $V_{ds}$ values at the full supply voltage. Evidently, the top device hardly sustains any voltage swing when operating under reduced supply voltage. In the next section, we analyze the cause of this problem and propose a novel circuit solution to address this issue.
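To put the simulated back-off figures in context, the Python sketch below computes the Class-B reference (efficiency proportional to output amplitude, i.e. about 50% of peak at 6dB back-off) and compares it with the three $\frac{\eta_{-6dB}}{\eta_{max}}$ values quoted in this section. The ratios are copied from the text; the function names and dictionary labels are illustrative.

```python
def class_b_relative_efficiency(backoff_db):
    """Class-B efficiency at a given output back-off, relative to its peak.

    Class-B efficiency scales with output amplitude, i.e. 10^(-backoff/20).
    """
    return 10 ** (-backoff_db / 20)


def improvement_over_class_b(relative_efficiency, backoff_db=6.0):
    """How many times better than Class-B a given back-off ratio is."""
    return relative_efficiency / class_b_relative_efficiency(backoff_db)


# eta(-6dB)/eta(max) values quoted in the text for ideal supply modulation.
SIMULATED_RATIOS = {
    "2-stacked, fixed gate biases": 0.84,
    "4-stacked, fixed gate biases": 0.59,
    "4-stacked, resistive-divider biases": 0.80,
}

if __name__ == "__main__":
    print(f"Class-B at 6 dB back-off: {class_b_relative_efficiency(6.0):.3f}")
    for name, ratio in SIMULATED_RATIOS.items():
        print(f"{name}: {improvement_over_class_b(ratio):.2f}x Class-B")
```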

#### 3.5.1 Voltage Scaling with Supply for Stacked Class-E PAs



Figure 3.32: Voltage swings at various nodes of stacked devices in Class-E PAs as a function of the supply voltage.

It was discussed in Chapter 2 (Section 2.5) that appropriate gate swing for all stacked devices is induced through capacitive coupling by means of the device capacitances $C_{gs}$, $C_{gd}$ as well as the gate capacitor $C_n$. The DC bias that is applied to each gate node (through a large resistor) must be equal to the average value of the corresponding desired gate voltage swing. The appropriate DC bias for each gate node can be derived using a simple analysis. Referring to Fig. 3.32, for an ideal stacked configuration with $n$ devices we require

$$V_{d,i-1}(t) = \frac{i-1}{n} \times V_{d,n}(t) \tag{3.1}$$

Therefore,

$$V_{d,i-1,avg.} = \frac{1}{T} \int_{T/2}^{T} V_{d,i-1,OFF}(t) \ dt = (i-1) \times V_{DD} \tag{3.2}$$

where $V_{DD}$ denotes the supply voltage per stacked device, so that the average drain voltage of the top device is $n \times V_{DD}$. In order to turn OFF a device completely, $V_{gs,i}=0$ for $t \in [T/2,T]$. Thus,

$$V_{g,i,OFF}(t) = V_{d,i-1,OFF}(t) \tag{3.3}$$

$$\implies V_{g,i,avg.} = \frac{1}{T} \int_{0}^{T/2} V_{g,i,ON}(t) \ dt + \frac{1}{T} \int_{T/2}^{T} V_{d,i-1,OFF}(t) \ dt \tag{3.4}$$

$$\implies V_{g,i,avg.} = \frac{V_{ON}}{2} + (i-1) \times V_{DD} \tag{3.5}$$

where $V_{ON}$ is the input drive amplitude chosen for optimal PAE. Since the DC bias for the gate of the input device is $V_{g,1,avg.} = V_{g,1,bias} = V_{ON}/2$, Eqn. 3.5 reduces to

$$V_{g,i,avg.} = V_{g,1,bias} + (i-1) \times V_{DD} \tag{3.6}$$

In other words, the DC bias of the gate nodes of stacked devices must vary with $V_{DD}$ as governed by Eqn. 3.6 in order to retain the benefits of stacking (i.e. equal voltage sharing) and Class-E switching characteristics (i.e. constant supply impedance vs. $V_{DD}$) under supply modulation. This calls for a *supply-adaptive* biasing circuit that can adjust the DC biases of the gate nodes in accordance with Eqn. 3.6 as $V_{DD}$ is varied. Fig. 3.33 depicts a circuit implementation that performs this function for a 4-stacked PA. The (variable) supply voltage $V_{DD,PA}$ is used by a resistive divider network ($R_1$) to generate linearly scaled versions ($V_1$ to $V_3$), which are then applied to a set of PMOS level-shifters. The programmable current source sets the $V_{gs}$ of the bottom NMOS device to $V_{g,1}$ while the PMOS level-shifters are sized to have the same $V_{gs}$. The value of $V_{g,1}$ is applied as the DC bias of the input (bottom) device of the PA while the voltages $V_{g,2-4}$ are applied to the stacked devices in the PA. Since the gate biases of the PMOS level-shifters are linearly scaled versions of $V_{DD,PA}$ and they have $V_{gs}=V_{g,1}$ by design, the corresponding source voltages are $V_{s,i} = V_{g,i} = V_{g,1} + (i-1) \times V_{DD}$ (with $V_{DD} = V_{DD,PA}/n$), which is the desired voltage profile from Eqn. 3.6. The relevant performance metrics comparing the proposed supply-adaptive biasing with resistive divider biasing are summarized in Fig. 3.34, which indicates that the supply-adaptive biasing scheme provides higher back-off efficiency while retaining a linear output profile and near-constant supply impedance (except at very low $V_{DD}$) with supply modulation.
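The bias profile of Eqn. 3.6 can be illustrated with a short Python sketch. It reads $V_{DD}$ in Eqn. 3.6 as the per-device share of the PA supply ($V_{DD,PA}/n$), which is the reading that reproduces the nominal gate biases quoted for the 4-stacked PA (0.45V, 1.65V, 2.85V and 4.05V at a 4.8V supply). The resistive-divider model, in which every gate bias (including the bottom one) scales in proportion to the supply, is an assumption made purely for comparison, since the divider ratios are not given in the text.

```python
def supply_adaptive_biases(vdd_pa, n=4, vg1=0.45):
    """Gate biases from Eqn. 3.6: V_g,i = V_g,1 + (i-1) * V_DD,PA / n."""
    vdd_per_device = vdd_pa / n
    return [vg1 + i * vdd_per_device for i in range(n)]


def resistive_divider_biases(vdd_pa, nominal=(0.45, 1.65, 2.85, 4.05),
                             vdd_nominal=4.8):
    """Assumed divider model: every gate bias scales in proportion to the supply."""
    return [vg * vdd_pa / vdd_nominal for vg in nominal]


if __name__ == "__main__":
    for vdd_pa in (4.8, 2.4, 1.2):
        adaptive = [round(v, 3) for v in supply_adaptive_biases(vdd_pa)]
        divider = [round(v, 3) for v in resistive_divider_biases(vdd_pa)]
        print(f"V_DD,PA = {vdd_pa} V  adaptive: {adaptive}  divider: {divider}")
```

In this simplified model the two schemes coincide at the nominal supply and diverge as the supply is reduced, because proportional scaling also shrinks the constant $V_{g,1}$ term that Eqn. 3.6 keeps fixed.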


Figure 3.33: Supply-adaptive biasing circuit for providing DC bias of gate nodes in stacked Class-E PAs.

#### 3.5.2 Supply-modulators for mmWave Power DACs

As was mentioned in the previous section, though supply modulation for stacked switching PAs can potentially facilitate high resolution power DACs with high back-off efficiency, the efficiency of the supply modulator plays a critical role in determining the overall efficiency under back-off. Typically, supply modulators can be categorized as:

- Linear modulators such as Low Dropout Regulators (LDOs) and Pulse-width Modulators [89]. The former suffer from poor efficiency while the resolution of the latter is limited by the minimum pulse-width that can be generated reliably in advanced technology nodes.
- Switching modulators based on switched-inductor and switched-capacitor (SC) dc-dc converters. Switched-inductor converters (such as buck/boost/buck-boost converters) are capable of high efficiency operation but usually require off-chip or bondwire inductors [90], making them less suitable for a fully integrated implementation. In contrast, high efficiency switched-capacitor converters using on-chip capacitors have been demonstrated [91], [92], [93].
- Hybrid modulators, which employ a combination of linear and switching modulators [94–96].

Figure 3.34: Comparison of large signal metrics for resistive divider bias and proposed supply-adaptive bias for stacked Class-E PAs.



Figure 3.35: Architecture of a switched-capacitor supply-modulated mmWave power DAC with high amplitude resolution to support complex modulations. The amplitude control word operates at a symbol rate  $f_s$ .

#### 3.5.3 Switched-capacitor Supply-modulated Hybrid mmWave Stacked Power DAC

As discussed in [93], [97], capacitors have substantially higher energy and power density than their magnetic counterparts. This, in conjunction with better switch utilization, makes the SC converter a promising candidate for realization of efficient, fully integrated dc-dc converters with high power density in CMOS platforms. Furthermore, capacitors have a smaller footprint and higher quality factor (at low frequency) than on-chip inductors. Consequently, a bank of efficient SC converters with different step-up and/or step-down ratios (i.e. gain configurations) can facilitate a large number of output levels, which is suitable for supply modulation of power DACs with high amplitude resolution. This is conceptually depicted in Fig. 3.35 where a reference voltage $V_{ref}$ is stepped up/down depending on the SC gain configuration (i.e. conversion ratio) that is activated by the amplitude control word (operating at a symbol rate $f_s$). It should be noted that a combination of step-up and step-down gain configurations should be utilized, since SC converters exhibit poor efficiency for high conversion ratios [98].

The SC conversion ratios (i.e. number of gain configurations) in the architecture of Fig. 3.35 should be chosen to maximize average efficiency when supporting complex modulations. Although the PAPR (and hence the average efficiency) is dependent on the modulation format, we proceed to choose conversion ratios assuming a PAPR of 6dB, which is typical for higher order modulations. The finite efficiency of SC converters (typically $\approx 80-90\%$ for low conversion ratios and even lower for steeper transformation) suggests using only a few gain configurations. This would inevitably result in limited amplitude resolution for the power DAC. Thus, we adopt an approach where a few bits of amplitude resolution are implemented through supply modulation, while segmented source degeneration (tail) transistors are used for additional amplitude control. Fig. 3.36(a) illustrates a 4-stacked Class-E-like mmWave PA (driven by a 2-stacked Class-E-like driver) with a nominal supply voltage of 4.8V. Since a 6dB reduction in output power through supply modulation requires operation off a 2.4V supply, the corresponding 1-bit (i.e. two-level) supply modulation is realized through a combination of a 1:2 step-up SC converter (with simulated efficiency of 80%) and an ideal supply-switch, driven by a reference voltage of 2.4V. Additional amplitude resolution is incorporated by means of a 7-bit binary-weighted tail transistor array. The use of segmented tail transistors is in contrast to the baseband and RF-DAC topologies discussed in Section 3.3. The motivation for this is as follows:

1. The output profile of a DAC should be monotonic (and ideally linear) with the digital code (as demonstrated by the 3-bit power DAC in [14] and discussed in Chapter 3). The RF-DAC approach presented in [2] results in an output profile with amplitude control word as shown in Fig. 3.3. The output profile can be considered to be monotonic only for half the amplitude word range (either from 0 to 128 or 128 to 256). Although the non-linearity in output profile can be corrected in the individual ranges using DPD, the non-monotonicity would prevail.



Figure 3.36: (a) Schematic of a 4-stacked Class-E like PA with two-level i.e. 1-bit supply modulation and 7-bit binary-weighted tail transistor modulation. Comparison of simulated (b) drain efficiency and (c) PAE under back-off (as fractions of respective peak values) of the proposed topology at 60GHz in IBM 45nm SOI CMOS with a case where back-off is exercised solely through tail transistor switching. (Note: DC biases for gates of the 4-stacked PA are generated by a supply-adaptive biasing circuit.)

2. The RF-DAC topology reduces the capacitive loading on the LO path and can provide higher gain to the LO signal owing to the presence of a cascode device. However, the LO and transconductor devices both operate in switching mode in a modulator. Thus, the increase in small signal power gain of a cascode configuration relative to a common-source device is of little significance under (large-signal) switching operation.
3. The mmWave signal flows through the devices in the data path. Given that a large segmented array of devices is used to realize the data path, layout and accurate modeling of these devices become challenging.

Using segmented tail transistors results in an output profile that is monotonic (although non-linear) with the amplitude control word. Furthermore, the tail devices do not support mmWave signals, thereby reducing the layout and modeling complexities significantly.

Fig. 3.36(b) and (c) compare the simulated back-off characteristics of the proposed topology



Figure 3.37: (a) Schematic of a 4-stacked Class-E like PA with three-level supply modulation and 7-bit binary-weighted tail transistor modulation. Comparison of simulated (b) drain efficiency and (c) PAE under back-off (as fractions of respective peak values) of the proposed topology at 60GHz in IBM 45nm SOI CMOS with a case where back-off is exercised solely through tail transistor switching. (Note: DC biases for gates of the 4-stacked PA are generated by a supply-adaptive biasing circuit.)

at 60GHz with a case where amplitude resolution is implemented solely through tail transistor switching. In both cases, reduction in output power is achieved by turning OFF tail devices. Evidently, a combination of supply and tail transistor modulation results in better back-off efficiency  $(\frac{\eta_{-6dB}}{\eta_{max}}=71\%)$  compared to only tail device switching  $(\frac{\eta_{-6dB}}{\eta_{max}}=50\%)$ . Including an additional gain stage (2:1 step-down) in the SC converter bank (Fig. 3.37(a)) was seen to not provide any improvement of back-off characteristics (Fig. 3.37(b) and (c)). This indicates that 1-bit supply modulation is sufficient to ensure high efficiency at 6-dB back-off in this case.
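The link between a 6dB back-off and a 2.4V supply can be illustrated with a short calculation. The sketch below assumes an idealized switching PA whose output power scales with the square of the supply voltage into a fixed load; the function name is hypothetical and the model ignores device and passive losses.

```python
import math

def supply_for_backoff(v_nominal: float, backoff_db: float) -> float:
    """Supply voltage for a given power back-off, assuming P_out proportional to V_DD^2."""
    return v_nominal * 10 ** (-backoff_db / 20)

# Halving the 4.8 V supply corresponds to 20*log10(2), i.e. roughly 6 dB of back-off.
v_dd = supply_for_backoff(4.8, 6.0)
print(f"{v_dd:.2f} V")
```

Under this square-law assumption, a single supply step from 4.8V to 2.4V covers the 6dB PAPR targeted in the text, which is consistent with the observation that an additional supply level brought no benefit.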

#### 3.5.4 Class-G Supply Modulated Hybrid mmWave Stacked Power DAC

It should be noted that in the context of stacking, the lower supply voltage(s) required for supply modulation of a 4-stacked DAC are readily available. This is because the 2-stacked driver stage operates from a 2.4V supply, while a nominal 1.2V supply (if needed) will be required by other components in a transmitter implementation anyway. Consequently, the SC modulators are not necessary and switching between the different supply voltages can be accomplished simply by using inverters operating between appropriate voltage levels. This is akin to a Class-G supply modulation scheme [95, 96] which utilizes multiple supply voltages that are selected depending on the output power requirement. Resorting to a Class-G approach has several other benefits. Firstly, in a switched-capacitor implementation, inverter drivers would be needed to switch between the different gain configurations. The Class-G scheme with inverter(s) for switching between supplies provides a convenient means of accomplishing supply modulation without the loss associated with SC implementation. More importantly, the simulation results presented in Figs. 3.36 and 3.37 assume that the efficiency of the SC converter remains constant as output power is backed off. This is not strictly correct since, when tail switching is used to exercise power control, the supply impedance does not remain constant under back-off. In fact, the increasing source degeneration under back-off results in deviation from Class-E switching characteristics so that the supply impedance increases with back-off and we approach a linear (current source) PA characteristic. The SC modulator efficiency can degrade under these circumstances owing to deviation from the optimal design point. Additionally, the SC converter requires a reasonably large output capacitor to reduce output ripple and minimize EVM. The use of the Class-G supply modulation approach avoids all these issues while retaining the benefits of efficiency under back-off.

Based on the foregoing discussion and the simulation results of the previous section, a convenient means of achieving 1-bit supply modulation would be to use an inverter toggling between 4.8V and 2.4V supplies (required by the 4-stacked PA and the 2-stacked driver respectively). This leads to the proposed Class-G "hybrid" 8-bit power DAC architecture shown in Fig. 3.38(a). The MSB (8<sup>th</sup> bit) is used to switch between supplies while output power back-off for a given supply voltage is achieved by means of a 7-bit binary-weighted source degeneration transistor array (henceforth referred to as the tail DAC). The (pre-layout) drain efficiency and PAE under back-off are summarized in Fig. 3.38(b) while the output amplitude and normalized AM-PM profiles are shown in Fig. 3.38(c). The output amplitude profile is monotonic with the digital word, though 1-bit supply modulation introduces additional non-linearity (aside from the compressive output characteristic inherent to power DACs [78]). The use of DPD is expected to linearize the DAC and reduce the AM-PM as well.
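The control-word mapping of the hybrid DAC can be sketched as follows. Only the bit allocation (MSB selects the supply, the seven LSBs drive the binary-weighted tail DAC) comes from the text; the MSB polarity (1 selects 4.8V) and the function name are assumptions made for illustration.

```python
def decode_amplitude_word(code: int) -> tuple[float, int]:
    """Split an 8-bit amplitude word into (supply voltage, tail-DAC code).

    Assumption: MSB = 1 selects the 4.8 V supply, MSB = 0 the 2.4 V supply.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("amplitude word must fit in 8 bits")
    supply_v = 4.8 if (code >> 7) & 1 else 2.4
    tail_code = code & 0x7F  # total binary weight of the tail segments turned ON
    return supply_v, tail_code

for word in (0x7F, 0x80, 0xFF):
    print(f"{word:#04x} ->", decode_amplitude_word(word))
```

The step from code 0x7F to 0x80 changes both the supply and the tail setting at once, which is one way to see why the supply bit introduces the additional non-linearity mentioned above and why DPD is expected to be needed.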



Figure 3.38: (a) Schematic of the proposed Class-G supply modulated "hybrid" 8-bit mmWave stacked power DAC with a 4-stacked Class-E like output stage and a 2-stacked Class-E-like driver. One bit supply modulation is achieved using an inverter toggling between 2.4V and 4.8V supplies while 7-bit binary-weighted source degeneration transistors provide additional amplitude resolution. Simulated (pre-layout) (b) drain efficiency and PAE vs. output power and (c) output amplitude and normalized AM-PM distortion vs amplitude control word at 60GHz in IBM 45nm SOI CMOS (Note: DC biases for gates of the 4-stacked PA are generated by a supply-adaptive biasing circuit.)

As shown in Fig. 3.39(a), the proposed Class-G "hybrid" mmWave power DACs can serve as an enabling technology to realize energy-efficient, high-power transmitters where a combination of Class-G supply modulation and tail transistor switching can provide high amplitude resolution while phase modulation is imposed on the mmWave signal so that complex modulations can be supported. Non-linearity in the DAC can be corrected by using DPD. The result is a direct digital-to-mmWave transmitter architecture with simultaneous large-scale power combining, linearity (Fig. 3.39(c)) and high average efficiency (compared to conventional transmitters, Fig. 3.39(b)) which makes it suitable for *scaling-friendly* DSP intensive communication.



Figure 3.39: (a) A direct digital-to-mmWave transmitter architecture suitable for *scaling-friendly* DSP intensive communication (based on the proposed high resolution Class-G "hybrid" mmWave stacked power DAC) with simultaneous large-scale power combining, (b) higher back-off efficiency compared to conventional transmitters and (c) linear output profile and no AM-PM distortion.

# 3.6 Conclusion

Design considerations for supply-switched mmWave power DAC cells were presented. The proposed DAC cell, in conjunction with large-scale power-combining, is suitable for implementing high power, high efficiency mmWave transmitters capable of supporting complex modulation schemes. A novel architecture employing large-scale power combining, dynamic load modulation, and supply-switching has also been demonstrated and results in a high-power, highly linear three-bit digital-to-mmWave PA array prototype with high efficiency under back-off. Supply modulation for stacked Class-E PAs was discussed as a means of achieving linear amplitude modulation with high average efficiency. A supply-adaptive biasing circuit was presented to ensure appropriate scaling of DC bias voltages of intermediary gate nodes in a stacked power DAC under supply modulation. Finally, a Class-G supply modulator-based "hybrid" 4-stacked mmWave Class-E-like power DAC was presented for a practical implementation of high-resolution mmWave power DACs with high output power as well as high peak and average efficiencies.

# Chapter 4

# A Millimeter-wave Digital Polar Phased Array Transmitter

The Class-G hybrid stacked mmWave power DAC proposed in Section 3.5.4 can serve as an enabling technology to realize high-power "digital" mmWave transmitters with high average efficiency while supporting complex modulations. The focus of this chapter is to present a transmitter architecture that can leverage the benefits of the power DAC to realize the first efficient, high-power digital-intensive mmWave transmitter for long haul links.

Section 4.1 presents an overview of conventional transmitter architectures along with the associated merits and drawbacks. This is followed by a brief discussion of integrated mmWave phased array transmitters in Section 4.2. A link budget analysis is presented in Section 4.3 to comprehend the various system level requirements of mmWave links utilizing phased arrays. Section 4.4 discusses the design considerations and implementation details of a 60GHz four-element phased array digital polar transmitter utilizing the proposed Class-G hybrid power DAC.

# 4.1 Overview of Transmitter Architectures

The power DAC proposed in Section 3.5.4 can be utilized in several transmitter architectures:

1. A Doherty architecture.
2. A Cartesian (I/Q) transmitter.
3. A polar topology (as in Fig. 3.39(a)).

The traditional analog Doherty transmitter suffers from poorly defined onset of the auxiliary PA(s) and requires large passive network area for the input and output delay lines (realized using quarter-wave transmission lines). The works in [99, 100] demonstrated an on-chip impedance transformation network using transformers to replace the quarter-wave lines along with the use of digital PAs. The resulting digital Doherty architecture achieves high back-off efficiency enhancement along with an ultra-compact form-factor and broadband operation. However, the use of transformer-based networks limits the scalability of the architecture owing to the non-idealities associated with interwinding and parasitic capacitances of transformers (as discussed in Section 3.1) which cannot be easily compensated in a low-loss manner at mmWave. The foremost challenge with the Cartesian architecture is the implementation of an output I/Q power-combiner with low insertion loss as well as isolation between the inputs in order to minimize the I/Q crosstalk, which are difficult to accomplish on-chip. The interaction between the in-phase and quadrature-phase paths typically requires 2-D predistortion to meet EVM and spectral mask requirements [83]. While spatial I/Q power-combining [101] can help overcome this issue, a fully integrated solution is preferred for System-on-Chip (SoC) applications. The polar architecture allows driving the PA in saturation without doubling the signal path for modulation (as in a Cartesian transmitter), resulting in higher efficiency with a smaller footprint. Some major concerns about the polar implementation are the bandwidth expansion in the amplitude and phase paths as well as delay mismatch between them resulting in spectral regrowth and EVM degradation [102].

The digital polar architecture [78] is better suited for SoC applications and hence pursued in this work. The Class-G hybrid stacked mmWave power DAC at our disposal can facilitate for the first time long-haul mmWave links without resorting to off-chip high gain antennas. Consequently, augmenting such a power DAC with a high-speed phase modulator (with appropriate resolution) can facilitate a mmWave transmitter capable of supporting complex modulations with high average efficiency (Fig. 3.28). However, to overcome the high path loss at mmWave, a directional transmitted beam is preferred. This can be accomplished by a phased array transmitter implementation using moderate-gain on-PCB antennas, with the added advantage of electronic beam steerability (as opposed to mechanical alignment required for discrete high gain antennas).

# 4.2 Integrated MmWave Phased Array Transmitters: An Overview

Phased arrays are a class of multiple antenna systems that provide beamforming and beam steering capabilities, thereby providing an electronic alternative to physically imposing antennas that require mechanical maneuvering. A phased array realizes an "effective" high gain antenna in which phase-shifted signals from multiple low-gain antennas are combined to create a directional beam. Additionally, the direction of the beam can be scanned by varying the phase shift in each antenna element. The advent of scaled silicon processes has enabled operation in the mmWave regime along with a reduction in the physical size of the system since the antenna size and spacing are inversely proportional to frequency. Consequently, even though the invention of phased arrays was motivated by military requirements and finds extensive use in defense applications, advancements in IC technology have facilitated the integration of entire arrays on chip, making millimeter-wave phased-array-based architectures feasible for mass commercial applications.



Figure 4.1: EIRP in an N-element phased array transmitter each with peak unit element output power of P dBm and antenna gain of G dBi. A phased array implementation improves the EIRP by 20logN compared to a single element with the same output power and antenna gain.

A phased-array transmitter provides two significant advantages over isotropic transmitters:

1. For the same total transmit power, the power at the receiver is increased. The effective isotropic radiated power (EIRP) of a transmitter in a particular direction is defined as the power an isotropic transmitter would have to radiate to generate the same electric field in that direction. For example, in an N-element transmitter (Fig. 4.1), if each element radiates a power  $P$  (in Watts) and has the same antenna gain  $G(\phi)$ , the EIRP in the desired direction is  $N^2 PG(\phi)$  Watts. The N-element array improves the EIRP in the desired direction by 20 log N dB as compared to a single element, thereby increasing signal power at the targeted receiver.

2. The beamforming properties of the array ensure that the transmitter power is attenuated in other directions so that the interference at receivers that are not targeted is reduced.
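The 20 log N improvement in EIRP can be evaluated directly in dB. The sketch below uses 25dBm per element and 5dBi antenna gain purely as example inputs (these values appear elsewhere in this thesis); the function name is hypothetical and ideal coherent combining is assumed.

```python
import math

def array_eirp_dbm(p_elem_dbm: float, gain_dbi: float, n_elements: int) -> float:
    """EIRP of an N-element array assuming ideal coherent free-space combining."""
    return p_elem_dbm + gain_dbi + 20 * math.log10(n_elements)

single = array_eirp_dbm(25.0, 5.0, 1)
four = array_eirp_dbm(25.0, 5.0, 4)
print(f"single element: {single:.1f} dBm, four elements: {four:.1f} dBm")
```

Of the 20 log N improvement, 10 log N comes from the N-fold increase in total radiated power and the remaining 10 log N from the array gain of coherent combining.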

The operating principle of a phased array transmitter is illustrated in Fig. 4.1 using a transmitter array with N elements spaced a distance d apart. Let us assume that the signal transmitted by the first element is  $V_1(t) = V(t)\cos(\omega_{RF}t + \alpha(t) + \theta_0)$ . The phase shifter in the  $i^{th}$  element adds a phase shift  $\theta_{i-1}$  as a result of which its output signal  $V_i(t)$  is given by

$$V_i(t) = V(t - \tau_i)\cos(\omega_{RF}(t - \tau_i) + \alpha(t - \tau_i) + \theta_{i-1}) \tag{4.1}$$

where  $\tau_i = (i-1)\frac{d\sin\phi}{c}$  is the associated path delay for a transmit angle of  $\phi$ . The output of the array after free-space combining is

$$V_{out}(t) = \sum_{i=1}^{N} V(t - \tau_i) \cos(\omega_{RF}(t - \tau_i) + \alpha(t - \tau_i) + \theta_{i-1}) \tag{4.2}$$

Assuming that the path delay is much smaller than the time period of the highest modulation frequency or that the modulation is narrowband,

$$V(t - \tau_i) = V\left(t - (i - 1)\frac{d\sin\phi}{c}\right) \approx V(t) \tag{4.3}$$

and

$$\alpha(t - \tau_i) = \alpha\left(t - (i - 1)\frac{d\sin\phi}{c}\right) \approx \alpha(t) \tag{4.4}$$

Applying these approximations to Eq. 4.2, we get

$$V_{out}(t) \approx V(t) \sum_{i=1}^{N} \cos(\omega_{RF}(t-\tau_i) + \alpha(t) + \theta_{i-1}) \tag{4.5}$$

$$= \mathrm{Re}\left(V(t)e^{j(\omega_{RF}t + \alpha(t))}\sum_{i=1}^{N}e^{-j(\omega_{RF}\tau_i - \theta_{i-1})}\right) \tag{4.6}$$

For a linear progression in phase shift across elements,  $\theta_{i-1} = i\theta_0$ . Eq. 4.6 can therefore be re-written as

$$V_{out}(t) = \mathrm{Re}\left(V(t)e^{j(\omega_{RF}t + \alpha(t))}\sum_{i=1}^{N} e^{-j\left(\omega_{RF}(i-1)\frac{d\sin\phi}{c} - i\theta_0\right)}\right) \tag{4.7}$$

$$= \mathrm{Re}\left(V(t)e^{j\left(\omega_{RF}t + \alpha(t) + \omega_{RF}\frac{d\sin\phi}{c}\right)}\sum_{i=1}^{N}e^{-ji\left(\omega_{RF}\frac{d\sin\phi}{c} - \theta_{0}\right)}\right) \tag{4.8}$$

$$= \mathrm{Re}\left(V(t)e^{j\left(\omega_{RF}t + \alpha(t) + \omega_{RF}\frac{d\sin\phi}{c}\right)}e^{-j\frac{(N+1)(\psi-\theta_0)}{2}}\frac{\sin\frac{N(\psi-\theta_0)}{2}}{\sin\frac{\psi-\theta_0}{2}}\right) \tag{4.9}$$

where  $\psi = \omega_{RF} \frac{d\sin\phi}{c}$ . It can be seen from Eq. 4.9 that

$$|V_{out}(t)| = \left| V(t) \frac{\sin \frac{N(\psi - \theta_0)}{2}}{\sin \frac{\psi - \theta_0}{2}} \right| \tag{4.10}$$

Hence, signals transmitted in a particular direction add constructively, while signals in other directions add destructively thereby resulting in beam formation. For a transmit angle  $\phi$  the maximum amplitude of  $V_{out}(t)$  is achieved when

$$\psi = \theta_0 = \omega_{RF} \frac{d\sin\phi}{c} \tag{4.11}$$

A phased-array transmitter implementation helps overcome the high path loss at mmWave and is therefore pursued in this work. The number of elements in the array depends on a combination of circuit parameters as well as propagation losses and is best understood by means of a link budget analysis.
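The array-factor magnitude of Eq. 4.10 can be checked numerically against the direct sum over elements. The sketch below is illustrative only: it normalizes $V(t)$ to unity, assumes half-wavelength element spacing (so that $\psi = \pi\sin\phi$), and uses hypothetical function names.

```python
import cmath
import math

def array_factor_sum(n: int, psi: float, theta0: float) -> float:
    """Magnitude of the direct sum over i = 1..N of exp(-j*i*(psi - theta0))."""
    x = psi - theta0
    return abs(sum(cmath.exp(-1j * i * x) for i in range(1, n + 1)))

def array_factor_closed(n: int, psi: float, theta0: float) -> float:
    """Closed form |sin(N*x/2) / sin(x/2)| with x = psi - theta0 (Eq. 4.10)."""
    x = psi - theta0
    if abs(math.sin(x / 2)) < 1e-12:
        return float(n)  # limiting value at the beam peak
    return abs(math.sin(n * x / 2) / math.sin(x / 2))

N = 4
D_OVER_LAMBDA = 0.5  # assumed element spacing in wavelengths
STEER_DEG = 30.0     # example steering angle
theta0 = 2 * math.pi * D_OVER_LAMBDA * math.sin(math.radians(STEER_DEG))
for phi_deg in (-60, 0, 30, 60):
    psi = 2 * math.pi * D_OVER_LAMBDA * math.sin(math.radians(phi_deg))
    print(phi_deg,
          round(array_factor_sum(N, psi, theta0), 3),
          round(array_factor_closed(N, psi, theta0), 3))
```

The two columns agree at every angle, and the magnitude reaches N when the transmit angle equals the steering angle, which is the condition stated in Eq. 4.11.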

### 4.3 Millimeter-wave Link Budget Analysis

It is instructive to perform a link budget analysis to comprehend the system-level requirements of mmWave links using integrated phased-array transmitters. As shown in Fig. 4.2, let us consider a phased-array transceiver with N elements in the transmitter as well as the receiver. We further assume typical on-PCB antenna gain of 5dBi, 64QAM modulation with a bandwidth of 1GHz and an SNR of 20dB at the receiver as well as 10dB link margin. Since such a link with N-element arrays at both ends improves the SNR by N<sup>3</sup> compared to a single-element implementation (N<sup>2</sup> from the transmitter EIRP and N from the receiver array), one can derive the number of elements in the array to satisfy the link requirements at various frequencies for a given output power per transmit element, the receiver noise figure (NF), the desired link range along with the associated path loss (which is a combination of Friis propagation loss and



Figure 4.2: Phased array transceiver with N elements in the transmitter as well as receiver arrays for link budget analysis.

atmospheric absorption). The stacked SOI CMOS mmWave PAs discussed earlier in this thesis and those reported in [4,5] facilitate unprecedented improvement in output power thereby relaxing the number of elements, or extending the link range for a given array size. Thus, we aim to maximize the link range while using a moderate array size. Based on the saturated output power of state-of-the-art mmWave PAs reported in Table 2.4, an output power trend for stacked SOI CMOS PAs across frequency can be estimated as shown in Fig. 4.3(a). A literature survey of contemporary state-of-the-art mmWave receivers reveals a trend for NF (Fig. 4.3(b)) while the path loss across frequency for a 100m link is depicted in Fig. 4.3(c). The aforementioned considerations lead us to array sizes across frequency as illustrated in Fig. 4.3(d) for transmitting 64QAM modulation with 3.8dB PAPR. Since the 60GHz stacked power DAC discussed in Chapter 3 generates  $\approx$ 25dBm output power, the number of array elements is expected to reduce further. In fact, factoring in the higher output power at 60GHz in the foregoing analysis yields an array size of ten. To reduce design and integration complexity further, a four-element phased-array digital polar transmitter at



Figure 4.3: (a) Trend for output power vs. frequency of state-of-the-art stacked SOI CMOS mmWave PAs determined from [3–5], (b) trend for Noise Figure (NF) of state-of-the-art mmWave receivers across frequency, (c) path loss (sum of Friis propagation loss and atmospheric absorption) for a 100m link across frequency and (d) array size for a phased array transmitter with output power trend shown in (a) for a 100m link. Typical on-PCB antenna gain of 5dBi, 64QAM modulation with a bandwidth of 1GHz and an SNR of 20dB at the receiver as well as 10dB link margin are assumed for the calculation.

60GHz is pursued in this work to achieve high data-rates ( $\approx$ 6Gbps with 64QAM modulation) while maximizing the link range.
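The structure of the link budget calculation can be sketched as below. The antenna gain, bandwidth, SNR, margin, PAPR and peak output power are taken from the text; the receiver noise figure (7dB) and the atmospheric absorption (15dB/km) are placeholder assumptions, since the actual trends are only given graphically in Fig. 4.3. The function names are hypothetical, and the result is sensitive to these placeholders, so it is not intended to reproduce Fig. 4.3(d).

```python
import math

C = 299_792_458.0  # speed of light in m/s

def friis_loss_db(freq_hz: float, dist_m: float) -> float:
    """Free-space (Friis) propagation loss in dB."""
    return 20 * math.log10(4 * math.pi * dist_m * freq_hz / C)

def required_elements(p_peak_dbm, papr_db, ant_gain_dbi, freq_hz, dist_m,
                      bw_hz, nf_db, snr_req_db, margin_db, absorption_db_per_km):
    """Smallest N for which the N^3 SNR improvement closes the link."""
    path_loss = friis_loss_db(freq_hz, dist_m) + absorption_db_per_km * dist_m / 1e3
    noise_dbm = -174 + 10 * math.log10(bw_hz) + nf_db  # thermal noise floor plus NF
    # SNR for one transmit element and one receive element at average power
    snr_single = (p_peak_dbm - papr_db) + 2 * ant_gain_dbi - path_loss - noise_dbm
    shortfall_db = snr_req_db + margin_db - snr_single
    return max(1, math.ceil(10 ** (shortfall_db / 30)))  # N^3 is 30*log10(N) in dB

n = required_elements(p_peak_dbm=25, papr_db=3.8, ant_gain_dbi=5, freq_hz=60e9,
                      dist_m=100, bw_hz=1e9, nf_db=7.0, snr_req_db=20,
                      margin_db=10, absorption_db_per_km=15.0)
print("elements needed (with assumed NF and absorption):", n)
```

Because the array size enters as $N^3$, each 3dB change in output power, noise figure or path loss shifts the required element count by roughly 26%, which is why higher per-element output power relaxes the array size noticeably.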



### 4.4 Digital Polar Transmitter Implementation

Figure 4.4: Impact of (a) amplitude and (b) phase resolution on EVM for non-pulse-shaped 1Gsps single-carrier (60GHz) 64QAM data in a digital polar transmitter. The various transceiver metrics are the same as those assumed in the link budget analysis in Section 4.3 except a peak transmitter output power of 25dBm.

The work in [81] presented the first mmWave (60GHz) digital polar transmitter in CMOS. The implementation was based on a low RF design approach for power DACs as in [78, 83] whereby several DAC unit cells are connected in parallel and each individual DAC cell is turned ON or OFF by a gate control signal. Such a strategy is not suitable for mmWave implementations since turning OFF a DAC cell results in loss of interstage matching between the driver stage and the output DAC stage. Furthermore, the layout interconnects between the different DAC cells need to be carefully accounted for and absorbed in the design without performance penalty. This limits the maximum number of DAC cells that can be connected in parallel. Additionally, each DAC cell needs to have a dc-decoupling capacitor at its input to isolate the gate biases of the DAC cells under turn ON/OFF. The culmination of these design choices resulted in a low output power ( $\approx$ 10dBm) and poor efficiency of the transmitter.

The focus of the remainder of this thesis is to demonstrate the first energy-efficient, high-power digital polar mmWave phased array transmitter based on high resolution hybrid stacked power DACs and capable of supporting unprecedented data-rates over long-haul links with high



Figure 4.5: Output PSD of a 60GHz digital polar transmitter vs. symbol-rate using unshaped 64QAM symbols. Amplitude and phase resolution are eight bits each. The maximum symbol-rate that satisfies the spectral mask without baseband pulse-shaping is  $\approx$ 400Msps.

average efficiency. MATLAB-based simulations are conducted to determine the amplitude and phase resolutions that satisfy EVM requirements in a digital polar implementation. The high baseband modulation speeds ( $\approx$  several Gsps) at mmWave make it difficult to employ complex pulse-shaping (such as raised cosine filtering) owing to the high oversampling rate that would be required ( $\approx$ tens of Gsps). The use of non-pulse-shaped input data would result in a violation of the spectral mask (such as the IEEE802.11ad mask at 60GHz) and can be overcome by reducing the symbol-rate. For a target EVM of -27dB, it can be seen from Figs. 4.4(a) and (b) that amplitude and phase resolutions of 6 bits each are sufficient to meet the EVM requirement. However, to account for non-idealities, we choose 8 bits resolution for amplitude as well as phase in our implementation.
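The effect of finite amplitude and phase resolution on EVM can be illustrated with a simplified model. The sketch below is not the MATLAB simulation used in this work: it considers only uniform quantization of the ideal 64QAM constellation points in polar form, ignores noise, bandwidth and DAC non-linearity, and normalizes the error to the RMS constellation power. All function names are hypothetical.

```python
import cmath
import math

def qam64_points():
    """Ideal 64QAM constellation on an odd-integer grid."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    return [complex(i, q) for i in levels for q in levels]

def quantize_amplitude(value: float, full_scale: float, bits: int) -> float:
    """Uniform quantizer over [0, full_scale] with 2**bits - 1 steps."""
    steps = 2 ** bits - 1
    return round(value / full_scale * steps) / steps * full_scale

def quantize_phase(phase_rad: float, bits: int) -> float:
    """Uniform quantizer with 2**bits steps around the full circle."""
    n_ph = 2 ** bits
    return round((phase_rad % (2 * math.pi)) / (2 * math.pi) * n_ph) * 2 * math.pi / n_ph

def polar_evm_db(amp_bits: int, phase_bits: int) -> float:
    """RMS EVM (dB) caused by polar quantization alone."""
    pts = qam64_points()
    a_max = max(abs(p) for p in pts)
    err = ref = 0.0
    for p in pts:
        amp = quantize_amplitude(abs(p), a_max, amp_bits)
        ph = quantize_phase(cmath.phase(p), phase_bits)
        err += abs(cmath.rect(amp, ph) - p) ** 2
        ref += abs(p) ** 2
    return 10 * math.log10(err / ref)

for bits in (4, 6, 8):
    print(bits, "bits:", round(polar_evm_db(bits, bits), 1), "dB")
```

In this simplified model the quantization-limited EVM improves by several dB per added bit, which illustrates why extra bits beyond the minimum leave headroom for other non-idealities.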

Fig. 4.5 depicts the PSD resulting from unshaped baseband data symbols in a 60GHz digital polar transmitter with eight bits of amplitude and phase resolution. Despite the relaxed spectral emission requirements at mmWave, the only means of satisfying the spectral mask is to reduce the



Figure 4.6: Output PSD of a 60GHz digital polar transmitter vs. symbol-rate using 64QAM symbols. The baseband symbols are shaped using a tenth order low-pass digital FIR filter with a cutoff frequency of 150MHz and an oversampling factor of eight. Amplitude and phase resolution are eight bits each. The maximum symbol-rate that satisfies the spectral mask with baseband pulse-shaping is  $\approx$ 1000Msps.

symbol-rate to  $\approx 400$ Msps thereby limiting the overall 64QAM data-rate to 2.4Gbps. The use of mild digital filtering can circumvent this limitation to some extent. For example, using a tenth order low-pass digital FIR filter with a cutoff frequency of 150MHz and an oversampling factor of eight pushes the maximum symbol-rate to  $\approx 1$ Gsps (Fig. 4.6).
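The data-rates quoted above follow from the symbol-rate and the number of bits per symbol. A minimal sketch, with a hypothetical function name and ignoring any coding or framing overhead:

```python
import math

def data_rate_bps(symbol_rate_sps: float, constellation_size: int) -> float:
    """Raw data-rate: symbol-rate times bits per symbol."""
    return symbol_rate_sps * math.log2(constellation_size)

unshaped = data_rate_bps(400e6, 64)  # mask-limited rate without pulse-shaping
shaped = data_rate_bps(1e9, 64)      # rate with mild digital FIR filtering
print(f"{unshaped / 1e9:.1f} Gbps unshaped, {shaped / 1e9:.1f} Gbps shaped")
```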

Fig. 4.7 illustrates the (simplified) architecture of a 60GHz 4-element phased array prototype fabricated in IBM 45nm SOI CMOS along with schematics of the major components: (a) the mmWave transmitter element and (b) a high-speed digital interface. In order to alleviate input power requirements at mmWave, an off-chip 30GHz signal is fed to a frequency doubler to generate an on-chip 60GHz LO. The single-ended LO is converted to quadrature differential signals (by



Figure 4.7: Simplified architecture of the stacked hybrid power DAC-based digital polar phased array mmWave transmitter proposed in this work along with schematics of the transmitter element (bottom) and the high-speed digital interface (right).

means of a quadrature hybrid) to drive a high-speed Cartesian vector modulator which imposes phase information on the mmWave signal. The (single-ended) output of the phase modulator is then amplified by an array driver and distributed to four transmitter elements. Each transmitter element converts its single-ended input to quadrature differential ones (using a quadrature hybrid) which then drive a Cartesian phase shifter that can set the phase of the corresponding transmitter element to the appropriate value for phased array operation. In order to suppress output amplitude variations as well as reduce power consumption of the phase shifter, its (differential) outputs are fed to a limiting amplifier which in turn drives the hybrid 8-bit stacked power DAC described in Section 3.5.4. A transformer at the output of the power DAC (not shown in Fig. 4.7) performs differential-to-single-ended conversion to drive a  $50\Omega$  load.

The prototype has twenty-four digital control bits: (a) eight amplitude bits (common to the DACs of the four transmitter elements) and (b) eight bits for each of the in-phase (I) and quadrature-phase (Q) paths of the phase modulator. The control bits are provided by a high-speed digital interface implemented using Current Mode Logic (CML). The digital interface accepts three serial data inputs from an Arbitrary Waveform Generator (AWG), along with a serial clock input, and deserializes each data input by a factor of eight to generate the individual control bits. The DC biases for the gates of the 4-stack DAC in each transmitter element are generated using local supply-adaptive biasing circuits in order to minimize switching delays. The amplitude control bits need to be symmetrically routed to all four transmitter elements. The phase control bits, in contrast, have to be routed only to the phase modulator. Thus, the control signals need to be appropriately buffered to minimize the skew between the amplitude and phase control paths. All transformers and spirals used in the digital polar transmitter were simulated in EMX (provided by Integrand Software), which also provides a broadband lumped element model suitable for transient simulations. The following sections describe design considerations and implementation details of the individual building blocks of the transmitter prototype.
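The function of the digital interface can be illustrated with a behavioral model of the 1:8 deserialization. This is a sketch only: the bit ordering (first received bit is the MSB) and the function names are assumptions, and the model says nothing about the CML circuit implementation or its timing.

```python
def deserialize_1_to_8(serial_bits):
    """Group a serial bit stream into 8-bit words (first bit = MSB, an assumption)."""
    if len(serial_bits) % 8:
        raise ValueError("stream length must be a multiple of 8")
    words = []
    for k in range(0, len(serial_bits), 8):
        word = 0
        for bit in serial_bits[k:k + 8]:
            word = (word << 1) | (bit & 1)
        words.append(word)
    return words

def control_words(amp_stream, i_stream, q_stream):
    """Return one (amplitude, I, Q) triple of 8-bit control words per symbol."""
    return list(zip(deserialize_1_to_8(amp_stream),
                    deserialize_1_to_8(i_stream),
                    deserialize_1_to_8(q_stream)))

print(control_words([1] * 8, [0] * 8, [0, 0, 0, 0, 0, 0, 1, 1]))
```

Each symbol therefore consumes eight serial clock cycles on each of the three inputs, so the serial rate is eight times the symbol-rate.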

#### 4.4.1 Digital Polar Transmitter Element



Figure 4.8: Schematic with detailed block-level interconnections of a transmitter element in the proposed digital polar phased array transmitter with major building blocks highlighted.

A schematic with detailed block-level interconnections of the transmitter element is shown in Fig. 4.8 while Fig. 4.9 depicts the layout of the same with the major building blocks highlighted.

#### 4.4.1.1 Hybrid 8-bit Power DAC

The hybrid 8-bit power DAC proposed in Section 3.5.4 consists of two building blocks: 1) a 4-stacked DAC implemented as a differential 4-stacked PA with 7-bit binary-weighted source degeneration transistor bank and 1-bit supply modulation and 2) a differential 2-stacked driver PA to provide sufficient input drive to the 4-stacked DAC. The devices in the 4-stacked PA were chosen for a



Figure 4.9: Layout snapshot of a transmitter element in the proposed digital polar phased array transmitter with major building blocks highlighted.

differential output power of  $\approx$ 325mW i.e. 25dBm (pre-layout) while the 2-stacked driver PA was designed for a  $P_{out,sat}$  3dB higher than the input power required for the 4-stacked DAC. This ensures sufficient design margin at the cost of reduced efficiency for the 2-stacked PA since it needs to be operated at a backed-off power level. Since the power consumption and output power of the 4-stacked DAC dominate, this translates to a negligible penalty in overall efficiency of the transmitter element and the array. Both PAs were designed based on the loss-aware design methodology for stacked Class-E PAs [14] discussed in Chapter 2. A transformer is used to transform a single-ended load at the output pad to a differential 100 $\Omega$  load for the 4-stacked PA. An interstage matching network transforms the input impedance of the 4-stacked PA to a differential 100 $\Omega$  load for the 2-stacked driver. A differential dc-feed spiral inductor in conjunction with an interstage matching network transforms the input impedance of the 4-stacked PA to the optimal load impedance for the 2-stacked driver. The simulated large signal performance metrics of the differential 4-stacked PA and the differential 2-stacked PA are shown in Fig. 4.10(a) and (b) respectively.



Figure 4.10: Simulated large signal performance metrics at 60 GHz in IBM 45nm SOI CMOS of the (a) differential 4-stacked PA, (b) differential 2-stacked driver PA and (c) limiting amplifier of a transmitter element in the proposed digital polar 4-element phased array transmitter prototype.

As discussed before, 1-bit supply modulation for the DAC is implemented by means of an inverter toggling between the nominal 4.8V and 2.4V supplies of the 4-stacked PA and the 2-stacked driver respectively. Since standard CMOS logic in IBM 45nm SOI CMOS permits a maximum supply voltage of 1.2V, level-shifters are required to convert a logic level ranging between 0 and 1.2V to one toggling between 2.4V and 4.8V. Additionally, the level-shifters must support fast switching speeds to facilitate high-speed amplitude modulation. The level-shifter used in the supply modulation consists of a cascade of a low-voltage level-shifter [103] followed by a high-voltage (HV) one [103, 104]. The output of the HV level-shifter drives a set of progressively sized-up inverters (toggling between 2.4V and 4.8V) to drive the capacitive load presented by the supply modulation inverter. The tail DAC employs a common centroid layout to ensure good matching and linearity. Furthermore, each binary-weighted segment has an inverter driver with progressive device size scaling to ensure that the retiming flip-flops driving the different control bits experience the same capacitive loading [78].

#### 4.4.1.2 Limiting Amplifier

A limiting amplifier precedes the 2-stacked driver PA in order to further alleviate input drive requirements of the transmitter chain. Additionally, the limiting amplifier is operated close to saturation (unlike the 2-stacked driver PA) so that it irons out any residual amplitude variations of the foregoing block (phase shifter). The limiting amplifier is implemented as a 2-stacked Class-E-like PA, operating off a 1.2V supply voltage. The device size and load impedance for the limiting amplifier are determined based on the input power requirements for the 2-stacked driver. A differential dc-feed spiral inductor in conjunction with an interstage matching network transforms the input impedance of the 2-stacked driver to the optimal load impedance for the limiting amplifier. The simulated large signal performance metrics of the differential limiting amplifier are summarized in Fig. 4.10(c).

#### 4.4.1.3 Phase Shifter

Each transmitter element incorporates a phase shifter to facilitate beam-steering in a 4-element phased array. The phase shifter is implemented as a vector (I/Q) interpolator based on a double-balanced Gilbert cell topology [105,106] with a 7-bit binary-weighted tail transistor bank. The digital bits control the bias currents in the I and Q paths, thereby setting the phase shift based on the ratio of the bias currents in the mixing quads. The control bits do not toggle the tail devices ON and OFF directly; rather, they connect or disconnect the gates of the switches from a bias voltage (obtained from a diode-connected NMOS device in a reference current generator). This enables control of the bias voltage and hence the output power of the phase shifter to ensure that the limiting amplifier can be driven into saturation with sufficient design margin. An 8<sup>th</sup> bit (MSB) commutates the bias current (of the I/Q path) between the two halves of the mixing quad, thereby providing phase inversion to facilitate $180^{\circ}$ phase control for I and Q independently, and hence $360^{\circ}$ phase control for the phase shifter. Fig. 4.11(a) shows a schematic of the phase shifter implemented in this work, while the phase shift error and output power variation for different phase shift settings are summarized in Fig. 4.11(b) and (c) respectively. The series capacitor and shunt dc-feed inductor (implemented using CPW transmission lines) transform the input impedance of the limiting amplifier to the optimal load impedance of the phase shifter. A maximum phase error of $\approx 5^{\circ}$ is obtained across all phase shift settings along with a maximum output power variation of $\approx 1$dB. As discussed in the previous section, the limiting amplifier following the phase shifter will iron out this power variation to provide a pure phase modulated mmWave input to the hybrid power DAC.
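The code-to-phase relationship of such a vector interpolator can be illustrated with a minimal behavioural sketch. It assumes ideal current weighting and a simple mapping in which the 7-bit magnitudes follow $|\cos\theta|$ and $|\sin\theta|$ of the target phase; this mapping, the function names and the sign-bit convention are illustrative assumptions, not the control scheme used in the prototype. The sketch captures only code quantization and none of the circuit-level effects behind the simulated error in Fig. 4.11(b).

```python
import math

MAG_BITS = 7                      # 7-bit binary-weighted tail bank per path
FULL_SCALE = 2 ** MAG_BITS - 1    # 127 unit currents

def phase_to_codes(phase_deg):
    """Return ((sign_i, mag_i), (sign_q, mag_q)) for a target phase.

    Assumed mapping: magnitudes follow |cos| and |sin| of the target
    phase; the sign bit models the 8th (MSB) phase-inversion bit.
    """
    theta = math.radians(phase_deg)
    i_ideal = FULL_SCALE * math.cos(theta)
    q_ideal = FULL_SCALE * math.sin(theta)
    code_i = (1 if i_ideal < 0 else 0, round(abs(i_ideal)))
    code_q = (1 if q_ideal < 0 else 0, round(abs(q_ideal)))
    return code_i, code_q

def codes_to_phase(code_i, code_q):
    """Phase (degrees, 0..360) synthesized by ideal I/Q current weighting."""
    i = (-1 if code_i[0] else 1) * code_i[1]
    q = (-1 if code_q[0] else 1) * code_q[1]
    return math.degrees(math.atan2(q, i)) % 360.0

def quantization_error_deg(phase_deg):
    """Signed error between synthesized and target phase, wrapped to +/-180."""
    got = codes_to_phase(*phase_to_codes(phase_deg))
    return (got - phase_deg + 180.0) % 360.0 - 180.0

if __name__ == "__main__":
    worst = max(abs(quantization_error_deg(p)) for p in range(360))
    print(f"worst-case quantization error over 1-degree steps: {worst:.3f} deg")
```

Under these assumptions the error due to quantization alone stays well below one degree, which suggests that code resolution is not the limiting factor in an interpolator of this word length.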



Figure 4.11: (a) Schematic of the vector (I/Q) interpolator used in the phase shifter of a transmitter element in the proposed digital polar 4-element phased array transmitter prototype. Simulation results at 60 GHz in IBM 45nm SOI CMOS for (b) maximum phase shift error and (c) corresponding variation in output power of the phase shifter for different phase shift settings.

#### 4.4.1.4 Quadrature Hybrid

The quadrature hybrid is essentially a 3dB coupler with two broadside coupled microstrip lines as described in [107]. High even mode impedance was achieved using a slow-wave technique by opening up slots of appropriate length and width in the ground plane. Low odd mode impedance confines the signal between the microstrips. The in-phase and quadrature-phase outputs of the hybrid are fed to a pair of baluns that provide single-ended-to-differential conversion. The baluns, along with the series capacitors, transform the input impedance of the phase shifter to a 50$\Omega$ matched load at the outputs of the hybrid.

#### 4.4.2 Simulation Results of Transmitter Element

The performance of the overall transmitter element for different phase shift settings is summarized in Fig. 4.12(a) and (b).



Figure 4.12: Simulation results at 60 GHz in IBM 45nm SOI CMOS for (a) output power, drain efficiency and PAE and (b) maximum phase shift error of the transmitter element in the proposed digital polar 4-element phased array transmitter prototype for different phase shift settings.

#### 4.4.3 Phased Array Transmitter Common Path

This section describes the components that constitute the common path in the phased array transmitter. The elements in the common path accept an off-chip mmWave signal, perform high-speed phase modulation, and distribute the resulting signal to all the transmitter elements after appropriate amplification.

#### 4.4.3.1 Frequency Doubler

In order to alleviate input power requirements at 60GHz, a frequency doubler is used at the input of the phased array transmitter which accepts a single-ended 30GHz signal from an off-chip source and performs frequency multiplication to generate an on-chip 60GHz signal for the transmitter. The frequency doubler is implemented as a differential common-source amplifier with the drain nodes connected together to extract the second harmonic of the input 30GHz signal. A transformer-based matching network is used to match the differential input impedance of the doubler to single-ended  $50\Omega$  at the input pad.

#### 4.4.3.2 High-speed Phase Modulator

The design of the phase modulator is identical to that of the vector interpolator-based phase shifter used in the transmitter elements, except that the device sizes here are larger in order to meet the input drive requirements of an array driver PA following the phase modulator. Additionally, the phase modulator tail DACs (I and Q) have progressively scaled inverter drivers driven by retiming flip-flops, similar to the tail DAC of the 4-stacked PA. A balun at the output of the phase modulator performs differential-to-single-ended conversion to drive the array driver PA.

#### 4.4.3.3 2-stacked Array Driver PA

The array driver is a 2-stacked Class-E-like PA operating off a 2.4V supply voltage. It accepts the phase modulated mmWave signal from the previous stage, amplifies it and distributes it to the four transmitter elements. The device size of the array driver PA is the same as that of the 2-stacked driver PA used in the transmitter element. The input of the transmitter element is matched to $50\Omega$ and it is convenient to use $50\Omega$ CPW transmission lines to distribute power from the output of the array driver PA to the inputs of the transmitter elements. However, this means that the array driver PA has to drive a load impedance of $50/4=12.5\Omega$. Since the power devices in the array driver are optimized for a $50\Omega$ load, an impedance transformation network is used to perform the required impedance transformation. The input power to the array driver is then backed off till it generates the output power required to drive the transmitter elements.

#### 4.4.4 Delay Mismatch between Amplitude and Phase Paths

An important consideration in a digital polar architecture is matching the delay between the amplitude and phase paths in order to minimize spectral regrowth and EVM degradation. If unfiltered baseband data symbols are used, then the maximum delay mismatch between amplitude and phase paths that can be tolerated is half a symbol period (assuming the symbols are sampled at the middle of the period). However, baseband pulse shaping can reduce the delay mismatch tolerance substantially. For example, low-pass digital FIR filtering of baseband symbols with a cutoff frequency of 150MHz and an oversampling factor of eight results in a maximum allowable delay mismatch of  $\approx$ 10ps while satisfying EVM and spectral mask requirements (Figs. 4.13(a) and (b) respectively) for a baseband symbol-rate of 1Gsps.
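The sensitivity to path mismatch can be explored with a small behavioural sketch. It is a minimal illustration under stated assumptions, not the simulation behind Fig. 4.13: the pulse shape is a Hann-windowed sinc rather than the FIR filter described above, the delay is applied to the amplitude path by linear interpolation, and the error is measured against the undelayed waveform rather than a demodulated constellation. All function names are hypothetical. At 1Gsps with an oversampling factor of eight, one sample corresponds to 125ps, so a 10ps mismatch is 0.08 samples.

```python
import cmath
import math
import random

def shaped_baseband(num_symbols=256, oversample=8, taps=33, seed=1):
    """Random 64-QAM symbols, upsampled and filtered with a Hann-windowed
    sinc (an illustrative pulse shape, not the filter used in the thesis)."""
    rng = random.Random(seed)
    levels = (-7, -5, -3, -1, 1, 3, 5, 7)
    upsampled = []
    for _ in range(num_symbols):
        upsampled.append(complex(rng.choice(levels), rng.choice(levels)))
        upsampled.extend([0j] * (oversample - 1))
    mid = (taps - 1) / 2
    fir = []
    for n in range(taps):
        x = (n - mid) / oversample
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 * (1 - math.cos(2 * math.pi * n / (taps - 1)))
        fir.append(sinc * hann)
    shaped = []
    for n in range(len(upsampled)):
        acc = 0j
        for k, tap in enumerate(fir):
            if n - k >= 0:
                acc += tap * upsampled[n - k]
        shaped.append(acc)
    return shaped

def evm_percent(signal, delay_samples):
    """RMS waveform error (% of RMS signal) when the amplitude path lags
    the phase path by a non-negative, possibly fractional, sample count."""
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    amp = [abs(v) for v in signal]
    phase = [cmath.phase(v) for v in signal]
    whole = int(math.floor(delay_samples))
    frac = delay_samples - whole
    err = ref = 0.0
    for n in range(whole + 1, len(signal)):
        # Linearly interpolate the delayed amplitude between two samples.
        a = (1 - frac) * amp[n - whole] + frac * amp[n - whole - 1]
        recombined = cmath.rect(a, phase[n])
        err += abs(recombined - signal[n]) ** 2
        ref += abs(signal[n]) ** 2
    return 100 * math.sqrt(err / ref)

if __name__ == "__main__":
    sig = shaped_baseband()
    sample_ps = 1000.0 / 8          # 1 Gsps symbols, 8x oversampling
    for delay_ps in (0, 10, 50, 125):
        evm = evm_percent(sig, delay_ps / sample_ps)
        print(f"delay {delay_ps:>3} ps -> waveform error {evm:.2f} %")
```

The absolute numbers depend on the assumed pulse shape and should not be compared with Fig. 4.13; the sketch only shows that the recombination error grows with the amplitude-to-phase delay.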

Figs. 4.14 and 4.15 depict the settling behavior of the transmitter in response to changes only in amplitude and phase bits respectively using square-wave control signals switching at 1GHz. The worst-case delay between amplitude and phase settling times is  $\approx$ 85ps. Based on these simulations, the maximum symbol rate is estimated to be  $\approx$ 5Gsps.

Figure 4.13: Output PSD of a 60GHz digital polar transmitter vs. delay mismatch between amplitude and phase paths for 1Gsps 64QAM baseband symbols. The baseband symbols are shaped using a tenth order low-pass digital FIR filter with a cutoff frequency of 150MHz and an oversampling factor of eight. Amplitude and phase resolution are eight bits each. The maximum delay mismatch that satisfies the spectral mask is  $\approx$ 10ps.

Figure 4.14: Settling behavior of the 60GHz digital polar transmitter in response to supply switching using a square-wave control input (with 50% duty-cycle) at 1GHz. The settling time is  $\approx$ 50ps.

#### 4.4.5 High-speed Digital Interface

The data-rate that can be achieved in a power DAC depends on both the settling time of the DAC (in response to a change in one or more control bits) and the speed of operation of the digital circuitry that provides the control signals. Based on the simulation results presented in Section 4.4.4, the control bits can toggle at a maximum speed of  $\approx$ 5Gsps. However, if the source providing the control bits cannot operate at this rate, then it will form the bottleneck in the operating speed of the DAC, rather than the mmWave path. For testing purposes, we can adopt one of two approaches:

1. Acquire the high-speed control signals from an off-chip programmable source (such as a commercial AWG or a PRBS generator) or

2. Generate the high-speed control signals on-chip using either PRBS generators or programmable memory.

The former is feasible if only a few control bits are needed (i.e. for testing low resolution DACs as in [5]) and even then, a high-speed analog front-end would be required to interface the off-chip signals with on-chip circuitry. On-chip PRBS generators can overcome this bottleneck, but their lack of programmability makes them suitable only for testing DACs that are inherently linear (or have internal compensation for non-linearity as in [101]) and do not require DPD. On-chip memories provide more flexibility, since they can be programmed for DPD while facilitating a large number of control bits. However, the memory capacity should be sufficiently large to store enough instances of a constellation symbol. For higher order modulations, the resulting memory size can be large enough that high-speed operation ( $\approx$  several Gsps) with a large number of control bits becomes impossible.

In this work, we adopt an approach utilized in high-speed wireline communication. A high-speed serial input is acquired from an off-chip source and is deserialized on-chip to generate the required control bits. To avoid the use of an on-chip Clock and Data Recovery (CDR) circuit, a serial clock input is also utilized.

Figure 4.15: Settling behavior of the 60GHz digital polar transmitter in response to phase modulation using square-wave control inputs (with 50% duty-cycle) at 1GHz. The phase modulator switches between phase settings of  $11^{\circ}$  and  $45^{\circ}$ . The settling time is  $\approx 120$  ps.

Since the proposed transmitter has twenty-four control bits (eight each for amplitude, I-phase and Q-phase control paths), three serial inputs are required, each of which is then deserialized by a factor of eight. It should be noted that for each control bit (of, say, the amplitude control path) to operate at  $\approx$ 5Gsps, the corresponding serial input should support an eight times higher data-rate, i.e.  $\approx$ 40Gsps. While technology scaling coupled with recent advancements in wireline communication certainly facilitates serial links operating at such speeds [108–110], they typically require inductive tuning techniques for wideband operation along with complex equalization schemes depending on the channel characteristics. These add to area and power overhead, and are impractical in the context of the present work since there are four serial inputs (including clock). Fortunately, nanoscale CMOS has been shown to facilitate mixed-signal circuit operation  $\geq$ 20Gsps without using inductive peaking [111–113]. Thus, we aim for a CML-based serial digital interface that can operate at  $\approx$ 20Gsps, which when deserialized by a factor of eight would translate to a symbol rate of  $\approx$ 2.5Gsps. The architecture of the high-speed digital interface is shown in Fig. 4.7 (right). Each of the three serial data lanes consists of an Analog Front End (AFE), which is essentially a Variable Gain Amplifier (VGA) followed by a Continuous Time Linear Equalizer (CTLE), driving a 1:8 deserializer (DeMUX) which generates the individual control bits. The control signals for each path are buffered using standard CMOS logic and fed to retiming flip-flops located at the digital inputs of the 4-stacked power DACs and the phase modulator. Each path is retimed using its local clock signal (available at the output of the corresponding DeMUX) in order to minimize routing complexity and clock skew. The serial clock path consists of an AFE followed by a buffer that provides the clock signal to the three DeMUX blocks.
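The lane count and symbol-rate arithmetic above can be captured in a few lines. The following is a minimal sketch; the function name and its parameters are illustrative and not part of the design.

```python
import math

def interface_budget(control_bits, demux_ratio, serial_rate_gbps):
    """Return (data_lanes, symbol_rate_gsps) for a deserializing interface.

    Each lane carries `demux_ratio` control bits, so every control bit
    toggles at the lane rate divided by the deserialization factor.
    """
    data_lanes = math.ceil(control_bits / demux_ratio)
    symbol_rate_gsps = serial_rate_gbps / demux_ratio
    return data_lanes, symbol_rate_gsps

if __name__ == "__main__":
    # Figures quoted in the text: 24 control bits and a 1:8 DeMUX.
    for rate in (40.0, 20.0):
        lanes, symbol_rate = interface_budget(24, 8, rate)
        print(f"{rate:.0f} Gbps per lane -> {lanes} data lanes (+1 clock), "
              f"{symbol_rate:.1f} Gsps per control bit")
```

With the values from the text, a 40Gbps lane rate supports the 5Gsps limit set by the mmWave path, while the 20Gbps target chosen here yields 2.5Gsps.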
The following sections provide implementation details of the various components in the digital interface (all implemented using 1.2V supply voltage).



Figure 4.16: Schematic of the VGA used in the AFE of the high-speed digital interface for the proposed 60GHz digital polar 4-element phased array transmitter.

#### 4.4.5.1 Variable Gain Amplifier (VGA)

The first stage of the AFE consists of a VGA, implemented using a differential ac-coupled common-gate amplifier (Fig. 4.16), that provides level conversion for the CTLE as well as gain [114, 115]. This topology was favored over the one proposed in [108] since the gain peaking in the latter for minimum VGA gain was found to be around 8dB over the frequency of interest (dc to 20GHz).

Figure 4.17: Simulated small signal voltage gain and return loss  $(S_{11})$  for (a) maximum gain setting and (b) minimum gain setting of the VGA used in the AFE of the high-speed digital interface for the proposed 60GHz digital polar 4-element phased array transmitter.

The common-mode voltage of the VGA input is generated by a replica source follower, which is sized the same as the main stage of the VGA. The VGA has  $\approx 9.5$dB of gain control, performed by two variable resistors ( $R_{series}$  and  $R_{shunt}$ ). By exercising 2 bits of control in the value of  $R_{series}$  and 5 bits in that of  $R_{shunt}$ , the voltage-division ratio of the input signal can be adjusted, resulting in VGA gain variation from -3.2dB to 6.3dB. Two differential 50$\Omega$  resistors constitute the input termination network of the receiver. A common-mode feedback (CMFB) circuit (not shown in Fig. 4.16) sets the bias of the PMOS current sources. Fig. 4.17(a) shows the simulated small signal gain and return loss ( $S_{11}$ ) of the VGA at the peak gain setting, while the corresponding simulations for the minimum gain setting are depicted in Fig. 4.17(b). It can be seen that the VGA achieves a 3dB bandwidth of  $\approx 30$GHz (at the peak gain setting) along with a wideband input match ( $S_{11} < -15$dB).

#### 4.4.5.2 Continuous Time Linear Equalizer (CTLE)

The VGA drives the second-stage of the AFE in the digital interface: a peaking amplifier that can provide additional fixed gain and can also be used to introduce a programmable amount of low-frequency de-emphasis. The low-frequency de-emphasis (or high frequency peaking) effect is realized through reduction in the low-frequency gain relative to the high-frequency gain. Since the circuit utilizes continuous-time analog blocks that are easily characterized with frequency-domain models, it is also referred to as a Continuous Time Linear Equalizer or CTLE. Wireline transceivers communicating over a backplane channel with high *frequency-dependent* loss (> 15dB) typically employ feedforward equalization (FFE) at the transmitter and decision feedback equalization (DFE) at the receiver. A combination of transmit and receive-side equalization helps minimize Inter Symbol Interference (ISI) by compensating for the channel characteristics and maintains optimum link performance with low Bit Error Rate (BER). However, the transmitter prototype discussed in this work is expected to operate in a well-controlled environment with low-loss traces on a Rogers PCB suitable for high frequency operation. Thus, a CTLE at the receiver (i.e. in the on-chip digital interface) is sufficient to compensate such a benign channel [115].



Figure 4.18: Schematic of the CTLE used in the AFE of the high-speed digital interface for the proposed 60GHz digital polar 4-element phased array transmitter.

A simplified diagram of the CTLE is shown in Fig. 4.18. It uses a parallel amplifier topology, where both amplifiers have the same input, which is the output of the VGA. The overall CTLE response is a combination of the responses of these two amplifiers as described in [114, 116]. The peaking level is adjusted by controlling the ratio of the bias currents to the two amplifiers, one with a fixed 10-dB peaking (peaking amplifier) and one with no peaking (auxiliary amplifier).

Figure 4.19: Simulated small signal voltage gain for (a) no gain peaking, (b) maximum gain peaking and (c) programmable gain peaking of the CTLE used in the AFE of the high-speed digital interface for the proposed 60GHz digital polar 4-element phased array transmitter.

The peaking amplifier is implemented as a digitally adjustable source-degenerated differential amplifier with RC source degeneration. As the degeneration impedance drops with increasing frequency, the amplifier gain increases proportionately until the bandwidth limitation of the amplifier causes the gain to roll off. This forms a peak in the gain/frequency characteristic. The frequency at which the peak occurs (also referred to as the pole position) is adjusted by varying the value of the 5-bit binary-weighted degeneration capacitor bank. Applying no bias current to the peaking amplifier and full bias to the auxiliary amplifier produces no peaking (Fig. 4.19(a)). Conversely, full bias on the peaking amplifier and no bias on the auxiliary amplifier results in maximum peaking (Fig. 4.19(b)). Applying bias to both the peaking and auxiliary amplifiers (for a given source degeneration) allows control of the amount of peaking (Fig. 4.19(c)).
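The effect of the bias split and of the degeneration capacitor can be illustrated with a first-order small-signal sketch. It models the peaking amplifier as a textbook RC-degenerated stage with a single output pole and treats each amplifier's contribution as scaling linearly with its share of the bias current. Both the model and every component value below are assumptions chosen for illustration; they are not taken from the implemented circuit.

```python
import math

GM = 20e-3        # transconductance (S), assumed
R_LOAD = 100.0    # load resistance (ohm), assumed
C_LOAD = 30e-15   # load capacitance (F), assumed
# Degeneration resistance chosen so that 1 + GM*R_DEG equals 10 dB.
R_DEG = (10 ** (10 / 20) - 1) / GM
C_DEG = 0.7e-12   # degeneration capacitance (F), assumed

def ctle_gain(freq_hz, alpha, c_deg=C_DEG):
    """Complex gain for bias split alpha (1 = peaking amp only,
    0 = auxiliary amp only)."""
    s = 2j * math.pi * freq_hz
    out_pole = 1 + s * R_LOAD * C_LOAD
    zero = 1 + s * R_DEG * c_deg
    h_peak = GM * R_LOAD * zero / ((GM * R_DEG + zero) * out_pole)
    h_aux = GM * R_LOAD / out_pole
    return alpha * h_peak + (1 - alpha) * h_aux

def _sweep(alpha, c_deg):
    """Gain magnitude at dc and on a log grid from 10 MHz to 100 GHz."""
    freqs = [0.0] + [10 ** (7 + 4 * k / 400) for k in range(401)]
    return freqs, [abs(ctle_gain(f, alpha, c_deg)) for f in freqs]

def peaking_db(alpha, c_deg=C_DEG):
    """Peak gain relative to the dc gain, in dB."""
    _, mags = _sweep(alpha, c_deg)
    return 20 * math.log10(max(mags) / mags[0])

def peak_frequency_hz(alpha, c_deg=C_DEG):
    """Frequency on the sweep grid at which the gain is largest."""
    freqs, mags = _sweep(alpha, c_deg)
    return freqs[mags.index(max(mags))]

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(f"alpha = {alpha:.1f}: peaking = {peaking_db(alpha):.2f} dB")
    for scale in (1, 2):
        f_peak = peak_frequency_hz(1.0, scale * C_DEG)
        print(f"C_deg x{scale}: peak at {f_peak / 1e9:.1f} GHz")
```

In this simplified model the output pole keeps the achieved peaking somewhat below the ideal 10dB, and a larger degeneration capacitance moves the peak to a lower frequency, which is qualitatively the behaviour described above.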

#### 4.4.5.3 1:8 Deserializer (DeMUX) and Digital Retiming

The output of the CTLE drives a 1:8 deserializer (DeMUX) to generate the individual control bits from the serial input. The static frequency dividers, master-slave flip-flops and clock buffers in the DeMUX were all implemented using CML circuits [111]. Each deserialized bit is fed to a differential amplifier with current mirror load to perform differential-to-single-ended conversion and make the output compatible with standard CMOS logic levels.

Figure 4.20: Simulated output waveforms of the 1:8 DeMUX using a 20Gbps PRBS input. The input sequence has been offset for clarity.

As mentioned before, the I and Q phase control bits need to be routed to the phase modulator while the amplitude control bits need to be routed symmetrically to all four transmitter elements. Additionally, all control bits (amplitude and phase) should have the same delay from the output of the respective DeMUXes to the retiming flip-flops at the respective control inputs of the DAC/phase modulator. This was accomplished by inserting buffers every  $200\mu$m on the routing lines and equalizing the lengths of routing lines for the phase and amplitude control paths. A divided version of the serial input clock is available at the output of each 1:8 DeMUX, which is routed along with the corresponding control bits for retiming the data using standard cell flip-flops located in proximity of the DAC and phase modulator digital control inputs. This ensures that there is minimum skew between the retimed data while also avoiding complex and power-hungry distribution of a single clock all over the chip. The simulated output of the deserializer for an input 20Gbps PRBS sequence is shown in Fig. 4.20.
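The function of the 1:8 DeMUX can be summarized by a short behavioural sketch. The bit ordering (first serial bit of each group on output 0) and the function name are assumptions made for illustration; the sketch says nothing about the CML implementation or its timing.

```python
def deserialize(bits, ratio=8):
    """Split a serial bit sequence into `ratio` parallel output streams.

    Assumed ordering: the first bit of every group of `ratio` serial bits
    appears on output 0, the second on output 1, and so on. Trailing bits
    that do not fill a complete group are dropped.
    """
    full = len(bits) - len(bits) % ratio
    return [list(bits[lane:full:ratio]) for lane in range(ratio)]

if __name__ == "__main__":
    serial = [1, 0, 1, 1, 0, 0, 1, 0,   # first 8-bit control word
              0, 1, 1, 0, 1, 0, 0, 1]   # second 8-bit control word
    for lane, stream in enumerate(deserialize(serial)):
        print(f"control bit {lane}: {stream}")
```

Each output stream carries one control bit and updates once per eight serial bits, which is why every output toggles at one-eighth of the serial rate.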
#### 4.4.6 Simulation Results of Transmitter Prototype

Fig. 4.21 depicts a layout snapshot of the digital polar 4-element phased array transmitter prototype while Table 4.1 compares the simulated performance of the array (under continuous-wave operation) with state-of-the-art mmWave PAs with  $P_{sat} > 20$ dBm or employing efficiency enhancing architectures.



Figure 4.21: Layout snapshot of the proposed digital polar phased array transmitter with major building blocks highlighted.

| Ref. | Technology | Freq. (GHz) | $V_{DD}$ (V) | $P_{out,sat}$ (dBm) | $\eta$ (%) | $\frac{\eta_{-6dB}}{\eta_{max}}$ (%) | PAE<sub>max</sub> (%) | $\frac{PAE_{-6dB}}{PAE_{max}}$ (%) | Gain<sub>max</sub> (dB) | $\Delta f/f_0$ (%) | ITRS FoM<sup>4</sup> | Area (mm<sup>2</sup>) | Architecture | Fully Integrated? | Integration Complexity |
|------|------------|-------------|--------------|---------------------|------------|--------------------------------------|-----------------------|------------------------------------|-------------------------|--------------------|----------------------|-----------------------|--------------|-------------------|------------------------|
| This work | 45nm SOI CMOS | 60 | 4.8 | 29.6 | 21.6 | 14 | 21.5 | 13.5 | 19.7 | N/A | 78.19 | 6.5 | Digital polar 4-element phased array | Yes | Direct digital to mmWave |
| [14] | 45nm SOI CMOS | 42.5 | 2.6 | 23.4 | 8.2 | 70.7 | 6.7 | 67.7 | 15 | 32 | 58.8 | 4.16 | 3-bit mmWave Power DAC | Yes | Direct digital to mmWave |
| [85] | 40nm CMOS | 80 | 0.9 | 20.9 | N/R | N/R | 22.3 | 32.7 | 18.1 | 19.5 | 70.5 | 0.19 | 4-way diff. power combined | Yes | PA only |
| [4] | 45nm SOI CMOS | 41 | 5 | 21.6 | N/R | N/R | 25.1 | 60 | 8.9 | 22 | 56.8 | 0.3 | 4-stacked PA | Yes | PA only |
| [86] | 40nm CMOS | 60 | 1.2 | 22.6 | N/R | N/R | 7 | 35.7 | 29 | 11.6 | 75.6 | 2.16 | 8-way diff. power combined | Yes | PA only |
| [56] | 45nm SOI CMOS | 45 | 5.5 | 28 | N/R | N/R | 14 | 35.7 | N/R | N/R | N/A | 11.25 | 4-way diff. spatial power comb. | No<sup>7</sup> | PA only |
| [57] | 40nm CMOS | 60 | 1 | 15.6 | $23^{6}$ | $28.5^{6}$ | N/A | N/A | N/A | 11.6 | N/A | 0.33 | Out-phasing | Yes | modulator + PA |
| [73] | 45nm SOI CMOS | 42 | 2.5 | 18 | 33 | 72.7 | 23 | 73.9 | 7 | N/R<sup>1</sup> | 51.1 | 0.64 | Doherty 2-stacked PAs | Yes | PA only |
| [80] | 45nm SOI CMOS | 45 | 4 | 21.3 | 24 | 52.1 | 16 | 56.2 | 7.4 | 13.3 | 53.8 | 1.15 | 4-stacked 8 bit I/Q Power DAC | Yes | modulator + PA |
| [5] | 45nm SOI CMOS | 45 | 5.1 | 24.3 | 21.3 | N/R | 14.6 | N/R | >18 | 15 | 67 | 7.67 | 4-stacked 2 bit Power DAC | No<sup>3,5</sup> | Direct digital to mmWave |
| [87] | $0.13\mu m$ SiGe BiCMOS | 46 | 5 | 21.8 | 21 | 60 | 18.5 | 59.5 | 15 | N/R | 62.73 | 1.6 | 2-stacked 1 bit Power DAC | Yes | Direct digital to mmWave |
| [88] | $0.13\mu m$ SiGe BiCMOS | 46 | 5 | 28.9 | 20 | 60 | 18.5 | 59.5 | 15 | N/R | 69.83 | 13.65 | 3-bit mmWave Power DAC | Yes | Direct digital to mmWave |

Table 4.1: Comparison with State-of-the-Art mmWave PAs with  $P_{sat} > 20$ dBm or employing efficiency enhancing architectures

<sup>1</sup>Large-signal performance across frequency is not reported. <sup>2</sup>Measurement below 33GHz is limited by equipment. <sup>3</sup>Assumes 3dB external differential-to-single-ended converter. <sup>4</sup>Defined as  $P_{sat}(dBm)$ + Gain(dB) + 20 log<sub>10</sub>(Freq.(GHz)) + 10 log<sub>10</sub>(PAE). <sup>5</sup> Does not have an on-chip choke inductor (biased using external bias-Ts). <sup>6</sup>Entire system metrics (for a fair comparison) inferred from supplied graph assuming power consumption of I/Q mixer remains constant. <sup>7</sup> Uses an on-PCB input matching network.
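The figure of merit defined in footnote 4 can be evaluated with a short sketch. The function name is illustrative, and the conversion of PAE from percent to a fraction before taking the logarithm is an assumption; it is the convention that reproduces the tabulated values for the rows checked below.

```python
import math

def itrs_fom(p_sat_dbm, gain_db, freq_ghz, pae_percent):
    """ITRS PA figure of merit as defined in footnote 4 of Table 4.1.

    PAE is converted from percent to a fraction before taking the log,
    an assumption that reproduces the tabulated values.
    """
    return (p_sat_dbm + gain_db
            + 20 * math.log10(freq_ghz)
            + 10 * math.log10(pae_percent / 100.0))

if __name__ == "__main__":
    # Rows taken from Table 4.1: (label, Psat, gain, frequency, PAE).
    rows = [("This work", 29.6, 19.7, 60, 21.5),
            ("[87]", 21.8, 15, 46, 18.5)]
    for label, p_sat, gain, freq, pae in rows:
        print(f"{label}: FoM = {itrs_fom(p_sat, gain, freq, pae):.2f}")
```

For these two rows the sketch returns 78.19 and 62.73, matching the table entries.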

#### 4.5 Conclusion

This chapter presented implementation details of a 60GHz digital polar four-element phased array transmitter with watt-level output power in IBM 45nm SOI CMOS based on the hybrid 8-bit power DAC described in Chapter 3. A deserializing receiver was incorporated on-chip to facilitate acquisition of high-speed control signals for the high-resolution DAC. The prototype has been sent for fabrication and experimental results are expected to be available in the latter half of 2016. To the best of our knowledge, this work demonstrates the first fully integrated high-power mmWave transmitter capable of supporting complex modulations with high average efficiency.

### Chapter 5

## **Future Work**

In this dissertation, the concepts, design and implementation of high-power, high-resolution, digital-intensive mmWave power DACs in a 45nm SOI CMOS process were discussed. This chapter presents future directions for the research described herein for next generation wireless networks.

#### 5.1 Short Term Goal: Prototype Measurements

Preliminary simulation results of the proposed digital polar power DAC-based transmitter prototype were discussed in Chapter 4. The final experimental results demonstrating functionality of the prototype are expected to be completed in the latter half of 2016. Since the use of Class-G supply switching in conjunction with tail transistor modulation results in a non-linear amplitude profile for the DAC (Fig. 3.38(c)), an important aspect of experimental verification would be the use of DPD to facilitate linear operation while supporting complex modulations.

#### 5.2 Long Term Vision: Massive Millimeter-wave MIMO for Wireless Backhaul

The escalation of wireless communication in recent years necessitates high-capacity links for backbone communication networks to handle the increased data flow. The eighth broadband progress report from the FCC has concluded that "broadband is not being reasonably and timely deployed to all Americans", particularly for data rates exceeding 3Mb/s [117, 118]. The high initial investment and repair costs of deploying fiber-optic backbone infrastructure (Fig. 5.1(a)) limit the overall network capacity and deter expansion into rural/developing regions. A transformative *wireless* infrastructure capable of achieving optical data-rates can overcome backhaul resource constraints, leading to overall higher capacity for 4G, WiFi and WiMAX networks [119]. The rapid deployment capabilities of such a system are also pertinent in situations like natural disasters to enable portable high-speed wireless internet access [120]. These wireless links can also serve as the bridge between optical links across rugged terrains (mountains and rivers) where trenching is impractical, or in urban areas where installation costs are prohibitive.



Figure 5.1: (a) Current fiber-optic based infrastructure for mobile backhaul and (b) alternative MIMO-based approach for wireless backhaul using compact, massively scalable arrays at mmWave.

The capacity of wireless backbone links can be increased by increasing signal bandwidth. The wide available bandwidth at mmWave frequencies has led to the exploration of mmWave for backhaul. While mmWave (e.g. E-band, 71-76GHz/81-86GHz) backhaul links are commercially available, data rates are typically limited to <5Gbps due to transceiver limitations. In particular, high-speed analog-to-digital converters (ADCs) remain expensive and power-hungry, and ultimately limit the signal bandwidths that may be exploited. Such mmWave backhaul links also typically employ high-gain Cassegrain antennas to overcome the high propagation losses and atmospheric absorption that plague mmWaves, but consequently require careful alignment and are sensitive to wind gusts and building sway.

Data rate can also be increased through the use of higher-order spectrally-efficient modulations, but this once again requires increased SNR, which stresses transceiver and ADC performance. Alternatively, a MIMO-based spatial multiplexing approach may be employed to increase capacity. However, the spatial multiplexing factor has typically been limited in state-of-the-art MIMO links due to the associated signal processing complexity and the large footprint of antenna arrays (typically  $8\times8$  or  $12\times12$ ) at low frequencies [121]. Interestingly, the reduction in antenna dimensions at mmWave (due to reduced wavelength) enables massive antenna arrays within a compact footprint (Fig. 5.1(b)). Such massive arrays can use a mix of linear beamforming to overcome the high path loss, and massive MIMO to increase the data rate.



Figure 5.2: Proposed massive mmWave MIMO architecture using high-power, high-efficiency stacked mmWave power DAC transmitter together with novel digital signal processing schemes for channel estimation and low-PAPR transmitter precoding to facilitate high-speed links.

Millimeter-wave MIMO [122] is being considered as an enabling technology for the next generation of wireless standards (5G) [8]. Since high atmospheric and propagation losses at mmWave can degrade link performance, demonstrations of mmWave MIMO so far have relied on high-gain antennas and have been restricted to short-range links [123, 124]. However, such an approach can limit scalability and robustness. Fig. 5.2 illustrates our long-term research vision, culminating in a scalable, precoding-based massive mmWave MIMO link which employs linear beamforming sub-arrays (MIMO elements) at both the transmitter and the receiver. It utilizes a *digital-transmitter-centric* mmWave MIMO architecture where MIMO precoding is pursued on the transmitter side to simplify the receiver. Simple QPSK modulation is envisioned to relax MIMO receiver dynamic range requirements (enabling, in the limit, 1-bit ADCs [125]), while the aggregate data rate is recovered through massive MIMO spatial multiplexing. This simplification of the receiver architecture necessitates precoding at the transmitter to ensure that a clean QPSK signal arrives at each receiver. The entire signal conditioning complexity is therefore transferred to the transmitter, where it can be addressed by means of digital-intensive mmWave transmitters such as the one proposed in Chapter 4. MIMO precoding increases the signal peak-to-average-power ratio (PAPR) at the transmitter (Section 5.2.1). Additionally, channel estimation using low-resolution ADCs is challenging. Therefore, efficient precoding algorithms for co-design with the transmitter (to minimize PAPR and maximize back-off efficiency), along with a novel level-triggered sampling-based channel estimation approach [126], are being investigated in collaboration with Prof. Xiaodong Wang at Columbia University.

#### 5.2.1 MmWave Massive MIMO Link Analysis



Figure 5.3: Survey across frequency of CMOS transmitter saturated output power (extrapolated from works discussed in this thesis and those reported in [4,5,28,127,128]), NF of state-of-the-art CMOS receivers, and single-element path loss over a 100m distance including atmospheric absorption. The resultant number of MIMO elements and corresponding sub-array size, PAPR at the MIMO transmitter elements and data rate for our proposed massive MIMO system are also depicted.

It is instructive to perform a cross-layer analysis of the achievable data rate in a massive mmWave MIMO link. A survey of state-of-the-art CMOS transceivers yields scaling trends for pertinent performance metrics (transmitter saturated output power  $P_{sat}$  and receiver NF) across frequency. A link range of 100m and a typical antenna gain of 5dBi are assumed, yielding the path loss depicted in Fig. 5.3 including atmospheric absorption. The total aperture size is constrained to  $1\,\mathrm{m} \times 1\,\mathrm{m}$, which is roughly the same size as the 3-foot Cassegrain antennas that are commonly used for current E-band point-to-point backhaul links. The required Rayleigh spacing ($= \sqrt{R\lambda/N}$, where $R$ is the 100m communication distance,  $\lambda$  is the wavelength and $N$ is the number of MIMO elements along one dimension of the square array) for line-of-sight point-to-point MIMO dictates the total number of MIMO elements that can be supported within the aperture (equal for the transmitter and the receiver). To overcome the high path loss at mmWave, beamforming is required to achieve a sufficient SNR. Therefore, as mentioned earlier, each MIMO element comprises a sub-array of elements across which (RF) beamforming is performed. A spacing of  $\lambda/2$  is assumed between sub-array elements. Each MIMO stream is a 1Gsymbol/s QPSK signal. A purely line-of-sight channel model is constructed for the system for simplicity, and conventional linear zero-forcing (ZF) precoding is assumed across MIMO elements on the transmitter. Assuming equal beamforming sub-array sizes at the transmitter and receiver, the required sub-array size to reach an SNR of 20dB at each receiver MIMO element is determined through system analysis and MATLAB simulations. The resultant PAPR arising from linear superposition of multiple QPSK streams at each MIMO transmitter element due to precoding is also shown. Finally, the aggregate data rate is computed as the number of MIMO elements times 2Gbps. The aggregate data rate per unit power dissipation is also computed assuming:

- a frequency-dependent DC power consumption for each transmitting element, determined by fitting a second-order polynomial to PAE trends for stacked PAs reported in this work and those in [4,5,28,127,128], and
- a frequency-independent DC power consumption of 100mW for each receiving element (based on the survey of the state of the art).
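The element-count step of this analysis can be sketched numerically. The Python fragment below is a minimal illustration and not the MATLAB analysis used in this work. It assumes that the $N$ elements along one side of the square array sit at the Rayleigh spacing with the outermost elements at the aperture edges, so that $(N-1)\sqrt{R\lambda/N}$ may not exceed the aperture; this packing convention and all function names are assumptions made for illustration only.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def rayleigh_spacing_m(range_m, freq_hz, n_per_side):
    """Rayleigh spacing sqrt(R * lambda / N) for N elements along one side."""
    wavelength = C / freq_hz
    return math.sqrt(range_m * wavelength / n_per_side)


def max_elements_per_side(range_m, freq_hz, aperture_m):
    """Largest N such that (N - 1) * spacing(N) fits in the aperture.

    Assumption: the outermost elements sit at the aperture edges.
    """
    n = 1
    # Try N = n + 1; its total span is n * spacing(n + 1).
    while n * rayleigh_spacing_m(range_m, freq_hz, n + 1) <= aperture_m:
        n += 1
    return n


def aggregate_rate_gbps(n_per_side, rate_per_element_gbps=2.0):
    """Number of MIMO elements (square array) times the per-stream rate."""
    return n_per_side ** 2 * rate_per_element_gbps


if __name__ == "__main__":
    for freq_ghz in (28, 38, 60, 80):
        n = max_elements_per_side(100.0, freq_ghz * 1e9, 1.0)
        print(freq_ghz, "GHz:", n * n, "elements,",
              aggregate_rate_gbps(n), "Gbps")
```

Under these assumptions the sketch gives two elements per side at 28GHz and three per side (nine MIMO elements) at 60GHz for a 100m link and a 1m aperture. A different packing convention would shift these counts.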

Several interesting observations can be made from this analysis:

1. As frequency increases, the number of MIMO elements that can be supported increases due to the decreasing Rayleigh spacing, as does the aggregate data rate. At 60GHz and beyond, the MIMO starts to become massive  $(9 \times 9)$ , while at lower millimeter-wave frequencies (e.g. 28GHz) that are currently being explored, the MIMO order is smaller.
2. MIMO precoding increases the PAPR at each transmitter element, stressing power amplifier performance.
3. At higher frequencies, the higher path loss, lower transmitter output power, higher PAPR due to higher-order MIMO and degrading receiver performance necessitate increased beamforming to achieve sufficient SNR, dictating larger beamforming sub-array sizes. This increases the power consumption of the system, degrading data rate per unit power consumption.
4. The data rate per unit power consumption trend will dictate the choice of frequency for different applications: more power-constrained applications would benefit from lower mmWave frequencies such as 28/38GHz, while links requiring higher aggregate data rate with relaxed power constraints might use 60GHz and beyond.
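The growth of PAPR with MIMO order noted in the second observation can be illustrated with a small enumeration. The Python sketch below is not the ZF precoder of the analysis above: it assumes, purely for illustration, that a transmitter element radiates a weighted sum of independent unit-power QPSK symbols, and it evaluates the PAPR over all symbol combinations. The function name and the equal-magnitude weights in the usage example are hypothetical.

```python
import itertools
import math

# Unit-power QPSK constellation.
QPSK = [complex(i, q) / math.sqrt(2.0) for i in (1, -1) for q in (1, -1)]


def papr_db(weights):
    """PAPR (dB) of sum_k weights[k] * s_k over all QPSK symbol combinations.

    The enumeration has 4**len(weights) terms, so keep the stream count small.
    """
    powers = [
        abs(sum(w * s for w, s in zip(weights, symbols))) ** 2
        for symbols in itertools.product(QPSK, repeat=len(weights))
    ]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


if __name__ == "__main__":
    for n_streams in (1, 2, 4):
        # Assumed equal-magnitude, in-phase weights (worst-case alignment).
        print(n_streams, "streams:", round(papr_db([1.0] * n_streams), 2), "dB")
```

With equal-magnitude weights the peak occurs when all symbols align, so the PAPR of $K$ superposed streams is bounded by $10\log_{10}K$ dB. The PAPR values in Fig. 5.3 depend on the actual precoding weights and may differ.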

#### 5.2.2 Impact of MIMO Array Geometry on Transceiver Metrics

The foregoing analysis assumed that the MIMO elements are arranged in a square array. However, the MIMO elements can be arranged in alternative geometries that might affect the performance metrics in different ways. Fig. 5.4 depicts the various transceiver metrics across frequency for a circular arrangement of MIMO elements. A comparison (Fig. 5.5) with the corresponding results for a square array indicates that for the same peak data-rate, a circular array reduces the PAPR variation across frequency albeit at the cost of energy-efficiency. This indicates that array geometry is an important degree of freedom that can be exploited to facilitate a trade-off between different performance metrics.

#### 5.2.3 Demonstration of a Long-haul MIMO Link at 60GHz

Based on these preliminary theoretical investigations, we propose a mmWave MIMO demonstration as shown in Fig. 5.6. A data-rate of 16Gbps (i.e. 2Gsps modulation bandwidth) is targeted over 100m at 60GHz using a  $4 \times 4$  MIMO system while satisfying the link constraints described previously. Each MIMO transmit element is realized using the hybrid power DAC-based digital polar phased array transmitter discussed in Chapter 4. Precoded QPSK data-streams generated using an Arbitrary Waveform Generator (AWG) will be applied to the MIMO elements along with pilot sequences to facilitate LO phase recovery at the (off-the-shelf) receiver. The demodulated data will be captured in an oscilloscope for further processing. Channel estimation and precoding will be performed using the setup prior to data-transmission. Table 5.1 compares the targeted performance of the proposed MIMO link with contemporary high-speed mmWave transceivers (academic as well as commercial). The proposed demonstration can facilitate unprecedented data-rates at link ranges not feasible using current technologies. Preliminary demonstration of such a long-haul MIMO link using the proposed digital polar transmitter can provide a pathway to realize MIMO systems for future wireless networks.

Figure 5.4: The number of MIMO elements and corresponding sub-array size, PAPR at the MIMO transmitter elements and data rate for the proposed massive MIMO system when the MIMO elements are arranged in a circular array.

Figure 5.5: Comparison of transceiver performance metrics for square and circular arrangement of MIMO elements in the proposed massive MIMO system.



Figure 5.6: Proposed demonstration of a 16Gbps  $4 \times 4$  mmWave MIMO link at 60GHz over 100m distance. Each MIMO transmit element is realized using the hybrid power DAC-based digital polar four element phased array transmitter discussed in Chapter 4.
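The headline numbers of the proposed demonstration can be cross-checked with a short calculation. The Python sketch below computes the aggregate rate of four QPSK streams at 2Gsps and the free-space path loss over 100m at 60GHz using the standard Friis expression. The atmospheric absorption figure of 15dB/km is an assumed placeholder rather than a value taken from this work, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss between isotropic antennas, in dB."""
    wavelength = C / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


def total_path_loss_db(distance_m, freq_hz, absorption_db_per_km):
    """Free-space loss plus a linear-in-distance atmospheric absorption term."""
    absorption_db = absorption_db_per_km * distance_m / 1000.0
    return free_space_path_loss_db(distance_m, freq_hz) + absorption_db


def aggregate_rate_bps(n_streams, symbol_rate_sps, bits_per_symbol=2):
    """Aggregate rate of n_streams spatial streams (QPSK: 2 bits/symbol)."""
    return n_streams * symbol_rate_sps * bits_per_symbol


if __name__ == "__main__":
    print(aggregate_rate_bps(4, 2e9) / 1e9, "Gbps")
    # 15 dB/km is an assumed absorption value near 60GHz, for illustration.
    print(round(total_path_loss_db(100.0, 60e9, 15.0), 1), "dB")
```

Four streams at 2Gsps with 2 bits per symbol give the targeted 16Gbps, and the free-space term alone is about 108dB before any antenna or beamforming gain is applied.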

Table 5.1: Comparison of the proposed demonstration of a  $4 \times 4$  mmWave MIMO link with contemporary high-speed mmWave links (academic and commercial).

| Entity     | Freq.     | Link Range | Tx Pout  | Antenna Gain                            | Modulation  | Data-rate  | Architecture                     | Application         |
|------------|-----------|------------|----------|-----------------------------------------|-------------|------------|----------------------------------|---------------------|
| This Work  | 60 GHz    | 100m       | 29.6dBm  | 5dBi                                    | QPSK        | 16 Gbps    | 4×4 MIMO                         | Mobile Backhaul     |
| IBM        | 60 GHz    | 54cm       | 9dBm     | 8dBi                                    | 16QAM       | 5.3 Gbps   | 16-antenna Beamformer            | IEEE 802.15.3c      |
| IBM        | 60 GHz    | 10m        | 15dBm    | 7dBi                                    | QPSK-OFDM   | 0.64 Gbps  | Single Element TRx               | IEEE 802.15.3c      |
| IMEC       | 60 GHz    | 3.6m/0.6m  | 10.8dBm  | 7.5dBi (Array)                          | QPSK/16QAM  | 3.5/7 Gbps | 4-antenna Beamformer             | IEEE 802.11ad       |
| Tokyo Tech | 60 GHz    | 0.9m/0.6m  | 10.3dBm  | 14dBi<sup>∗</sup>                       | QPSK/16QAM  | 14/28 Gbps | 4-channel Bonding<sup>∗</sup>    | IEEE 802.11ad/WiGig |
| UCSB       | 60 GHz    | 5m         | N/A      | 24dBi<sup>∗,#</sup>                     | BPSK        | 2.4 Gbps   | 4×4 MIMO<sup>∗,#</sup>           | N/R                 |
| UCSB       | 60 GHz    | 4m         | -16.5dBm | 24dBi (Tx), 40dBi (Rx)<sup>∗,#</sup>    | BPSK        | 1.2 Gbps   | 2×2 MIMO<sup>∗,#</sup>           | N/R                 |
| NEC        | 70-80 GHz | 1km        | N/A      | N/A<sup>∗</sup>                         | QPSK-256QAM | 1.6 Gbps   | N/A<sup>∗</sup>                  | Mobile Backhaul     |
| Siklu      | 81-86 GHz | 500m       | N/A      | 32dBi<sup>∗</sup>                       | QPSK-64QAM  | 1 Gbps     | N/A<sup>∗</sup>                  | Mobile Backhaul     |

<sup>∗</sup>Uses high-gain parabolic/horn antennas. <sup>#</sup>Uses off-the-shelf Tx and Rx components.

#### 5.3 Conclusion

This chapter presented an overview of the future directions for the research described in this thesis. Starting with experimental verification of the functionality of the digital polar phased array transmitter proposed in Chapter 4, we intend to demonstrate the utility of such a prototype in realizing high-speed long-haul mmWave links by means of a  $4 \times 4$  MIMO link at 60GHz. Aside from the mmWave front-end, implementation of a scalable digital backend that can generate, distribute and process high-speed signals with low power consumption and acceptable signal integrity presents a significant challenge. Furthermore, such a massive system entails innovative packaging and cooling solutions. Our long-term research vision pertaining to massive MIMO thus presents an exciting opportunity for interdisciplinary research that requires co-investigation of novel circuit and signal processing techniques, as well as exploration of digital-intensive scalable arrays. The combination of cost-effective nanoscale CMOS with the potential for exploring digital-intensive spatial processing techniques such as massive MIMO thus indicates an exciting future for research in mmWave systems.

# Bibliography

- [1] A. Chakrabarti and H. Krishnaswamy, "An Improved Analysis and Design Methodology for RF Class-E Power Amplifiers with Finite DC-feed Inductance and Switch On-Resistance," in *Circuits and Systems (ISCAS), 2012 IEEE International Symposium on*, May 2012, pp. 1763–1766.
- [2] S. Shopov, A. Balteanu, and S. Voinigescu, "A 19 dBm, 15 Gbaud, 9 bit SOI CMOS Power-DAC Cell for High-Order QAM W-Band Transmitters," *Solid-State Circuits, IEEE Journal of*, vol. 49, no. 7, pp. 1653–1664, July 2014.
- [3] A. Chakrabarti and H. Krishnaswamy, "High-Power High-Efficiency Class-E-Like Stacked mmWave PAs in SOI and Bulk CMOS: Theory and Implementation," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 62, no. 8, pp. 1686–1704, Aug 2014.
- [4] H. Dabag, B. Hanafi, F. Golcuk, A. Agah, J. Buckwalter, and P. Asbeck, "Analysis and Design of Stacked-FET Millimeter-Wave Power Amplifiers," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 61, no. 4, pp. 1543–1556, April 2013.
- [5] A. Balteanu, I. Sarkas, E. Dacquay, A. Tomkins, and S. Voinigescu, "A 45-GHz, 2-bit Power DAC with 24.3 dBm Output Power, >14 Vpp Differential Swing, and 22% Peak PAE in 45-nm SOI CMOS," in *Radio Frequency Integrated Circuits Symposium (RFIC), 2012 IEEE*, June 2012, pp. 319–322.
- [6] A. Natarajan, "Millimeter-wave phased arrays in silicon," Ph.D. dissertation, California Institute of Technology, 2007. [Online]. Available: http://resolver.caltech.edu/CaltechETD:etd-06012007-130844

- [7] H. Krishnaswamy, "Architectures and integrated circuits for RF and mm-wave multiple-antenna systems on silicon," Ph.D. dissertation, University of Southern California, 2009. [Online]. Available: http://digitallibrary.usc.edu/cdm/ref/collection/p15799coll127/id/231999
- [8] T. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. Wong, J. Schulz, M. Samimi, and F. Gutierrez, "Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!" *Access, IEEE*, vol. 1, pp. 335–349, May 2013.
- [9] A. Ezzeddine, H.-L. Hung, and H. Huang, "High-Voltage FET Amplifiers for Satellite and Phased-Array Applications," in *Microwave Symposium Digest, 1985 IEEE MTT-S International*, June 1985, pp. 336–339.
- [10] M. Shifrin, Y. Ayasli, and P. Katzin, "A new power amplifier topology with series biasing and power combining of transistors," in *Microwave and Millimeter-Wave Monolithic Circuits Symposium, 1992. Digest of Papers, IEEE 1992*, June 1992, pp. 39–41.
- [11] A. Ezzeddine and H. Huang, "The high voltage/high power FET (HiVP)," in *Radio Frequency Integrated Circuits (RFIC) Symposium, 2003 IEEE*, June 2003, pp. 215–218.
- [12] J. McRory, G. Rabjohn, and R. Johnston, "Transformer coupled stacked FET power amplifiers," *Solid-State Circuits, IEEE Journal of*, vol. 34, no. 2, pp. 157–161, Feb 1999.
- [13] J. Jeong, S. Pornpromlikit, P. Asbeck, and D. Kelly, "A 20 dBm Linear RF Power Amplifier Using Stacked Silicon-on-Sapphire MOSFETs," *Microwave and Wireless Components Letters, IEEE*, vol. 16, no. 12, pp. 684–686, Dec 2006.
- [14] R. Bhat, A. Chakrabarti, and H. Krishnaswamy, "Large-Scale Power Combining and Mixed-Signal Linearizing Architectures for Watt-Class mmWave CMOS Power Amplifiers," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 63, no. 2, pp. 703–718, Feb 2015.
- [15] Y. Zhao, J. Long, and M. Spirito, "Compact transformer power combiners for millimeter-wave wireless applications," in *Radio Frequency Integrated Circuits Symposium (RFIC), 2010 IEEE*, May 2010, pp. 223–226.

- [16] J.-W. Lai and A. Valdes-Garcia, "A 1V 17.9dBm 60GHz Power Amplifier In Standard 65nm CMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International*, Feb 2010, pp. 424–425.
- [17] W. Tai, L. Carley, and D. Ricketts, "A 0.7W Fully Integrated 42GHz Power Amplifier with 10% PAE in 0.13µm SiGe BiCMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International*, Feb 2013, pp. 142–143.
- [18] M. Bohsali and A. Niknejad, "Current combining 60GHz CMOS power amplifiers," in *Radio Frequency Integrated Circuits Symposium, 2009. RFIC 2009. IEEE*, June 2009, pp. 31–34.
- [19] B. Martineau, V. Knopik, A. Siligaris, F. Gianesello, and D. Belot, "A 53-to-68GHz 18dBm power amplifier with an 8-way combiner in standard 65nm CMOS," in *2010 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, Feb 2010, pp. 428–429.
- [20] T. Dickson, K. H. K. Yau, T. Chalvatzis, A. Mangan, E. Laskin, R. Beerkens, P. Westergaard, M. Tazlauanu, M.-T. Yang, and S. Voinigescu, "The Invariance of Characteristic Current Densities in Nanoscale MOSFETs and Its Impact on Algorithmic Design Methodologies and Design Porting of Si(Ge) (Bi)CMOS High-Speed Building Blocks," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 8, pp. 1830–1845, Aug 2006.
- [21] A. Niknejad, S. Emami, B. Heydari, M. Bohsali, and E. Adabi, "Nanoscale CMOS for mm-Wave Applications," in *Compound Semiconductor Integrated Circuit Symposium, 2007. CSIC 2007. IEEE*, Oct 2007, pp. 1–4.
- [22] B. Heydari, M. Bohsali, E. Adabi, and A. Niknejad, "Millimeter-Wave Devices and Circuit Blocks up to 104 GHz in 90 nm CMOS," *Solid-State Circuits, IEEE Journal of*, vol. 42, no. 12, pp. 2893–2903, Dec 2007.
- [23] S. Nicolson, A. Tomkins, K. Tang, A. Cathelin, D. Belot, and S. Voinigescu, "A 1.2V, 140GHz receiver with on-die antenna in 65nm CMOS," in *Radio Frequency Integrated Circuits Symposium, 2008. RFIC 2008. IEEE*, June 2008, pp. 229–232.

- [24] E. Johnson, "Physical limitations on frequency and power parameters of transistors," in *IRE International Convention Record*, vol. 13, March 1965, pp. 27–34.
- [25] I. Sarkas, A. Balteanu, E. Dacquay, A. Tomkins, and S. Voinigescu, "A 45nm SOI CMOS Class-D mm-Wave PA with >10Vpp Differential Swing," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International*, Feb 2012, pp. 88–90.
- [26] A. Mazzanti, L. Larcher, R. Brama, and F. Svelto, "Analysis of reliability and power efficiency in cascode class-E PAs," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 5, pp. 1222–1229, May 2006.
- [27] A. Ezzeddine and H. Huang, "The high voltage/high power FET (HiVP)," in *Radio Frequency Integrated Circuits (RFIC) Symposium, 2003 IEEE*, June 2003, pp. 215–218.
- [28] S. Pornpromlikit, J. Jeong, C. Presti, A. Scuderi, and P. Asbeck, "A 33-dBm 1.9-GHz silicon-on-insulator CMOS stacked-FET power amplifier," in *Microwave Symposium Digest, 2009. MTT '09. IEEE MTT-S International*, June 2009, pp. 533–536.
- [29] J. McRory, G. Rabjohn, and R. Johnston, "Transformer coupled stacked FET power amplifiers," *Solid-State Circuits, IEEE Journal of*, vol. 34, no. 2, pp. 157–161, Feb 1999.
- [30] S. Pornpromlikit, H.-T. Dabag, B. Hanafi, J. Kim, L. Larson, J. Buckwalter, and P. Asbeck, "A Q-Band Amplifier Implemented with Stacked 45-nm CMOS FETs," in *Compound Semiconductor Integrated Circuit Symposium (CSICS), 2011 IEEE*, Oct 2011, pp. 1–4.
- [31] A. Agah, H. Dabag, B. Hanafi, P. Asbeck, L. Larson, and J. Buckwalter, "A 34% PAE, 18.6dBm 42-45GHz Stacked Power Amplifier in 45nm SOI CMOS," in *Radio Frequency Integrated Circuits Symposium (RFIC), 2012 IEEE*, June 2012, pp. 57–60.
- [32] A. Chakrabarti, J. Sharma, and H. Krishnaswamy, "Dual-Output Stacked Class-EE Power Amplifiers in 45nm SOI CMOS for Q-Band Applications," in *Compound Semiconductor Integrated Circuit Symposium (CSICS), 2012 IEEE*, Oct 2012, pp. 1–4.
- [33] A. Chakrabarti and H. Krishnaswamy, "Design considerations for stacked Class-E-like mmWave high-speed power DACs in CMOS," in *Microwave Symposium Digest (IMS), 2013 IEEE MTT-S International*, June 2013, pp. 1–4.

- [34] N. Sokal and A. Sokal, "Class E-A new class of high-efficiency tuned single-ended switching power amplifiers," *Solid-State Circuits, IEEE Journal of*, vol. 10, no. 3, pp. 168–176, Jun 1975.
- [35] A. Chakrabarti and H. Krishnaswamy, "High power, high efficiency stacked mmWave Class-E-like power amplifiers in 45nm SOI CMOS," in *Custom Integrated Circuits Conference (CICC), 2012 IEEE*, Sept 2012, pp. 1–4.
- [36] H. Krishnaswamy and H. Hashemi, "Inductor- and Transformer-based Integrated RF Oscillators: A Comparative Study," in *Custom Integrated Circuits Conference, 2006. CICC '06. IEEE*, Sept 2006, pp. 381–384.
- [37] J. Hasani and M. Kamarei, "Analysis and Optimum Design of a Class E RF Power Amplifier," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 55, no. 6, pp. 1759–1768, July 2008.
- [38] C. Avratoglou, N. Voulgaris, and F. Ioannidou, "Analysis and design of a generalized class E tuned power amplifier," *Circuits and Systems, IEEE Transactions on*, vol. 36, no. 8, pp. 1068–1079, Aug 1989.
- [39] M. Acar, A. Annema, and B. Nauta, "Generalized Analytical Design Equations for Variable Slope Class-E Power Amplifiers," in *Electronics, Circuits and Systems, 2006. ICECS '06. 13th IEEE International Conference on*, Dec 2006, pp. 431–434.
- [40] S. Kee, "The Class E/F Family of Harmonic-Tuned Switching Power Amplifiers," Ph.D. dissertation, California Institute of Technology, Pasadena, California, December 2001.
  [Online]. Available: http://resolver.caltech.edu/CaltechETD:etd-04262005-152703
- [41] C. Wang, L. Larson, and P. Asbeck, "Improved design technique of a microwave class-E power amplifier with finite switching-on resistance," in *Radio and Wireless Conference, 2002. RAWCON 2002. IEEE*, 2002, pp. 241–244.
- [42] O. Lee, J. Han, K. H. An, D. H. Lee, K.-S. Lee, S. Hong, and C.-H. Lee, "A Charging Acceleration Technique for Highly Efficient Cascode Class-E CMOS Power Amplifiers," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 10, pp. 2184–2197, Oct 2010.

- [43] D. Sandstrom, B. Martineau, M. Varonen, M. Karkkainen, A. Cathelin, and K. A. I. Halonen, "94GHz Power-Combining Power Amplifier with +13dBm Saturated Output Power in 65nm CMOS," in *Radio Frequency Integrated Circuits Symposium (RFIC), 2011 IEEE*, June 2011, pp. 1–4.
- [44] S. Ko and J. Lin, "A Linearized Cascode CMOS Power Amplifier," in *Wireless and Microwave Technology Conference, 2006. WAMICON '06. IEEE Annual*, Dec 2006, pp. 1–4.
- [45] A. Siligaris, Y. Hamada, C. Mounet, C. Raynaud, B. Martineau, N. Deparis, N. Rolland, M. Fukaishi, and P. Vincent, "A 60GHz Power Amplifier with 14.5 dBm Saturation Power and 25% Peak PAE in CMOS 65 nm SOI," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 7, pp. 1286–1294, July 2010.
- [46] J. Sharma and H. Krishnaswamy, "216- and 316-GHz 45-nm SOI CMOS Signal Sources Based on a Maximum-Gain Ring Oscillator Topology," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 61, no. 1, pp. 492–504, Jan 2013.
- [47] BSIMSOI Manual.
- [48] IE3D User Manual.
- [49] U. Gogineni, J. del Alamo, and C. Putnam, "RF power potential of 45 nm CMOS technology," in *Silicon Monolithic Integrated Circuits in RF Systems (SiRF), 2010 Topical Meeting on*, Jan 2010, pp. 204–207.
- [50] J. Chen, R. Bhat, and H. Krishnaswamy, "A Compact Fully Integrated High-Efficiency 5GHz Stacked Class-E PA in 65nm CMOS Based on Transformer-Based Charging Acceleration," in *Compound Semiconductor Integrated Circuit Symposium (CSICS), 2012 IEEE*, Oct 2012, pp. 1–4.
- [51] J. Hasani and M. Kamarei, "Analysis and Optimum Design of a Class E RF Power Amplifier," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 55, no. 6, pp. 1759–1768, July 2008.

- [52] Y. Atesal, B. Cetinoneri, M. Chang, R. Alhalabi, and G. Rebeiz, "Millimeter-Wave Wafer-Scale Silicon BiCMOS Power Amplifiers Using Free-Space Power Combining," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 59, no. 4, pp. 954–965, April 2011.
- [53] J.-G. Kim and G. Rebeiz, "Miniature four-way and two-way 24 GHz Wilkinson power dividers in 0.13μm CMOS," *IEEE Microwave and Wireless Components Letters*, vol. 17, no. 9, pp. 658–660, Sept 2007.
- [54] O. Ogunnika and A. Valdes-Garcia, "A 60GHz Class-E Tuned Power Amplifier with PAE >25% in 32nm SOI CMOS," in *Radio Frequency Integrated Circuits Symposium (RFIC), 2012 IEEE*, June 2012, pp. 65–68.
- [55] D. Zhao, S. Kulkarni, and P. Reynaert, "A 60GHz Dual-Mode Power Amplifier with 17.4dBm Output Power and 29.3% PAE in 40-nm CMOS," in *ESSCIRC (ESSCIRC), 2012 Proceedings of the*, Sept 2012, pp. 337–340.
- [56] B. Hanafi, O. Gurbuz, H. Dabag, S. Pornpromlikit, G. Rebeiz, and P. Asbeck, "A CMOS 45 GHz power amplifier with output power >600 mW using spatial power combining," in *Microwave Symposium (IMS), 2014 IEEE MTT-S International*, June 2014, pp. 1–3.
- [57] D. Zhao, S. Kulkarni, and P. Reynaert, "A 60GHz outphasing transmitter in 40nm CMOS with 15.6dBm output power," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International*, Feb 2012, pp. 170–172.
- [58] K.-Y. Wang, T.-Y. Chang, and C.-K. Wang, "A 1V 19.3dBm 79GHz power amplifier in 65nm CMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International*, Feb 2012, pp. 260–262.
- [59] J. Chen and A. Niknejad, "A compact 1V 18.6dBm 60GHz power amplifier in 65nm CMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International*, Feb 2011, pp. 432–433.
- [60] C. Law and A.-V. Pham, "A High-Gain 60GHz Power Amplifier With 20dBm Output Power In 90nm CMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International*, Feb 2010, pp. 426–427.

- [61] A. Valdes-Garcia, S. Reynolds, and U. Pfeiffer, "A 60GHz Class-E Power Amplifier in SiGe," in *Solid-State Circuits Conference, 2006. ASSCC 2006. IEEE Asian*, Nov 2006, pp. 199–202.
- [62] N. Kalantari and J. Buckwalter, "A 19.4 dBm, Q-Band Class-E Power Amplifier in a 0.12μm SiGe BiCMOS Process," *Microwave and Wireless Components Letters, IEEE*, vol. 20, no. 5, pp. 283–285, May 2010.
- [63] H.-T. Dabag, J. Kim, L. Larson, J. Buckwalter, and P. Asbeck, "A 45-GHz SiGe HBT Amplifier with Greater Than 25% Efficiency and 30 mW Saturated Output Power," in *Bipolar/BiCMOS Circuits and Technology Meeting (BCTM), 2011 IEEE*, Oct 2011, pp. 25–28.
- [64] K. Datta, J. Roderick, and H. Hashemi, "Analysis, Design and Implementation of mm-Wave SiGe Stacked Class-E Power Amplifiers," in *Radio Frequency Integrated Circuits Symposium (RFIC), 2013 IEEE*, June 2013, pp. 275–278.
- [65] K. Datta, J. Roderick, and H. Hashemi, "A 22.4 dBm Two-Way Wilkinson Power-Combined Q-Band SiGe Class-E Power Amplifier With 23% Peak PAE," in *Compound Semiconductor Integrated Circuit Symposium (CSICS), 2012 IEEE*, Oct 2012, pp. 1–4.
- [66] C. Campbell and S. Brown, "A compact, 40 GHz 0.5 W power amplifier MMIC," in *GaAs IC Symposium, 1999. 21st Annual*, Oct 1999, pp. 141–147.
- [67] F. Colomb and A. Platzker, "A 3-watt Q-band GaAs pHEMT power amplifier MMIC for high temperature operation," in *2006 IEEE MTT-S International Microwave Symposium (IMS) Digest*, June 2006, pp. 897–900.
- [68] A.-K. Chen, Y. Baeyens, Y.-K. Chen, and J. Lin, "An 83-GHz High-Gain SiGe BiCMOS Power Amplifier Using Transmission-Line Current-Combining Technique," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 61, no. 4, pp. 1557–1569, April 2013.
- [69] R. Bhat, A. Chakrabarti, and H. Krishnaswamy, "Large-scale power-combining and linearization in watt-class mmWave CMOS power amplifiers," in *2013 IEEE Radio Frequency Integrated Circuits (RFIC) Symposium*, June 2013, pp. 283–286.

- [70] H. Xu, Y. Palaskas, A. Ravi, M. Sajadieh, M. El-Tanani, and K. Soumyanath, "A Flip-Chip-Packaged 25.3 dBm Class-D Outphasing Power Amplifier in 32 nm CMOS for WLAN Application," *Solid-State Circuits, IEEE Journal of*, vol. 46, no. 7, pp. 1596–1605, July 2011.
- [71] P. Madoglio, A. Ravi, H. Xu, K. Chandrashekar, M. Verhelst, S. Pellerano, L. Cuellar, M. Aguirre, M. Sajadieh, O. Degani, H. Lakdawala, and Y. Palaskas, "A 20dBm 2.4GHz digital outphasing transmitter for WLAN application in 32nm CMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International*, Feb 2012, pp. 168–170.
- [72] L. Piazzon, P. Colantonio, F. Giannini, and R. Giofre, "New generation of multi-step Doherty amplifier," in 2011 European Microwave Integrated Circuits (EuMIC) Conference, Oct 2011, pp. 116–119.
- [73] A. Agah, B. Hanafi, H. Dabag, P. Asbeck, L. Larson, and J. Buckwalter, "A 45GHz Doherty power amplifier with 23% PAE and 18dBm output power, in 45nm SOI CMOS," in 2012 IEEE MTT-S International Microwave Symposium Digest (MTT), June 2012, pp. 1–3.
- [74] R. Staszewski and P. Balsara, *All-Digital Frequency Synthesizer in Deep-Submicron CMOS*. Wiley, 2006. [Online]. Available: http://books.google.nl/books?id=2VHFD-7LgAwC
- [75] M. Alavi, R. Staszewski, L. de Vreede, and J. Long, "A Wideband 2×13-bit All-Digital I/Q RF-DAC," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 62, no. 4, pp. 732–752, April 2014.
- [76] C. Lu, H. Wang, C. Peng, A. Goel, S. Son, P. Liang, A. Niknejad, H. Hwang, and G. Chien, "A 24.7dBm all-digital RF transmitter for multimode broadband applications in 40nm CMOS," in *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International*, Feb 2013, pp. 332–333.
- [77] A. Kavousian, D. Su, M. Hekmat, A. Shirvani, and B. Wooley, "A Digitally Modulated Polar CMOS Power Amplifier With a 20-MHz Channel Bandwidth," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 10, pp. 2251–2258, Oct 2008.

- [78] D. Chowdhury, L. Ye, E. Alon, and A. Niknejad, "An Efficient Mixed-Signal 2.4-GHz Polar Power Amplifier in 65-nm CMOS Technology," *Solid-State Circuits, IEEE Journal of*, vol. 46, no. 8, pp. 1796–1809, Aug 2011.
- [79] S. Kousai and A. Hajimiri, "An octave-range, watt-level, fully-integrated CMOS switching power mixer array for linearization and back-off-efficiency improvement," *Solid-State Circuits, IEEE Journal of*, vol. 44, no. 12, pp. 3376–3392, Dec 2009.
- [80] A. Agah, W. Wang, P. Asbeck, L. Larson, and J. Buckwalter, "A 42 to 47-GHz, 8-bit I/Q digital-to-RF converter with 21-dBm Psat and 16% PAE in 45-nm SOI CMOS," in 2013 IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, June 2013, pp. 249–252.
- [81] K. Khalaf, V. Vidojkovic, K. Vaesen, J. Long, W. Van Thillo, and P. Wambacq, "A digitally modulated 60GHz polar transmitter in 40nm CMOS," in *Radio Frequency Integrated Circuits Symposium*, 2014 IEEE, June 2014, pp. 159–162.
- [82] E. Laskin, M. Khanpour, S. Nicolson, A. Tomkins, P. Garcia, A. Cathelin, D. Belot, and S. Voinigescu, "Nanoscale CMOS Transceiver Design in the 90–170-GHz Range," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 57, no. 12, pp. 3477–3490, Dec 2009.
- [83] R. Bhat and H. Krishnaswamy, "A watt-level 2.4 GHz RF I/Q power DAC transmitter with integrated mixed-domain FIR filtering of quantization noise in 65 nm CMOS," in *Radio Frequency Integrated Circuits Symposium*, 2014 IEEE, June 2014, pp. 413–416.
- [84] C. Wang, "CMOS Power Amplifiers for Wireless Communications," Ph.D. dissertation, University of California, San Diego, La Jolla, San Diego, California, 2003. [Online]. Available: http://ezproxy.cul.columbia.edu/login?url=http://search.proquest.com/docview/305339373?accountid=10226
- [85] D. Zhao and P. Reynaert, "14.1 A 0.9V 20.9dBm 22.3%-PAE E-band power amplifier with broadband parallel-series power combiner in 40nm CMOS," in 2014 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, Feb 2014, pp. 248–249.

- [86] F. Shirinfar, M. Nariman, T. Sowlati, M. Rofougaran, R. Rofougaran, and S. Pamarti, "A fully integrated 22.6dBm mm-Wave PA in 40nm CMOS," in 2013 IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, June 2013, pp. 279–282.
- [87] K. Datta and H. Hashemi, "A mm-wave class-E 1-bit power modulator," in Custom Integrated Circuits Conference (CICC), 2014 IEEE Proceedings of the, Sept 2014, pp. 1–4.
- [88] —, "A 29dBm 18.5% peak PAE mm-Wave digital power amplifier with dynamic load modulation," in Solid-State Circuits Conference (ISSCC), 2015 IEEE International, Feb 2015, pp. 1–3.
- [89] J. Walling, H. Lakdawala, Y. Palaskas, A. Ravi, O. Degani, K. Soumyanath, and D. Allstot, "A 28.6dBm 65nm Class-E PA with Envelope Restoration by Pulse-Width and Pulse-Position Modulation," in Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, Feb 2008, pp. 566–636.
- [90] J. Kitchen, I. Deligoz, S. Kiaei, and B. Bakkaloglu, "Polar SiGe Class E and F Amplifiers Using Switch-Mode Supply Modulation," *Microwave Theory and Techniques, IEEE Transactions on*, vol. 55, no. 5, pp. 845–856, May 2007.
- [91] L. Chang, R. Montoye, B. Ji, A. Weger, K. Stawiasz, and R. Dennard, "A fully-integrated switched-capacitor 2:1 voltage converter with regulation capability and 90% efficiency at 2.3A/mm<sup>2</sup>," in VLSI Circuits (VLSIC), 2010 IEEE Symposium on, June 2010, pp. 55–56.
- [92] Y. Ramadass, A. Fayed, and A. Chandrakasan, "A Fully-Integrated Switched-Capacitor Step-Down DC-DC Converter With Digital Capacitance Modulation in 45 nm CMOS," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 12, pp. 2557–2565, Dec 2010.
- [93] H.-P. Le, S. Sanders, and E. Alon, "Design Techniques for Fully Integrated Switched-Capacitor DC-DC Converters," *Solid-State Circuits, IEEE Journal of*, vol. 46, no. 9, pp. 2120–2131, Sept 2011.
- [94] G. Patounakis, Y. Li, and K. L. Shepard, "A fully integrated on-chip DC-DC conversion and power management system," *Solid-State Circuits, IEEE Journal of*, vol. 39, no. 3, pp. 443–451, March 2004.

- [95] J. Walling, S. Taylor, and D. Allstot, "A Class-G Supply Modulator and Class-E PA in 130 nm CMOS," Solid-State Circuits, IEEE Journal of, vol. 44, no. 9, pp. 2339–2347, Sept 2009.
- [96] S.-M. Yoo, J. Walling, O. Degani, B. Jann, R. Sadhwani, J. Rudell, and D. Allstot, "A Class-G Switched-Capacitor RF Power Amplifier," *Solid-State Circuits, IEEE Journal of*, vol. 48, no. 5, pp. 1212–1224, May 2013.
- [97] S. Sanders, E. Alon, H.-P. Le, M. Seeman, M. John, and V. Ng, "The Road to Fully Integrated DC-DC Conversion via the Switched-Capacitor Approach," *Power Electronics, IEEE Transactions on*, vol. 28, no. 9, pp. 4146–4155, Sept 2013.
- [98] V. Ng and S. Sanders, "A High-Efficiency Wide-Input-Voltage Range Switched Capacitor Point-of-Load DC-DC Converter," *Power Electronics, IEEE Transactions on*, vol. 28, no. 9, pp. 4335–4341, Sept 2013.
- [99] W. Gaber, P. Wambacq, J. Craninckx, and M. Ingels, "A CMOS IQ Digital Doherty Transmitter using modulated tuning capacitors," in ESSCIRC (ESSCIRC), 2012 Proceedings of the, Sept 2012, pp. 341–344.
- [100] S. Hu, S. Kousai, J. Park, O. Chlieh, and H. Wang, "Design of A Transformer-Based Reconfigurable Digital Polar Doherty Power Amplifier Fully Integrated in Bulk CMOS," *Solid-State Circuits, IEEE Journal of*, vol. 50, no. 5, pp. 1094–1106, May 2015.
- [101] J. Chen, L. Ye, D. Titz, F. Gianesello, R. Pilard, A. Cathelin, F. Ferrero, C. Luxey, and A. Niknejad, "A digitally modulated mm-Wave cartesian beamforming transmitter with quadrature spatial combining," in 2013 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, Feb 2013, pp. 232–233.
- [102] S. M. Alavi, "All-Digital I/Q RF-DAC," Ph.D. dissertation, Delft University of Technology, 2014. [Online]. Available: http://repository.tudelft.nl/assets/uuid:fd7dec40-1957-4aad-bb24-90d6f40b5268/Thesis.pdf
- [103] Y. Moghe, T. Lehmann, and T. Piessens, "Nanosecond Delay Floating High Voltage Level Shifters in a 0.35µm HV-CMOS Technology," *Solid-State Circuits, IEEE Journal of*, vol. 46, no. 2, pp. 485–497, Feb 2011.

- [104] B. Serneels, M. Steyaert, and W. Dehaene, "A High speed, Low Voltage to High Voltage Level Shifter in Standard 1.2V 0.13μm CMOS," in *Electronics, Circuits and Systems, 2006. ICECS '06. 13th IEEE International Conference on*, Dec 2006, pp. 668–671.
- [105] K.-J. Koh and G. Rebeiz, "0.13-μm CMOS Phase Shifters for X-, Ku-, and K-Band Phased Arrays," Solid-State Circuits, IEEE Journal of, vol. 42, no. 11, pp. 2535–2546, Nov 2007.
- [106] K.-J. Kim and K. Ahn, "Design of 60 GHz Vector Modulator Based Active Phase Shifter," in Electronic Design, Test and Application (DELTA), 2011 Sixth IEEE International Symposium on, Jan 2011, pp. 140–143.
- [107] J. Sharma, T. Dinc, and H. Krishnaswamy, "A 200GHz power mixer in 130nm-CMOS employing nonlinearity engineering," in *Radio Frequency Integrated Circuits Symposium*, 2014 IEEE, June 2014, pp. 347–350.
- [108] G. Gangasani, C.-M. Hsu, J. Bulzacchelli, T. Beukema, W. Kelly, H. Xu, D. Freitas, A. Prati, D. Gardellini, R. Reutemann, G. Cervelli, J. Hertle, M. Baecher, J. Garlett, P.-A. Francese, J. Ewen, D. Hanson, D. Storaska, and M. Meghelli, "A 32 Gb/s Backplane Transceiver With On-Chip AC-Coupling and Low Latency CDR in 32 nm SOI CMOS Technology," *Solid-State Circuits, IEEE Journal of*, vol. 49, no. 11, pp. 2474–2489, Nov 2014.
- [109] J.-K. Kim, J. Kim, G. Kim, and D.-K. Jeong, "A Fully Integrated 0.13-μm CMOS 40 Gbps Serial Link Transceiver," Solid-State Circuits, IEEE Journal of, vol. 44, no. 5, pp. 1510–1521, May 2009.
- [110] H. Kimura, P. Aziz, T. Jing, A. Sinha, S. Kotagiri, R. Narayan, H. Gao, P. Jing, G. Hom, A. Liang, E. Zhang, A. Kadkol, R. Kothari, G. Chan, Y. Sun, B. Ge, J. Zeng, K. Ling, M. Wang, A. Malipatil, L. Li, C. Abel, and F. Zhong, "A 28 Gb/s 560 mW Multi-Standard SerDes With Single-Stage Analog Front-End and 14-Tap Decision Feedback Equalizer in 28 nm CMOS," *Solid-State Circuits, IEEE Journal of*, vol. 49, no. 12, pp. 3091–3103, Dec 2014.
- [111] L. Szilagyi, G. Belfiore, R. Henker, and F. Ellinger, "Low power inductor-less CML latch and frequency divider for full-rate 20 Gbps in 28-nm CMOS," in *Microelectronics and Electronics (PRIME), 2014 10th Conference on Ph.D. Research in*, June 2014, pp. 1–4.

- [112] T. Sekiguchi, S. Amakawa, N. Ishihara, and K. Masu, "An 8.9mW 25Gb/s inductorless 1:4 DEMUX in 90nm CMOS," in SoC Design Conference (ISOCC), 2009 International, Nov 2009, pp. 404–407.
- [113] P. Heydari and R. Mohanavelu, "A 40-GHz Flip-Flop-Based Frequency Divider," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 53, no. 12, pp. 1358–1362, Dec 2006.
- [114] K. Fukuda, H. Yamashita, G. Ono, R. Nemoto, E. Suzuki, N. Masuda, T. Takemoto, F. Yuki, and T. Saito, "A 12.3-mW 12.5-Gb/s Complete Transceiver in 65-nm CMOS Process," *Solid-State Circuits, IEEE Journal of*, vol. 45, no. 12, pp. 2838–2849, Dec 2010.
- [115] J. Poulton, R. Palmer, A. Fuller, T. Greer, J. Eyles, W. Dally, and M. Horowitz, "A 14mW 6.25-Gb/s Transceiver in 90-nm CMOS," *Solid-State Circuits, IEEE Journal of*, vol. 42, no. 12, pp. 2745–2757, Dec 2007.
- [116] T. Beukema, M. Sorna, K. Selander, S. Zier, B. Ji, P. Murfet, J. Mason, W. Rhee, H. Ainspan, B. Parker, and M. Beakes, "A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization," *Solid-State Circuits, IEEE Journal of*, vol. 40, no. 12, pp. 2633–2645, Dec 2005.
- [117] "Eighth broadband progress report, FCC 12-90," Federal Communications Commission, Tech. Rep., 08 2012.
- [118] J. E. Prieger, "The broadband digital divide and the economic benefits of mobile broadband for rural areas," *Telecommunications Policy*, vol. 37, no. 6–7, pp. 483–502, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0308596112001917
- [119] I. Maric, B. Bostjancic, and A. Goldsmith, "Resource allocation for constrained backhaul in picocell networks," pp. 1–6, Feb 2011.
- [120] W.-Y. Lin, Y.-C. Chen, R.-Y. Chang, S.-H. Chen, and C.-L. Lee, "Rapid WiMAX network deployment for emergency services," pp. 1–5, July 2012.
- [121] 3GPP TR 36.814, "Further advancements for E-UTRA physical layer aspects (Release 9)," Available at http://www.3gpp.org/ftp/Specs/archive/36_series/36.814/36814-900.zip, Mar. 2010.

- [122] E. Torkildson, B. Ananthasubramaniam, U. Madhow, and M. Rodwell, "Millimeter-wave MIMO: Wireless Links at Optical Speeds," in Proc. of 44th Allerton Conference on Communication, Control and Computing, Monticello, Illinois, Sep. 2006.
- [123] C. Sheldon, E. Torkildson, M. Seo, C. Yue, U. Madhow, and M. Rodwell, "A 60GHz line-of-sight 2×2 MIMO link operating at 1.2Gbps," in Antennas and Propagation Society International Symposium, 2008. AP-S 2008. IEEE, July 2008, pp. 1–4.
- [124] C. Sheldon, M. Seo, E. Torkildson, M. Rodwell, and U. Madhow, "Four-channel spatial multiplexing over a millimeter-wave line-of-sight link," in *Microwave Symposium Digest, 2009. MTT '09. IEEE MTT-S International*, June 2009, pp. 389–392.
- [125] A. Mezghani and J. Nossek, "On Ultra-Wideband MIMO Systems with 1-bit Quantized Outputs: Performance Analysis and Input Optimization," in *Information Theory, 2007. ISIT 2007. IEEE International Symposium on*, June 2007, pp. 1286–1289.
- [126] Y. Yilmaz and X. Wang, "Sequential Decentralized Parameter Estimation Under Randomly Observed Fisher Information," *Information Theory, IEEE Transactions on*, vol. 60, no. 2, pp. 1281–1300, Feb 2014.
- [127] A. Agah, J. Jayamon, P. Asbeck, L. Larson, and J. Buckwalter, "Multi-Drive Stacked-FET Power Amplifiers at 90 GHz in 45 nm SOI CMOS," *Solid-State Circuits, IEEE Journal of*, vol. 49, no. 5, pp. 1148–1157, May 2014.
- [128] J. Jayamon, A. Agah, B. Hanafi, H. Dabag, J. Buckwalter, and P. Asbeck, "A W-band stacked FET power amplifier with 17 dBm Psat in 45-nm SOI CMOS," in *Silicon Monolithic Integrated Circuits in RF Systems (SiRF), 2013 IEEE 13th Topical Meeting on*, Jan 2013, pp. 156–158.

Part II

Appendices

## Appendix A

# Modeling of Passive Components in IBM 45 nm SOI CMOS



Figure A.1: (a) Series capacitance $\left(C = -\frac{1}{\omega \, \operatorname{Im}\left(1/Y_{21}\right)}\right)$ and (b) quality factor $\left(Q = \frac{\operatorname{Im}(Y_{21})}{\operatorname{Re}(Y_{21})}\right)$ of an L=12 $\mu$m, W=11.42 $\mu$m, 280 fF VNCAP from the PDK model, the EM-simulation-based model and measurements.

The inductances and transmission lines used in the prototypes have been implemented using CPWs in the topmost metal layer with a continuous ground plane underneath [3]. A 66 $\Omega$ CPW used in the PA designs has a measured quality factor of $\approx 18$ in the Q-band (33 GHz to 50 GHz) [46]. The capacitors used in the PA designs have been implemented using interdigitated capacitors called Vertical Natural Capacitors (VNCAPs). A W=11.42 $\mu$m × L=12 $\mu$m, 280 fF VNCAP has a measured quality factor that ranges from 11 to 6 across the Q-band (Fig. A.1).
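As an illustration of how the expressions in the caption of Fig. A.1 turn an admittance sample into a capacitance and a quality factor, the following Python sketch applies them to an ideal series R-C model. The function names, the 1 $\Omega$ series resistance and the 40 GHz evaluation frequency are illustrative assumptions and are not taken from the measurements. The sketch also assumes that the admittance passed in is that of the series branch itself, since the sign convention used for $Y_{21}$ is not stated here.

```python
import math
from typing import Tuple


def series_rc_admittance(c_farad: float, esr_ohm: float, freq_hz: float) -> complex:
    """Admittance of an ideal capacitor in series with a loss resistance.

    This lumped model is an assumption made for illustration only.
    """
    omega = 2.0 * math.pi * freq_hz
    # Z = R + 1/(j*omega*C) = R - j/(omega*C)
    z = complex(esr_ohm, -1.0 / (omega * c_farad))
    return 1.0 / z


def extract_c_and_q(y: complex, freq_hz: float) -> Tuple[float, float]:
    """Apply the Fig. A.1 caption expressions to one admittance sample.

    Assumes y is the series-branch admittance (hypothetical convention).
    """
    omega = 2.0 * math.pi * freq_hz
    c = -1.0 / (omega * (1.0 / y).imag)  # C = -1 / (omega * Im(1/Y))
    q = y.imag / y.real                  # Q = Im(Y) / Re(Y)
    return c, q


if __name__ == "__main__":
    freq = 40e9  # assumed mid-Q-band evaluation point
    y = series_rc_admittance(280e-15, 1.0, freq)
    c, q = extract_c_and_q(y, freq)
    print(f"C = {c * 1e15:.1f} fF, Q = {q:.2f}")
```

For this ideal model the extraction returns the modeled capacitance, and the quality factor reduces to $1/(\omega R C)$, so a lower extracted Q at a given frequency corresponds to a larger equivalent series resistance.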