Efficient and Linear CMOS Power Amplifier and Front-end Design for Broadband Fully-Integrated 28-GHz 5G Phased Arrays by Mahmoud Shakib Roshdy, Sherif Abdelhalim
EFFICIENT AND LINEAR CMOS POWER AMPLIFIER AND
FRONT-END DESIGN FOR BROADBAND FULLY-INTEGRATED
28-GHZ 5G PHASED ARRAYS
A Dissertation
by
SHERIF ABDELHALIM MAHMOUD SHAKIB ROSHDY
Submitted to the Office of Graduate and Professional Studies of
Texas A&M University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Chair of Committee, Kamran Entesari
Co-chair of Committee, Samuel Palermo
Committee Members, Robert Nevels
Mahmoud El-Halwagi
Head of Department, Miroslav Begovic
August 2017
Major Subject: Electrical Engineering
Copyright 2017 Sherif Abdelhalim Mahmoud Shakib Roshdy
ABSTRACT
Demand for data traffic on mobile networks is growing exponentially with time and
on a global scale. The emerging fifth-generation (5G) wireless standard is being devel-
oped with millimeter-wave (mm-Wave) links as a key technological enabler to address
this growth by a 2020 time frame. The wireless industry is currently racing to deploy mm-
Wave mobile services, especially in the 28-GHz band. Previous widely-held perceptions of
fundamental propagation limitations were overcome using phased arrays. Equally impor-
tant for success of 5G is the development of low-power, broadband user equipment (UE)
radios in commercial-grade technologies. This dissertation demonstrates design method-
ologies and circuit techniques to tackle the critical challenge of key phased array front-end
circuits in low-cost complementary metal oxide semiconductor (CMOS) technology. Two
power amplifier (PA) proof-of-concept prototypes are implemented in deeply scaled 28-
nm and 40-nm CMOS processes, demonstrating state-of-the-art linearity and efficiency for
extremely broadband communication signals. Subsequently, the 40 nm PA design is suc-
cessfully embedded into a low-power fully-integrated transmit-receive front-end module.
The 28 nm PA prototype in this dissertation is the first reported linear, bulk CMOS
PA targeting low-power 5G mobile UE integrated phased array transceivers. An optimization
methodology is presented to maximize power added efficiency (PAE) in the PA
output stage at a desired error vector magnitude (EVM) and range to address challenging
5G uplink requirements. Then, a source degeneration inductor in the optimized output
stage is shown to further enable its embedding into a two-stage transformer-coupled PA.
The inductor broadens the inter-stage impedance matching bandwidth and helps
reduce distortion. Designed and fabricated in 1P7M 28 nm bulk CMOS and using
a 1 V supply, the PA achieves +4.2 dBm/9% measured Pout/PAE at −25 dBc EVM for
a 250 MHz-wide, 64-QAM orthogonal frequency division multiplexing (OFDM) signal
with 9.6 dB peak-to-average power ratio (PAPR). The PA also achieves 35.5%/10% PAE
for continuous wave signals at saturation/9.6 dB back-off from saturation. To the best of
the author’s knowledge, these are the highest measured PAE values among published K-
and Ka-band CMOS PAs to date.
To drastically extend the communication bandwidth in 28 GHz-band UE devices, and
to explore the potential of CMOS technology for more demanding access point (AP) de-
vices, the second PA is demonstrated in a 40 nm process. This design supports a signal
radio frequency bandwidth (RFBW) >3× the state-of-the-art without degrading output
power (i.e. range), PAE (i.e. battery life), or EVM (i.e. amplifier fidelity). The three-stage
PA uses higher-order, dual-resonance transformer matching networks with bandwidths op-
timized for wideband linearity. Digital gain control of 9 dB range is integrated for phased
array operation. The gain control is a needed functionality, but it is largely absent from re-
ported high-performance mm-Wave PAs in the literature. The PA is fabricated in a 1P6M
40 nm CMOS LP technology with 1.1 V supply, and achieves Pout/PAE of +6.7 dBm/11%
for an 8×100 MHz carrier aggregation 64-QAM OFDM signal with 9.7 dB PAPR. This
PA therefore is the first to demonstrate the viability of CMOS technology to address even
the very challenging 5G AP/downlink signal bandwidth requirement.
Finally, leveraging the developed PA design methodologies and circuits, a low power
transmit-receive phased array front-end module is fully integrated in 40 nm technology.
In transmit-mode, the front-end maintains the excellent performance of the 40 nm PA:
achieving +5.5 dBm/9% for the same 8×100 MHz carrier aggregation signal above. In
receive-mode, a 5.5 dB noise figure (NF) and a minimum third-order input intercept point
(IIP3) of −13 dBm are achieved. The performance of the implemented CMOS front-end
is comparable to state-of-the-art publications and commercial products that were very
recently developed in silicon germanium (SiGe) technologies for 5G communication.
DEDICATION
To my parents.
ACKNOWLEDGMENTS
All thanks are foremost due to Allah, the Most Gracious, the Most Merciful, for en-
abling me to complete my PhD studies.
I express my deepest gratitude to my advisor, Professor Kamran Entesari, for his sup-
port throughout my studies, and for his continuous guidance and tireless effort in helping
me to reach this achievement.
I also thank Professor Samuel Palermo, for his contributions to my development and
learning experiences at Texas A&M both as an instructor and through our fruitful research
collaboration. Special thanks are also due to my dissertation committee members, Pro-
fessor Robert Nevels and Professor Mahmoud El-Halwagi, for their time, feedback, and
valuable suggestions.
My life as a PhD student has certainly been enriched by my fellow graduate students
that I was so fortunate to have met and learned so much from. I express my greatest thanks
to Mohamed El-Kholy, Osama El-Hadidy, Hatem Osman, Ayman Ameen, Omar El-Sayed,
Ahmed Helmy, Ramy Saad, and Amr Abuellil for all that I learned from them and for the
wonderful time I had in their company.
The research in this dissertation is a result of the highly valuable support I received
from Qualcomm Technologies, Inc., which was made possible by Vladimir Aparin and
Jeremy Dunworth, to whom I am indebted. I also thank Hyun-Chul Park and Bon-Hyun
Ku for our many technical discussions and for their helpful suggestions. I especially thank
my friend and colleague Mohamed Elkholy for his help on our 40-nm millimeter wave test
chip.
Finally, I thank my parents Wafaa and Abdelhalim for their great dedication and sac-
rifice throughout my life. Nothing I ever say or do can repay their love and support, and
their tireless effort and prayers for me to reach my full potential. I also thank my brother
Husam for his advice, prayers, and unconditional love.
CONTRIBUTORS AND FUNDING SOURCES
Contributors
This work was supported by a dissertation committee consisting of Professor Kam-
ran Entesari, and Professors Samuel Palermo and Robert Nevels of the Department of
Electrical and Computer Engineering, as well as Professor Mahmoud El-Halwagi of the
Department of Chemical Engineering.
The semi-empirical 5G link budget analysis in Chapter 2 (Section 2.2.1) was originally
performed by Dr. Vladimir Aparin of Qualcomm Technologies, Inc. and was revised and
re-organized by the student to cite appropriate references and collect the key theoretical
equations in a compact and academic format. Dr. Hyun-Chul Park helped with the ground
shield and top-level finishing of the physical layout of the 28-nm CMOS power amplifier
prototype in Chapter 2. With the help of the student on theoretical and practical mm-wave
circuit design practices, as well as with extensive computer-aided design enablement by
the student, Dr. Mohamed Elkholy designed the low-noise amplifier and the phase shifter
circuit blocks that are described in Sections 4.3.2 and 4.3.3. He also provided suggestions
and ran initial circuit simulations for the integration of the front-end module in Section 4.4.
Mr. Ozvaldo Alcala, Mr. Andrew Yang, and Mr. David Palmer of Qualcomm Technologies,
Inc. provided test equipment support for automating the millimeter wave single-tone,
two-tone, and modulated signal error vector magnitude measurements presented in Chapters
3 and 4, and Mr. Martin Lim of Rohde & Schwarz, Inc. helped with code development
for this automated testing.
All other work conducted for the dissertation was completed by the student indepen-
dently.
Funding Sources
This work was made possible by Qualcomm Technologies, Inc. in part through a
professional internship that was provided to the student, and in part through its subsequent
and resulting agreement to collaborate technically and to fund Professor Kamran Entesari’s
research group under the Texas Engineering Experiment Station contract for the project
entitled "CMOS mm-wave transceivers."
Its contents are solely the responsibility of the authors and do not necessarily represent
the official views of Qualcomm Technologies, Inc.
TABLE OF CONTENTS
Page
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
CONTRIBUTORS AND FUNDING SOURCES . . . . . . . . . . . . . . . . . . vii
TABLE OF CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
1. INTRODUCTION AND LITERATURE REVIEW . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Power Amplifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Classes of Operation . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Load Line Matching . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.3 PA Linearity Metrics for Modulated Signals . . . . . . . . . . . . 4
1.2.4 Linearity Benefits of Differential Operation . . . . . . . . . . . . 5
1.2.5 Linear RF CMOS PA Literature . . . . . . . . . . . . . . . . . . 8
1.2.6 mm-Wave CMOS PA Design Challenges . . . . . . . . . . . . . 9
1.3 Transmit-Receive Switch . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 MOS Switch Parasitics . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 Review of CMOS TR Switch Literature . . . . . . . . . . . . . . 11
1.4 Passive Phase Shifter Literature . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Dissertation Scope and Organization . . . . . . . . . . . . . . . . . . . . 14
2. A HIGHLY EFFICIENT AND LINEAR POWER AMPLIFIER FOR 28-GHZ
5G PHASED ARRAY RADIOS IN 28-NM CMOS . . . . . . . . . . . . . . . 16
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 System Considerations for 5G Phased Array Radios . . . . . . . . . . . . 18
2.2.1 Choice of Carrier Frequency for 5G Systems . . . . . . . . . . . 18
2.2.2 Output Power Requirements . . . . . . . . . . . . . . . . . . . . 23
2.3 Output Stage Optimization Methodology . . . . . . . . . . . . . . . . . . 25
2.3.1 Parameterized Output Stage . . . . . . . . . . . . . . . . . . . . 25
2.3.2 Optimization Procedure . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 Optimization Results . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Inter-stage Impedance Matching . . . . . . . . . . . . . . . . . . . . . . 29
2.4.1 Physical Circuit Operation . . . . . . . . . . . . . . . . . . . . . 30
2.4.2 Driver Load Impedance . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.3 Effect of Ldeg on Power Capability . . . . . . . . . . . . . . . . . 33
2.4.4 Effect of Ldeg on Gain and Distortion . . . . . . . . . . . . . . . 35
2.5 Circuit Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5.1 Core Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.2 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.6.1 Measured Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.6.2 Comparison with State-of-the-art . . . . . . . . . . . . . . . . . . 44
2.6.3 Comparison with 5G Requirements . . . . . . . . . . . . . . . . 45
2.6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3. A WIDEBAND LINEAR 28-GHZ POWER AMPLIFIER FOR POWER-
EFFICIENT 5G PHASED ARRAYS IN 40-NM CMOS . . . . . . . . . . . . . 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Circuit Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4. A 28-GHZ TRANSMIT-RECEIVE FRONT-END MODULE FOR 5G HAND-
SET PHASED ARRAYS IN 40-NM CMOS . . . . . . . . . . . . . . . . . . . 75
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Transmit-receive Module Considerations . . . . . . . . . . . . . . . . . . 77
4.2.1 Link Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.2 Beam Steering . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3 Circuit Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3.1 Power Amplifier . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3.2 Low-noise Amplifier . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3.3 Phase Shifter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.4 Module Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.4.1 Antenna Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4.2 Phase Shifter Interface . . . . . . . . . . . . . . . . . . . . . . . 96
4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.5.1 Transmit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.5.2 Receive Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.5.3 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . 107
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5. CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
LIST OF FIGURES
FIGURE Page
1.1 Illustration of fundamental single-transistor power amplifier operation. . . 3
1.2 Illustration of error vector. . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Illustration of adjacent channel leakage. . . . . . . . . . . . . . . . . . . 5
1.4 Illustration of continuous conduction of output signal by pseudo-differential
power amplifier action (also known as "push-pull" action). . . . . . . . . 6
1.5 Graphical/visual definition of sub-threshold (i.e. weak inversion) conduc-
tion as the gradual transition from cut-off to saturation. . . . . . . . . . . 7
2.1 Illustration of link budget analysis use scenario in comparison of potential
carrier frequencies for 5G systems. Reprinted from [1]. . . . . . . . . . . 19
2.2 Two-dimensional URA of UE or AP antennas for allowable physical size
dx dy of 26mm 15mm for UE array, 47mm 47mm for AP array: (a)
UE or AP at arbitrary fc , and (b) number of antenna elements in UE (left
axis) and AP (right axis) arrays; inset shows example for UE at fc=30GHz.
Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 (a) Scatter of best published data and fitted trendlines for PAE (fc) and
PTx,dc (fc), (b) required average Pout per element vs. fc for 64-QAM at
different BWsig, and (c) required average Pout per element vs. LOS range
at 30GHz for QPSK and 64-QAM at different BWsig. Reprinted from [1]. 22
2.4 Parameterized output stage circuit for optimization: (a) power cell layout,
(b) parameterized output stage circuit. Reprinted from [1]. . . . . . . . . 25
2.5 The two steps in the optimization procedure: (a) Step (1) - load-pull, (b)
Step (2) - EVM simulation. Reprinted from [1]. . . . . . . . . . . . . . . 27
2.6 Output stage transistor design chart for VDD = 1V: PAE (shaded) and average
64-QAM OFDM Pout (line) contours plotted at an EVM of −27dBc,
i.e. at a 2dB margin from EVMreq; design choice indicated by a circle, and
inset shows correspondence between VGS − Vt on x-axis and bias current
density JPA. Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . . 28
2.7 Issues in using m = 12, VGS − Vt = −150mV optimization result: (a)
Cascaded amplifier frequency response overly sensitive to PVT and modeling
accuracy, (b) AM-PM conversion and DA stage load modulation due
to Cgs nonlinearity. Reprinted from [1]. . . . . . . . . . . . . . . . . . . 30
2.8 Single-ended model of differential-mode inter-stage matching: (a) circuit,
(b) 1st simplification (c) simplified model. Reprinted from [1]. . . . . . . 31
2.9 Smith chart trajectories for inter-stage matching to present (100 + j0)Ω
differentially to DA: (a) Ldeg = 0, (b) Ldeg=28pH (i.e. 14pH single-ended),
and scatter of ZE due to independent Gaussian variations (3σ = 10%)
in Cgs and CD; −13dB return loss region w.r.t (100 + j0)Ω target
indicated with circle: (c) Ldeg = 0, and (d) Ldeg=28pH. Reprinted from [1]. 33
2.10 Micrographs of 12 × 32 × 1µm/28nm transistor test structures used in
power capability verification: (a) Ldeg = 0, (b) Ldeg=14pH single-ended.
Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.11 Simulated AM-AM/AM-PM response of the output stage using re-designed
lossless input matching for each Ldeg ∈ {0, 5, 15, 25, 50}pH at WnMOS =
12 × 32 × 1µm and JPA = 12µA/µm: (a) AM-AM response, and (b)
AM-PM response. Reprinted from [1]. . . . . . . . . . . . . . . . . . . . 37
2.12 Schematic of PA circuit: (a) two-stage transformer-coupled topology, (b)
push-pull stage with capacitive neutralization capacitor Cn and single-ended
source degeneration inductor Ldeg. Reprinted from [1]. . . . . . . . . . . 38
2.13 Measured MOM neutralization capacitor characteristics: (a) capacitance
and quality factor, (b) shunt-equivalent loss resistance calculated from ca-
pacitance and quality factor. Reprinted from [1]. . . . . . . . . . . . . . . 41
2.14 Output balun characterization: (a) test structure micrograph, (b) induc-
tances and quality factors (c) differential mode maximum available gain
(i.e. transformer efficiency). Reprinted from [1]. . . . . . . . . . . . . . 50
2.15 Die micrograph of fabricated two-stage PA. Reprinted from [1]. . . . . . . 51
2.16 Small- and large-signal CW signal measurement results for JPA = 12µA/µm,
JDA = 22µA/µm and 1V supply: (a) s-parameter results, (b) best
measured CW signal input power sweep at fc =30GHz. Reprinted from [1]. 52
2.17 Swept large-CW-signal measurement results summary for JPA = 12µA/µm,
JDA = 22µA/µm: (a) key performance metrics over 27–31GHz for
1V supply, and (b) saturated performance metrics vs. supply voltage at
30GHz. Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.18 EVM measurement setup for 64-QAM OFDM signal. Reprinted from [1]. 54
2.19 Peak 64-QAM OFDM measured performance: output spectrum, ACPR,
and constellation for JPA = 12µA/µm, JDA = 22µA/µm and 1V supply
at Pout = +4.2dBm, BWsig = 250MHz (1.5Gbps), achieving 9% PAE at
EVM = −25dBc. Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . 54
2.20 Swept 64-QAM OFDM signal measurement results summary for JPA =
12A=m, JDA = 22A=m: (a) average Pout and corresponding PAE vs.
fc for 1V supply, (b) average Pout and corresponding PAE supply voltage
at fc =30GHz. Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . 55
2.21 Measured AM-AM/AM-PM characteristics of two-stage PA at JDA =
22A=m and corresponding Savitzky-Golay smoothed characteristics for
three example JPA values: (a) JPA = 10:0A=m, (b) JPA = 12:9A
=m, and (c) JPA = 16:5A=m. Reprinted from [1]. . . . . . . . . . . . 56
2.22 Key metrics of measured AM-AM/AM-PM characteristics at 30GHz for
1V supply and JDA = 22µA/µm before and after Savitzky-Golay smoothing:
(a) P1dB, and (b) maximum AM-PM deviation w.r.t. small-signal for
Pout ≤ P1dB. Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . 57
2.23 EVM vs. average 64-QAM OFDM Pout at 30GHz obtained using direct
measurement (setup in Fig. 2.18) for BWsig = {150, 250}MHz, and using
behavioral simulation with measured AM-AM/AM-PM characteristics of
the two-stage PA (i.e. BWsig → 0) for different JPA at JDA = 22µA/µm;
some example AM-AM/AM-PM characteristics are shown in Fig. 2.21.
Reprinted from [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.24 Simulated two-tone inter-modulation distortion at JDA = 22µA/µm, JPA =
12µA/µm for Δf = {20, 75, 125, 150, 250}MHz at the amplifier center frequency:
(a) lower IM3, (b) upper IM3. Reprinted from [1]. . . . . . . . . . . . . 59
2.25 Measured two-tone inter-modulation distortion at JDA = 21µA/µm, JPA =
23.8µA/µm for Δf = 20MHz across 27–31GHz center frequency: (a)
lower IM3, (b) upper IM3, (c) lower IM5, and (d) upper IM5. Reprinted
from [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.1 Three-stage PA: cascode VGA 1st stage (4×2dB digital gain steps), and
capacitively-neutralized common-source 2nd and 3rd stages; power tran-
sistor size scaling indicated in units. Reprinted from [2]. . . . . . . . . . 65
3.2 Optimization of linearity and PAE in stage 3 using spacing Δfin of the two
resonance frequencies of inter-stage matching network (input of stage 3).
Reprinted from [2]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.3 Die micrograph of 40 nm CMOS PA. Reprinted from [2]. . . . . . . . . . 67
3.4 Measured s-parameters across digital gain states as well as the associated
gain/phase errors vs. frequency. Reprinted from [2]. . . . . . . . . . . . . 67
3.5 Measured large CW signal power sweep results over 27–30GHz (Pin,max =
−3.5dBm at all frequencies); CW Pout/PAE summaries at key power levels
over 26–33GHz. Reprinted from [2]. . . . . . . . . . . . . . . . . . . . . 68
3.6 EVM measurement setup. (a) Block diagram. (b) Characterization of
EVM floor over center frequency for the tested carrier aggregation wave-
forms. Worst-case EVM floor data measured by connecting SMW200A
directly to FSW43 using only a cable (2-2.5dB loss at 30GHz); i.e. without
CMOS DUT. At each center frequency, Pout of SMW200A is increased
until EVM is no longer noise-limited; then highest/worst EVM floor across
component carriers is recorded. Reprinted from [2]. . . . . . . . . . . . . 69
3.7 EVM/PAE vs. Pout at 27GHz for 64-QAM OFDM with 1,4,8CC and
Pout/PAE summaries vs. center frequency for −25dBc EVM; measured
spectrum/ACLR for peak 8CC performance at 27GHz: 4.32Gbps, +6.7dBm,
11% PAE. Reprinted from [2]. . . . . . . . . . . . . . . . . . . . . . . . 70
3.8 Summary of QPSK OFDM carrier aggregation measurements versus carrier
frequency for −16 dBc EVM on each CC. (a) Average Pout. (b) Average
PAE. Reprinted from [2]. . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.9 Summary of QPSK OFDM carrier aggregation measurements versus carrier
frequency for −25 dBc EVM on each CC. (a) Average Pout. (b) Average
PAE. Reprinted from [2]. . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.10 Summary of 64-QAM OFDM measurements versus carrier frequency for
a single CC having different contiguous RFBW values at −25 dBc EVM.
(a) Summary of average Pout. (b) Summary of PAE. Reprinted from [2]. . 73
4.1 Impact of quantization/random phase step errors on array performance. (a)
Worst-case EVM degradation as a result of phase quantization error in dig-
ital PS. (b) Impact of random per-element phase errors on beam pointing
angle accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2 Effect of passive phase shifter insertion loss on overall transmitter EVM.
(a) Simulation scenario illustration. (b) 8-element phased array transmitter
EVM versus phase shifter insertion loss. . . . . . . . . . . . . . . . . . . 82
4.3 Effect of passive phase shifter insertion loss on overall receiver noise figure
and linearity. (a) Simulation scenario illustration. (b) 8-element phased
array receiver NF and IIP3 versus phase shifter insertion loss. . . . . . . 82
4.4 Die micrographs of stand-alone front-end module component test circuits.
(a) Power amplifier. (b) Low-noise amplifier. (c) Phase shifter 3. . . . . . 83
4.5 Schematic of three-stage power amplifier. (a) Top level block diagram and
relative stage scaling. (b) Stage 1: 4×2 dB cascode VGA. (c) Stages 2 and
3: common source stages with capacitive neutralization. . . . . . . . . . . 85
4.6 Schematic of three-stage LNA. (a) Top level block diagram. (b) Stage 1.
(c) Stage 2. (d) Stage 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.7 Schematic of three-bit phase shifter. (a) Block diagram. (b) 45° Cell. (c)
90° Cell. (d) 180° Cell. . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.8 Candidate topologies for antenna matching and transmit-receive switch.
(a) Concept of λ/4 transformer topology used in [3, 4]. (b) Transformer-based
multiplexer topology of [5]. (c) Transformer-based topology in [6].
(d) Proposed topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.9 Illustration of trade-off between Tx-mode bandwidth and Rx-mode NF in
design of PA balun for circuit of Fig. 4.8(d). (a) Configuring PA gain devices
to replace shunt switch in Rx-mode; VDD center-tap is pulled down
to ground to minimize gain device’s on-resistance Ron,PA. (b) Simplified
π-equivalent circuit in Rx-mode. (c) Smith chart trajectory of output
impedance ‘looking back’ at antenna from point B in Rx-mode; red arrow
indicates effect of tighter magnetic coupling in Tx balun. . . . . . . . . . 92
4.10 Antenna port matching and transmit-receive switch. (a) Schematic. (b) 3D
illustration of physical layout. . . . . . . . . . . . . . . . . . . . . . . . . 93
4.11 Illustration of signal path in the circuit of Fig. 4.10 for the Tx/Rx modes.
(a) Schematic in Tx-mode. (b) 3D structure in Tx-mode. (c) Schematic in
Rx-mode. (d) 3D structure in Rx-mode. . . . . . . . . . . . . . . . . . . 94
4.12 Die micrographs of passive test structures for antenna interface. (a) Tx-
mode: LNA port and TR switch shorted. (b) Rx-mode: PA port shorted,
TR switch gate tied to ground. . . . . . . . . . . . . . . . . . . . . . . . 96
4.13 Simulated and measured IL and RL of antenna interface passive test structures.
(a) Tx-mode. (b) Rx-mode. . . . . . . . . . . . . . . . . . . . . . . . . 97
4.14 Phase shifter port matching and transmit-receive switch. (a) Schematic.
(b) 3D illustration of physical layout. . . . . . . . . . . . . . . . . . . . . 97
4.15 Die micrograph of fully-integrated front-end module. . . . . . . . . . . . 98
4.16 Measured s-parameters in Tx-mode across 9×1dB PA gain steps for different
PS phase state pairs. (a) States {0, 4}. (b) States {1, 5}. (c) States
{2, 6}. (d) States {3, 7}. . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.17 Measured gain step nonlinearity errors in Tx-mode across 9×1dB PA gain
steps for different PS phase states; r.m.s. error indicated with thick black
line in each case. (a) State 0. (b) State 1. (c) State 2. (d) State 3. . . . . . 100
4.18 Measured s-parameters in Tx-mode across 7×45° PS phase steps for different
PA gain settings: (a) PA gain state 0 (min. gain). (b) PA gain state
4. (c) PA gain state 9 (max. gain). . . . . . . . . . . . . . . . . . . . . . 101
4.19 Measured errors in 7×45° PS phase steps in Tx-mode for different PA gain
settings; r.m.s. error indicated with thick black line in each case. (a) PA
gain state 0 (min. gain). (b) PA gain state 4. (c) PA gain state 9 (max. gain). 101
4.20 Measured CW power sweep results at maximum PA gain setting and PS
phase state 0 at different CW frequencies. (a) 27GHz. (b) 28GHz. (c)
29GHz. and (d) 30GHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.21 Summary of measured CW power sweep results at maximum PA gain set-
ting and PS phase state 0 versus CW frequency at key power back-off
levels. (a) Pout. (b) PAE. . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.22 Summary of measured Pout and PAE for carrier aggregation scenarios versus
center frequency for EVM < −25dBc on each CC for different PS
digital states. (a) Pout for 1CC. (b) PAE for 1CC. (c) Pout for 8CC. (d)
PAE for 8CC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.23 Measured s-parameters in Rx-mode across 8×1dB LNA gain steps for different
PS phase state pairs. (a) States {0, 4}. (b) States {1, 5}. (c) States
{2, 6}. (d) States {3, 7}. . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.24 Measured gain step nonlinearity errors in Rx-mode across 8×1dB LNA
gain steps for different PS phase states; r.m.s. error indicated with thick
black line in each case. (a) State 0. (b) State 1. (c) State 2. (d) State 3. . . 107
4.25 Measured s-parameters in Rx-mode across 7×45° PS phase steps for different
LNA gain settings. (a) LNA gain state 0 (min. gain). (b) LNA gain
state 3. (c) LNA gain state 8 (max. gain). . . . . . . . . . . . . . . . . . . 107
4.26 Measured errors in 7×45° PS phase steps in Rx-mode for different LNA
gain settings; r.m.s. error indicated with thick black line in each case. (a)
LNA gain state 0 (min. gain). (b) LNA gain state 3. (c) LNA gain state 8
(max. gain). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.27 Measured Rx-mode noise figure versus frequency across all LNA gain set-
tings at PS phase state 0. . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.28 Summary of measured Rx-mode linearity performance versus frequency.
(a) CW input P1dB results at minimum and maximum LNA gain settings
and PS phase state 0 versus CW frequency. (b) Two-tone IIP3 results at
LNA gain settings {0, 1, 7, 8} and PS phase state 0 versus center frequency
of two-tone signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
LIST OF TABLES
TABLE
1.1 Definitions of conventional linear PA modes of operation.
1.2 Summary of recent literature on linear CMOS RF power amplifiers.
1.3 Review of conventional transmit-receive switch literature.
1.4 Summary of passive LC delay cell based phase shifter literature review.
2.1 Expressions and chosen values for various variables used in (2.1), Fig. 2.1, and Fig. 2.2. Reprinted from [1].
2.2 Summary of targeted PA circuit specifications from [7]. Reprinted from [1].
2.3 Comparison of 29GHz CW load-pull measurement results for single-ended transistor test structures: (a) Ldeg = 0, and (b) Ldeg = 14pH single-ended. Reprinted from [1].
2.4 Summary of design values for two-stage power amplifier. Reprinted from [1].
2.5 Comparison with state-of-the-art silicon mm-Wave PAs. Reprinted from [1].
3.1 Comparison with state-of-the-art linear mm-wave silicon PAs for data communication. Reprinted from [2].
4.1 Key specifications for circuit components of UE FEM and the corresponding measured performances achieved by stand-alone test circuits in this work.
4.2 Comparison of candidate topologies in Fig. 4.8 for antenna matching and transmit-receive switch.
4.3 Comparison with state-of-the-art front-ends for 28 GHz 5G communications.
1. INTRODUCTION AND LITERATURE REVIEW
1.1 Motivation
The emerging fifth-generation (5G) wireless standard is expected to bring an unprecedented
increase in data rates that will enable new consumer and business applications that
rely on wireless technology. Examples are virtual reality (VR) and augmented reality
(AR), with their highly diverse set of sub-applications. The key technology behind the
sought improvement in data rate is highly integrated millimeter wave (mm-Wave) phased
arrays for communication on both the air interface accessed by the consumers through
their user equipment (UE) devices, and on the back-haul side where access point (AP)
devices first off-load data to be routed towards the core of the mobile network.
A fundamental challenge to the 5G vision is the design of highly power-efficient UE
phased arrays that can cope with extremely broadband communication signals. Due to
market forces, the UE devices additionally must be implemented in low-cost consumer-
grade technologies, which makes their design even more challenging. Traditionally, mm-
Wave phased array radios have been restricted to military applications, e.g. airborne radar
units, and have been implemented in exotic compound semiconductor technologies. In
such military applications, the key drivers are performance and reliability, while cost
is less important. However, migration of military phased array development to silicon-
germanium (SiGe) technologies to reduce their cost and increase their integration level
is an emerging trend. While the more traditional microwave community insists that only
compound semiconductors can meet the requirements of mm-Wave phased arrays, this
migration trend is behind several forward-looking industrial developments of
SiGe solutions for 5G communication that aim to displace their III-V semiconductor
competition. The motivation of this research is to investigate the possibility of displacing
both compound semiconductors and SiGe solutions by using the even cheaper CMOS
technology for UE devices. The most challenging circuits to demonstrate are the radio
front-end components: power amplifier (PA), low-noise amplifier (LNA), phase shifter
(PS) and time-division duplexing (TDD) transmit-receive (TR) antenna switch. There-
fore, this research focuses on power- and area-efficient CMOS implementations of these
critical circuits and their integration into a phased array front-end module.
1.2 Power Amplifier
This section discusses the fundamental operation of a power amplifier (PA), explains
the concept of load-line matching and how it differs from conjugate matching, then presents
a review of linear radio frequency (RF) PA literature, followed by challenges associated
with design of CMOS PAs at very high millimeter wave (mm-Wave) frequencies.
1.2.1 Classes of Operation
Figure 1.1 shows a single-transistor PA, operating in a linear mode, i.e. such that the
output signal ideally contains a significant component that is a linearly amplified version of
the input signal. The proportion of the RF signal cycle over which the transistor conducts
an output current iD is expressed as an angle φ. The angle φ depends on whether the total
gate-to-source voltage VGS of the transistor exceeds its threshold voltage, and is controlled
by the value of the direct current (d.c.) bias voltage Vdc in Fig. 1.1. Table 1.1 shows the
definitions of the conventional linear modes of operation for PA design according to the
value of φ in degrees.
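As a rough illustration of how the bias voltage sets the conduction angle, the following minimal sketch assumes an idealized transistor with abrupt cut-off exactly at the threshold and a sinusoidal gate drive. The names v_dc, v_th and v_amp are hypothetical, and the class boundaries follow the usual textbook convention (class A: φ = 360°, class AB: 180° < φ < 360°, class B: φ = 180°, class C: φ < 180°), which is assumed here because the body of Table 1.1 is not reproduced in this text.

```python
import math

def conduction_angle_deg(v_dc, v_th, v_amp):
    """Conduction angle in degrees for v_gs(t) = v_dc + v_amp*cos(wt).

    Assumption: ideal abrupt cut-off, i.e. the device conducts only
    while v_gs exceeds the threshold v_th.
    """
    if v_amp <= 0:
        raise ValueError("v_amp must be positive")
    x = (v_th - v_dc) / v_amp        # how far the bias sits below threshold
    x = max(-1.0, min(1.0, x))       # clamp: always conducting / never conducting
    return math.degrees(2.0 * math.acos(x))

def pa_class(phi_deg, tol=1e-6):
    """Map a conduction angle to the conventional linear PA class."""
    if phi_deg >= 360.0 - tol:
        return "A"
    if phi_deg > 180.0 + tol:
        return "AB"
    if phi_deg >= 180.0 - tol:
        return "B"
    return "C"

# Hypothetical example: bias 0.1 V below a 0.5 V threshold, 0.3 V drive amplitude
phi = conduction_angle_deg(v_dc=0.4, v_th=0.5, v_amp=0.3)
print(f"phi = {phi:.1f} deg, class {pa_class(phi)}")
```

This abrupt model would label any sub-threshold bias as class C; Section 1.2.4 explains why that label does not describe the measured behavior of the PAs in this work.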
Figure 1.1: Illustration of fundamental single-transistor power amplifier operation.
Table 1.1: Definitions of conventional linear PA modes of operation.
1.2.2 Load Line Matching
The concept of a load-line match is important in the design of PAs at any frequency of
operation. It marks a clear deviation of large-signal circuit design from the simpler theories
used to treat small-signal circuits. The key point to understand is that a real transistor can
only provide a finite output signal power. The upper bound on this power is the product
of the maximum voltage swing across the transistor’s drain–source terminals in Fig. 1.1
and its maximum drain current. The device simply cannot produce any more output power
than this product. The load-line match is the procedure by which the load impedance of
the transistor is synthesized to allow it to approach this upper bound on Pout. Since the
load impedance is by definition the ratio of output voltage to output current, the optimum
load resistance (also known as the "load-line resistance") is therefore defined as [8]:
\[
\text{Optimum resistance} = \frac{\text{Maximum drain-to-source voltage swing}}{\text{Maximum drain current}} \tag{1.1}
\]
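A short numerical sketch of (1.1) may help fix ideas. The function and parameter names are hypothetical, and the class-A power expression (one eighth of the product of peak-to-peak voltage and current swings) is the standard idealized textbook result, assumed here rather than taken from this work.

```python
def load_line_resistance(v_swing_max, i_max):
    """Eq. (1.1): optimum (load-line) resistance in ohms.

    v_swing_max: maximum drain-to-source voltage swing, in volts
    i_max:       maximum drain current, in amperes
    """
    if i_max <= 0:
        raise ValueError("i_max must be positive")
    return v_swing_max / i_max

def ideal_class_a_power(v_swing_max, i_max):
    """Fundamental output power (W) of an idealized class-A stage that uses
    the full voltage and current swings: (V/2) * (I/2) / 2 = V*I/8.
    This is always below the V*I product quoted as the upper bound."""
    return v_swing_max * i_max / 8.0

# Hypothetical device: 2 V maximum swing, 100 mA maximum drain current
r_opt = load_line_resistance(2.0, 0.1)
p_out = ideal_class_a_power(2.0, 0.1)
print(f"R_opt = {r_opt:.1f} ohm, ideal class-A Pout = {1e3 * p_out:.1f} mW")
```

A conjugate match to the small-signal output impedance would generally give a different load than r_opt, which is why the two matching concepts are distinguished in this section.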
1.2.3 PA Linearity Metrics for Modulated Signals
The two key metrics for evaluating the linearity (i.e. fidelity) of an RF PA as it ampli-
fies a complex modulated signal are the error vector magnitude (EVM) and the adjacent
channel leakage ratio (ACLR).
1.2.3.1 Error Vector Magnitude
EVM is essentially the inverse of the signal-to-noise ratio (SNR), but measured at the
output of the digital receiver. Figure 1.2 shows the ideal constellation points for a quadrature
phase shift keying (QPSK) signal (blue circles), as well as a received symbol with
some deviation, or error vector, relative to the ideal point. The root mean square (r.m.s.)
value of the error vector shown in Fig. 1.2, taken over an ensemble of received symbols, is
the EVM of the signal. For PA design, a fictitious, ideal RF down-converter followed by
an ideal digital receiver is appended at the output of the RF PA in simulations to compute
the EVM, which then characterizes the linearity of the PA as a stand-alone circuit without a
real receiver. EVM is the metric used to indicate in-channel signal quality.
Figure 1.2: Illustration of error vector.
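The r.m.s. EVM computation described above can be sketched as follows. This is a minimal illustration, not the simulation setup used in this work: the normalization to the r.m.s. magnitude of the ideal symbols is one common convention and is an assumption here, and the QPSK data and noise level are hypothetical.

```python
import numpy as np

def evm_rms(received, ideal):
    """R.m.s. EVM as a ratio: r.m.s. error-vector magnitude divided by the
    r.m.s. magnitude of the ideal symbols (assumed normalization)."""
    received = np.asarray(received, dtype=complex)
    ideal = np.asarray(ideal, dtype=complex)
    error = received - ideal
    return np.sqrt(np.mean(np.abs(error) ** 2) / np.mean(np.abs(ideal) ** 2))

def evm_db(received, ideal):
    """EVM in dB; with this normalization it is roughly the negative of the
    SNR (in dB) seen at the output of an ideal receiver."""
    return 20.0 * np.log10(evm_rms(received, ideal))

# Hypothetical example: unit-power QPSK symbols with additive Gaussian error
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(1000, 2))
qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
noisy = qpsk + 0.05 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
print(f"EVM = {evm_db(noisy, qpsk):.1f} dB")
```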
1.2.3.2 Adjacent Channel Leakage Ratio
The ACLR is the ratio of transmit power ‘leaked’ into the adjacent channel relative
to the power within the desired channel. Figure 1.3 illustrates the computation of ACLR
for both the first adjacent upper (i.e. higher frequency) channel, ACLRU,1, and the first
adjacent lower (i.e. lower frequency) channel, ACLRL,1. The leaked power density is
integrated over the same bandwidth as the desired RF signal (RFBW in Fig. 1.3), but
centered at the first adjacent channel. Notice that there is typically a non-zero guard band
region that separates channels in wireless communication standards, which is indicated by
the small spacing between the integration regions highlighted in yellow.
Figure 1.3: Illustration of adjacent channel leakage.
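The integration described above can be sketched numerically. The helper names, the trapezoidal integration, and the use of a sampled power spectral density on a frequency grid are assumptions made for illustration; the channel spacing equals RFBW plus the guard band.

```python
import numpy as np

def band_power(freq, psd, f_lo, f_hi):
    """Integrate a PSD (linear power per Hz) over [f_lo, f_hi] using the
    trapezoidal rule on the samples that fall inside the band."""
    mask = (freq >= f_lo) & (freq <= f_hi)
    f, p = freq[mask], psd[mask]
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(f)))

def aclr_db(freq, psd, f_center, rf_bw, ch_spacing):
    """Return (ACLR_L1, ACLR_U1) in dB relative to the desired-channel power.

    Each adjacent channel is integrated over the same bandwidth rf_bw as the
    desired channel, centered ch_spacing away from f_center.
    """
    half = rf_bw / 2.0
    main = band_power(freq, psd, f_center - half, f_center + half)
    lower = band_power(freq, psd, f_center - ch_spacing - half,
                       f_center - ch_spacing + half)
    upper = band_power(freq, psd, f_center + ch_spacing - half,
                       f_center + ch_spacing + half)
    return 10.0 * np.log10(lower / main), 10.0 * np.log10(upper / main)
```

For example, with a 100 MHz RFBW and a 20 MHz guard band, ch_spacing would be 120 MHz; more negative results indicate less leakage.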
1.2.4 Linearity Benefits of Differential Operation
Sub-threshold biasing is classically associated with class-C PAs, which are highly ef-
ficient but unfortunately very nonlinear. A combination of three factors allows a sub-
threshold-biased PA to operate linearly in this work (in order of significance): differential
operation, gradual transistor d.c. cut-off, and limited transit frequency. This section dis-
cusses how these factors interact to yield favorable results in later chapters.
1.2.4.1 Differential operation
The PAs in this work use a pseudo-differential topology (also known as a push-pull
topology; see e.g. Section 10.1 of [8]). Effectively, signal rectification does not take place,
as the differential pair allows the signal to be continuously trans-conducted from the input
to the output, even if one of the two differential arms is in cut-off during half of the RF cycle
(for class-B bias). However, as shown by the conceptual waveforms in Fig. 1.4 for ideal
transconducting devices, a CW input, and sub-threshold biasing, the output voltage across
the load resistor has ‘kinks’ that are responsible for the observed/expected distortion. If
these kinks are smoothed out, the amplifier nonlinearity is reduced.
Figure 1.4: Illustration of continuous conduction of output signal by pseudo-differential
power amplifier action (also known as "push-pull" action).
1.2.4.2 Gradual d.c. cutoff
Note that, in this work, the term "sub-threshold biasing" is not used in its classic sense
from microwave PA design literature, which typically assumes ideal/abrupt cut-off in the
transistors. The difference between ideal/abrupt cut-off and real/gradual cut-off is illustrated
conceptually in Fig. 1.5, which shows a typical MOSFET d.c. ID-VGS characteristic in the
weak and moderate inversion regions near Vthreshold (see [9]):
Figure 1.5: Graphical/visual definition of sub-threshold (i.e. weak inversion) conduction
as the gradual transition from cut-off to saturation.
Idealized/abrupt cut-off occurs exactly at the threshold voltage, while real/gradual cut-
off exhibits sub-threshold conduction over an extended d.c. VGS range near the threshold
voltage as shown. Also, throughout the manuscript, the intrinsic device transconductance
refers to the general definition $g_m \triangleq \partial I_D / \partial V_{GS}$, which is non-zero in the sub-threshold
region as shown by the non-zero slope of ID in the above conceptual diagram.
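The combined effect of push-pull operation and gradual cut-off can be illustrated with a toy numerical experiment. The sketch below is not a device model from this work: it assumes a normalized, memoryless drain-current characteristic, uses a "softplus" function as a hypothetical stand-in for gradual cut-off, and compares the third-harmonic level of the differential output current for an abrupt and a gradual characteristic under the same sub-threshold bias.

```python
import numpy as np

def drain_current(v_ov, softness=0.0):
    """Normalized drain current versus overdrive v_ov = VGS - Vthreshold.

    softness = 0 gives the ideal abrupt characteristic max(0, v_ov);
    softness > 0 gives a smooth softplus transition (hypothetical model of
    gradual cut-off, chosen only for illustration).
    """
    v_ov = np.asarray(v_ov, dtype=float)
    if softness == 0.0:
        return np.maximum(0.0, v_ov)
    return softness * np.logaddexp(0.0, v_ov / softness)

def push_pull_hd3_db(bias_below_vth, amplitude, softness, n=4096):
    """Third-harmonic level (dBc) of the differential output current of an
    idealized push-pull pair driven by a CW input of the given amplitude."""
    theta = 2.0 * np.pi * np.arange(n) / n
    v_in = amplitude * np.sin(theta)
    i_diff = (drain_current(v_in - bias_below_vth, softness)
              - drain_current(-v_in - bias_below_vth, softness))
    spectrum = np.abs(np.fft.rfft(i_diff))   # bin k holds the k-th harmonic
    return 20.0 * np.log10(spectrum[3] / spectrum[1])

# Hypothetical values: bias 0.1 V below threshold, 0.5 V CW amplitude
hd3_abrupt = push_pull_hd3_db(0.1, 0.5, softness=0.0)
hd3_gradual = push_pull_hd3_db(0.1, 0.5, softness=0.05)
print(f"HD3 abrupt:  {hd3_abrupt:.1f} dBc")
print(f"HD3 gradual: {hd3_gradual:.1f} dBc")
```

With the abrupt characteristic the differential transfer curve has a dead zone around the zero crossings, which produces the kinks; the smooth characteristic rounds that dead zone and gives a lower third-harmonic level for these values.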
1.2.4.3 Finite Transit Frequency
The ‘switching’ speed of any MOSFET is a function of its transit frequency fT , and
fT is itself a function of the bias point [10]. At mm-Wave frequencies, fT of even a
hypothetical deep-submicron MOSFET having ideal/abrupt d.c. cut-off cannot be large
enough for the transistor to reach complete cut-off instantaneously when the input voltage
swings below the ideal/abrupt threshold voltage. Such an instantaneous jump to zero drain
current would require the device to have appreciable transconductance at the harmonic
frequencies of the waveform, which is not the case due to limited fT. This logic applies
even more strongly to sub-threshold biasing, where fT is reduced further.
The smoothing/filtering of unwanted distortion near the signal zero crossings in a push-pull
stage is a strong function of the bias point VGS: it results from the combined action of gradual
d.c. cut-off near the threshold on one hand, and the limited/controlled device fT that
lowers harmonic content on the other hand. Therefore, the encompassing optimization of
device width and bias point in Chapter 2 is responsible for the achieved class-AB back-off
linearity performance and, based on the reported laboratory measurements, clearly does not
yield class-C characteristics.
1.2.5 Linear RF CMOS PA Literature
Table 1.2 shows a summary of recent literature on linear CMOS power amplifiers
at RF/mm-Wave frequencies. Since modulated signal performance is complex/expensive
to measure, it is not uncommon for PA publications to use continuous wave (CW) signal
performance metrics such as the 1 dB gain compression point or peak amplitude-to-phase
(AM-PM) modulation conversion as a proxy. The cited references in Table 1.2 are chosen
either because they did report modulated data, or for being recent and therefore relevant.
This review is further augmented by the back-off PAE trend collection and commentary in
Chapter 2. It is important to point out that linear operation refers to inherent circuit-level
linearity in this dissertation, i.e. without added measures such as digital pre-distortion
(DPD). Note for example, that earlier works in deeply scaled CMOS technology did not
achieve comparable performance to that reported in later works including Chapters 2 and
3 of this work. This is the case even for works that use DPD, e.g. Cohen’09 in Table 1.2 [11].
Also note that [12] is a good example of the state-of-the-art in current
fourth-generation (4G) long-term evolution (LTE) UE PAs; the peak throughput supported is
3×20 MHz carrier aggregation. This is a testament to the immense potential for improvement
using mm-Wave technologies in 5G systems.
Table 1.2: Summary of recent literature on linear CMOS RF power amplifiers.
References: Elmala’06 [13], Chowd’09 [14], Cohen’09 [11], Zhao’13 [15], Thyag’14 [16], Kulkarni’14 [17], Zhao’15 [18],
Wanxin’15 [19], Larie’15 [20], Francois’15 [12]
*Driver amplifier consuming 150 mW not accounted for in reported PAE. **Digital pre-distortion is used to achieve reported linearity.
EVM degrades by  7 dB. ***Constellation order not given.
1.2.6 mm-Wave CMOS PA Design Challenges
Besides the typical challenges of any high-frequency circuit design, limitations related to the
silicon material system are a key challenge for RF/mm-Wave power generation in CMOS
technologies, in comparison to e.g. the conventionally used III-V compound semiconductor
technologies, as explained below.
• In silicon, the critical (i.e. breakdown) electric field is 2.5×10^5 V/cm, which is lower
than in III-V compounds. For example, it is 3×10^5 V/cm in GaAs and 30×10^5 V/cm
in GaN [21].
• Driven by market economics, and fundamentally limited by the low silicon breakdown
field, technology scaling forces the supply voltage VDD to be lowered in
CMOS. This reduces the attainable output power of a CMOS PA operating at any
frequency.
• Lowering VDD increases the relative size of the ‘knee’ voltage Vd,sat, which in turn
degrades PAE.
• The threshold voltage Vt is not scaled down in proportion to VDD, forcing sub-threshold
operation. MOS device input capacitance is increasingly nonlinear
below Vt [9].
• A high substrate conductivity is used in CMOS to avoid latch-up problems. This
causes large insertion losses in passive elements, especially inductors and
transformers.
1.3 Transmit-Receive Switch
1.3.1 MOS Switch Parasitics
• Series switch (ON-state): Ron determines the low-frequency insertion loss (it forms a potential
divider with the load). Reducing Ron of a series switch requires a larger Wswitch, but the resulting
larger capacitance leads to signal feed-through (degrades isolation).
• Both series and shunt topologies (ON-state): junction capacitances couple the signal
to the bulk resistance RB at high frequencies, further increasing insertion loss. For a
given switch, there exists a value of RB for which the power coupled to the substrate is at
a maximum. Hence, very small or very large RB are both viable options. Without
triple-well devices, high RB can lead to latch-up. One published solution is to use a
parallel LC-tank to float the bulk at a desired RF frequency.
• Shunt switch on Tx-side (OFF-state): typically experiences a large voltage swing due
to the PA output. A large gate resistor RG (for d.c. path isolation) effectively floats the gate
node. Equal Cgd and Cgs (overlap components) force the gate voltage to ½ the drain voltage,
which leads to self-biasing (‘boot-strapping’); the OFF-state resistance ROFF thus drops
with increasing Tx signal strength, resulting in nonlinearity/gain compression.
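Two of the effects listed above lend themselves to a quick first-order estimate. The sketch below is an illustration under stated assumptions, not an analysis from this work: the series switch is modeled as a pure resistance Ron between a matched source and load of impedance z0 (an assumed 50 Ω default), and substrate coupling is modeled as a junction capacitance in series with RB driven from a voltage node. All names and values are hypothetical.

```python
import math

def series_switch_il_db(r_on, z0=50.0):
    """Low-frequency insertion loss (dB) of a series resistance r_on between
    a matched source and load: |S21| = 2*z0 / (2*z0 + r_on)."""
    return 20.0 * math.log10(1.0 + r_on / (2.0 * z0))

def substrate_loss_relative(r_b, c_j, freq):
    """Power dissipated in bulk resistance r_b through junction capacitance
    c_j (series RC driven by a fixed voltage), normalized to its maximum.

    With x = 2*pi*freq*c_j*r_b the dissipated power is proportional to
    x / (1 + x^2), which peaks at x = 1, i.e. r_b = 1 / (2*pi*freq*c_j).
    """
    x = 2.0 * math.pi * freq * c_j * r_b
    return 2.0 * x / (1.0 + x * x)

# Hypothetical values: 5-ohm switch, 20 fF junction capacitance at 28 GHz
print(f"IL for Ron = 5 ohm: {series_switch_il_db(5.0):.2f} dB")
r_worst = 1.0 / (2.0 * math.pi * 28e9 * 20e-15)
print(f"Worst-case RB: {r_worst:.0f} ohm")
```

In this simple model the loss falls off on both sides of the worst-case RB, which is consistent with the statement that very small or very large RB are both viable.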
1.3.2 Review of CMOS TR Switch Literature
The purpose of the present review is to illustrate the need for innovation in the design
of the antenna TR switch for this work, as containing insertion loss (IL) to within 1.5 dB
is highly desirable for PA efficiency and LNA noise figure. Achieving a reasonably high
input 1 dB compression point is also important if the switch appears in cascade
with the PA in transmit-mode, as with conventional switch topologies.
Table 1.3 shows a summary of literature on conventional series, shunt, series-shunt,
and asymmetric transmit-receive (TR) antenna switch topologies. The data in Table 1.3
illustrate that conventional topologies yield more than 2 dB of insertion loss (IL) for
any publication above 20 GHz. Also, to the best of the author’s knowledge, the lowest
reported insertion loss is 1.6 dB, achieved using a non-conventional switchable balun
with a shunt-only TR switch topology in [22]. However, that design suffers from poor linearity due
to the shunt switch being ‘boot-strapped’; the reported IP1dB is +12 dBm. The same
concept of the switchable balun was later re-published in [23], but using stacked devices
to enhance transmit-mode linearity. The new design pushed IP1dB up to +28 dBm, but the
insertion loss deteriorated to 3 dB, making it unattractive for this work. More recent
non-conventional design techniques for the TR switch from published high-performance
RF/mm-Wave transceivers and front-ends are considered in more detail in Chapter 4.
Table 1.3: Review of conventional transmit-receive switch literature.
References: r1 [24], r2 [25], r3 [26], r4 [27], r5 [28], r6 [29], r7 [30], r8 [31], r9 [32].
*Triple-well process.
1.4 Passive Phase Shifter Literature
As will be shown in Chapter 4, a passive and bidirectional PS circuit is highly desirable
for low-power UE phased array front-ends. Table 1.4 summarizes the literature
on passive PS circuits that use lumped-element cells to achieve linear-phase behavior over
a finite bandwidth. The key point of this review is that the average insertion loss per single
bit of resolution is 2.5 dB for any design operating at or above 20 GHz. That is, e.g. a 3-bit
resolution PS based on any of the publications in Table 1.4 is expected to show about 7.5 dB of
insertion loss. The techniques used in this work, reported in [33], will be shown to achieve a
lower insertion loss, as explained in Chapter 4.
Table 1.4: Summary of passive LC delay cell based phase shifter literature review.
References: Campbell_2000 [34], Hancock_2005 [35], Kang_2006 [36], Min_2008 [37], Cohen_2010 [38], Gharibdoust_2012 [39],
Li_2013 [40].
1.5 Dissertation Scope and Organization
Chapter 2 presents the first major project in this work, which resulted in the first reported
linear and efficient bulk CMOS PA targeting low-power 5G mobile user equipment
(UE) integrated phased array transceivers [1, 7]. The chapter begins with a link budget
analysis of UE phased array transmitter power consumption versus carrier frequency. This
analysis considers very detailed and practical circuit- and antenna-module-oriented limi-
tations, and serves as the back-bone for link-budget considerations throughout the disser-
tation. Then, an optimization methodology is proposed for the output stage of the PA with
the cost function being power added efficiency (PAE) at desired error vector magnitude
(EVM) and link range. Building on the optimization results, inductive source degener-
ation is employed to enable embedding of the optimized output stage into a two-stage
transformer-coupled PA. It is shown in [1] that carefully designed inductive degeneration
in the output stage is beneficial to the PA performance due to broadening of inter-stage
impedance matching bandwidth, and due to the positive contribution of this degeneration
to reduce distortion. The prototype PA demonstrating these concepts was designed and
fabricated in 1P7M 28 nm bulk CMOS, uses a 1 V supply, and achieves state-of-the-art
performance.
The second project of this dissertation is presented in Chapter 3, and reported in [2,41].
The project focuses on a PA design that addresses the extremely challenging RF signal
bandwidth requirements of 5G, with the added challenge of integrating digital gain con-
trol for phased array functionality, e.g. magnitude tapering across elements for side-lobe
level control, or array element gain mismatch compensation. This second, three-stage
design overcomes the linearity limitation of the 28 nm PA in coping with wider signal
bandwidths, which is investigated in Chapter 2, and simultaneously achieves higher power
gain. To achieve wideband linearity and higher gain without compromising the excellent
back-off PAE of the first design, loosely-coupled transformer matching networks with
dual in-band resonances at optimized frequency spacing were employed. Furthermore,
to decouple impedance matching and linearity performances from the digital gain setting, a
current-steering cascode topology is used for the variable-gain first stage. This topology
results in excellent broadband gain-step linearity performance. The measured throughput
for this design shows more than a three-fold improvement over the highest throughput
supportable by the state-of-the-art defined by the 28 nm PA of Chapter 2 [7], with
concurrent improvements in output power, PAE, and power gain.
For the third project of this dissertation, Chapter 4 details the transmit-receive
module considerations and shows how the PA design of [2] and
the low-noise amplifier (LNA) and phase shifter (PS) designs in [33] are integrated to
form a high-performance, wideband, and low-power transmit-receive front-end module for
5G phased array UE time-division duplex (TDD) radios employing the RF phase shifting
architecture. Integrating this front-end module in a compact area without compromising
the excellent wideband performances of the individual circuit components is a major signal
integrity/electromagnetic design challenge that is tackled in this project. Area constraints
dictate that conventional, wideband distributed-element networks have to be completely
avoided, particularly in the TDD transmit-receive switch at the antenna port. A compact,
low-loss lumped-element topology for the critical antenna interface is developed to address
the UE requirements.
Finally, Chapter 5 provides concluding remarks and presents some avenues for
future research that may follow up on the results of this work.
2. A HIGHLY EFFICIENT AND LINEAR POWER AMPLIFIER FOR
28-GHZ 5G PHASED ARRAY RADIOS IN 28-NM CMOS*
2.1 Introduction
The race to deploy fifth generation (5G) wireless services by 2020 is on-going, and
mm-Wave technology will play a key role in meeting mounting demand for broadband
data traffic [42, 43].
While the spectral band to be adopted is not yet determined, recent advances make
the 28GHz band particularly interesting for 5G mobile standardization. Contrary to past
perception of mm-Wave propagation as a fundamental limitation [44], non-line-of-sight
(NLOS) 28GHz coverage was demonstrated in urban cells [45]. Also, to counter heavy
propagation losses, directive antenna arrays were integrated into base station and user
equipment (UE) form factors in commercial grade technologies [46, 47]. These advances
motivate favorable spectrum regulation [48].
Besides wave propagation, battery power efficiency for low-cost UE devices is another
critical 5G challenge, limited by integrated power amplifiers (PA). Maximizing data rate
gains implies broadband, e.g.  100MHz RF bandwidth, and spectrally efficient signal-
ing, e.g. high-order quadrature amplitude modulation (QAM), with orthogonal frequency
division multiplexing (OFDM). The large peak-to-average power ratios (PAPR) of these
signals and their sensitivity to distortion force the PA to operate in 8–10dB power back-
off and drastically lower its efficiency. Also, low cost and a high level of integration for
UE phased arrays make CMOS the technology of choice. Limitations from substrate conductivity
and silicon breakdown field degrade power efficiency in CMOS relative to e.g.
GaAs. CMOS devices also exhibit more gradual/softer gain compression, which further
increases the needed back-off to meet linearity requirements. Furthermore, in a high-volume
production setting, cost and complexity preclude the use of calibration, e.g. using
digital pre-distortion (DPD), due to differing nonlinear behavior among PAs in an integrated
array. Thus, 5G UE radios require efficient CMOS mm-Wave PAs having inherent
circuit-level linearity.
* Section 2 is reprinted with permission from S. Shakib, H. C. Park, J. Dunworth, V. Aparin and K. Entesari, "A Highly Efficient and Linear Power Amplifier for 28-GHz 5G Phased Array Radios in 28-nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 51, no. 12, pp. 3020–3036, Dec. 2016. © 2016 IEEE.
Early effort to experimentally assess/reach the achievable limit on power-added effi-
ciency (PAE) resulted in the first CMOS PA for 28GHz 5G that we reported in [7]. We
proposed an optimization methodology for selecting output stage transistor size/biasing
to maximize back-off PAE, given range and error vector magnitude (EVM) targets. We
also demonstrated that inductive degeneration can be used to broaden inter-stage match-
ing, and help to reduce distortion resulting from sub-threshold biasing. As a result of these
techniques, the achieved performance matched or exceeded the state-of-the-art in linear
silicon mm-Wave PAs, as represented by 60GHz works [15], [17], [20], and a SiGe PA for
28GHz 5G [49] (the number of published 28GHz PAs is limited).
Expanding on the PAE trend observed in [7], this paper begins with a link budget anal-
ysis of phased array transmitter power consumption versus carrier frequency. The analy-
sis considers more detailed/realistic circuit- and antenna-module-oriented limitations not
considered by channel-propagation-oriented publications, e.g. [45]. From a 5G standard-
ization viewpoint, this helps to make a more informed choice of carrier band while in-
corporating the impact of UE power consumption. Subsequently, PA circuit requirements
based on this detailed analysis are derived. From this point, we turn to expanding on the
optimization methodology using target specifications in [7]. Brief theoretical analysis of
inductive degeneration in the output stage is used to further illustrate its benefits. The re-
ported experimental results are augmented with new measurement data, and discussed in
terms of the derived 5G requirements.
This paper is organized as follows. Section 2.2 shows 28GHz is favorable through
a detailed analysis of phased array transmitter power consumption across a wide carrier
frequency range encompassing candidate 5G bands, and provides PA circuit specifica-
tions. The design optimization methodology, and output stage inductive degeneration as
its circuit-level enabler reported in [7], are expanded upon in Section 2.3 and Section 2.4,
respectively. Section 2.5 provides implementation details. In Section 2.6, experimental
data is reported and compared to the state-of-the-art, as well as to the derived 5G require-
ments from Section 2.2. The paper is concluded in Section 2.7.
2.2 System Considerations for 5G Phased Array Radios
The link budget for the envisioned phased array 5G broadband communication system
is analyzed to compare a wide range of potential carrier frequencies, with UE transmitter
battery power consumption PTx,dc as the figure of merit. The 28GHz band is shown to
be favorable, and PA output power requirements are derived for future 28GHz 5G phased
array PA developments.
2.2.1 Choice of Carrier Frequency for 5G Systems
Figure 2.1: Illustration of link budget analysis use scenario in comparison of potential
carrier frequencies for 5G systems. Reprinted from [1].
[Figure 2.1 annotations: antenna array substrate h = 100 µm, εr = 3.6; LFE,Tx = Lsw + Lbump + Lvia + Lfeed; LFE,Rx = Lfeed + Lvia + Lbump + Lsw; LOS link distance D; beam misalignment angle Ψ.]
The chosen use scenario for this analysis, and signal loss mechanisms in the UE to
access point (AP) direction are illustrated by Fig. 2.1. Assuming a line-of-sight (LOS)
channel simplifies this analysis without affecting fc comparison. NLOS channel details
can be found elsewhere [50]. Low-cost CMOS technology is assumed for the UE phased
array RFIC (AP RFIC may be e.g. SOI), and flip-chip bonding to printed circuit board
(PCB) antenna arrays is assumed [51]. Patch antennas are arranged at each carrier frequency
fc into an Nx × Ny uniform rectangular array (URA) to fit on a UE/AP PCB of
physical dimensions dx × dy, fixed across fc values (26mm × 15mm for UE, 47mm × 47mm
for AP), at a spacing of 0.5λ0, where λ0 is the free-space wavelength. The UE/AP URA is
illustrated at arbitrary fc in Fig. 2.2(a). Element counts vs. fc, and an example UE array
at fc = 30GHz are shown in Fig. 2.2(b). Fixing dx × dy reflects practical size constraints,
and helps compare fc values fairly, as array gain Garray(fc) significantly impacts the link
budget. To find PTx,dc(fc), Friis’ equation [52] is first used to express PTx,rf(fc):
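\[
\begin{aligned}
P_{Tx,rf}(f_c) ={}& \overbrace{10\log_{10}\!\left(k_B T \times 10^{3}\, BW_{sig}\right) + NF_{Rx}(f_c) + SNR_{sig}}^{\text{Required Rx signal for reliable detection } S_{Rx}} + L_{path}(f_c) \\
& + L_{misc} + \underbrace{\left[L_{FE,Tx}(f_c) + L_{FE,Rx}(f_c)\right]}_{\text{RF front-end losses}} \\
& - \underbrace{\left[G_{array,Tx}(f_c) + G_{array,Rx}(f_c)\right]}_{\text{Tx/Rx antenna array gains w.r.t. isotropic element}}
\end{aligned}
\tag{2.1}
\]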
where kB is the Boltzmann constant, T is absolute temperature, and PTx,rf(fc) is the
total RF output power of the UE transmit array in dBm, needed for reliable detection.
Table 2.1 lists definitions of the remaining variables, and corresponding explanations for
their chosen values used in (2.1), Fig. 2.1, and Fig. 2.2. The general criterion is to represent
the highest proven capabilities for system components from published literature across an
fc of 2.4–83GHz.
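The structure of (2.1) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the LOS path loss is taken as the standard free-space (Friis) expression, the default temperature of 290 K is assumed, and all function and parameter names are hypothetical. The values chosen in Table 2.1 are not reproduced in this text, so any inputs passed to these functions are placeholders.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
C0 = 299792458.0     # speed of light in vacuum, m/s

def noise_floor_dbm(bw_hz, temp_k=290.0):
    """Thermal noise power kB*T*BW in dBm (the 1e3 factor converts W to mW)."""
    return 10.0 * math.log10(K_B * temp_k * 1e3 * bw_hz)

def free_space_path_loss_db(fc_hz, dist_m):
    """Free-space path loss between isotropic antennas, in dB (assumed LOS model)."""
    return 20.0 * math.log10(4.0 * math.pi * dist_m * fc_hz / C0)

def required_tx_rf_power_dbm(fc_hz, dist_m, bw_hz, nf_rx_db, snr_db,
                             l_misc_db, l_fe_tx_db, l_fe_rx_db,
                             g_array_tx_db, g_array_rx_db, temp_k=290.0):
    """Total RF output power of the transmit array per (2.1), in dBm."""
    s_rx = noise_floor_dbm(bw_hz, temp_k) + nf_rx_db + snr_db   # required Rx signal
    return (s_rx
            + free_space_path_loss_db(fc_hz, dist_m)
            + l_misc_db
            + (l_fe_tx_db + l_fe_rx_db)          # RF front-end losses
            - (g_array_tx_db + g_array_rx_db))   # Tx/Rx array gains
```

All loss and gain terms are in dB, so each term simply adds to or subtracts from the required receive signal level, which makes the trade-off between path loss and array gain versus fc easy to inspect.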
Figure 2.2: Two-dimensional URA of UE or AP antennas for allowable physical size
dx × dy of 26mm × 15mm for UE array, 47mm × 47mm for AP array: (a) UE or AP
at arbitrary fc, and (b) number of antenna elements in UE (left axis) and AP (right axis)
arrays; inset shows example for UE at fc = 30GHz. Reprinted from [1].
In [7], and using the best published CMOS back-off PAE data in [17,18,20,49,57–60],
we previously approximated PAE at Pout that satisfies |EVM| = SNRsig + 3dB = 25dB
using 64-QAM OFDM by PAE at Psat − 9.6dB, then fitted the data to the trendline:
\[
\mathrm{PAE}(f_c) \approx \frac{35\%}{\left[1 + 0.16\sqrt{f_c/10^{9}}\right]^{2}}. \tag{2.2}
\]
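For convenience, the trendline (2.2) and the resulting d.c. power can be evaluated with a short script. This is a minimal sketch: the function names are hypothetical, fc is taken in Hz, and the d.c. power is computed in dB units as the RF power minus 10·log10 of the fractional PAE.

```python
import math

def pae_trend_percent(fc_hz):
    """Back-off PAE trendline of (2.2), in percent, for carrier frequency fc in Hz."""
    return 35.0 / (1.0 + 0.16 * math.sqrt(fc_hz / 1e9)) ** 2

def tx_dc_power_dbm(p_tx_rf_dbm, fc_hz):
    """P_Tx,dc = P_Tx,rf / PAE, expressed in dBm."""
    return p_tx_rf_dbm - 10.0 * math.log10(pae_trend_percent(fc_hz) / 100.0)

for fc in (2.4e9, 5.5e9, 28e9, 60e9):
    print(f"fc = {fc / 1e9:5.1f} GHz: trendline PAE = {pae_trend_percent(fc):.1f}%")
```

At 28GHz the trendline evaluates to roughly 10%, i.e. about 10dB of d.c. power overhead relative to the RF output power at this back-off level.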
Also in [7], and consistently throughout this paper, we define Psat as the Pout at 3dB
of gain compression for the presented 28nm CMOS PA (see Section 2.6.1). Combining
(2.1), (2.2), Fig. 2.1, Fig. 2.2, and Table 2.1, the data scatter and corresponding trendline
for PTx,dc(fc) = PTx,rf(fc)/PAE(fc) based on the best published performances across
a wide fc range are shown in Fig. 2.3(a). For fixed link range, modulation format, RF
bandwidth, and physical antenna area, Fig. 2.3(a) shows the 28GHz band provides power
savings over 5–6GHz despite lower transmitter PAE and higher propagation loss. To a first
order, this may be understood in the blue shaded region in Fig. 2.3(a) from the frequency
dependence of PAE in (2.2), and of path loss Lpath,A and antenna array gain Garray,A (see
Fig. 2.2) expressed as absolute ratios:
\[
\frac{P_{Tx,rf}(f_c)}{\mathrm{PAE}(f_c)} \propto \left[\frac{L_{path,A}(f_c)}{G_{array,Tx,A}(f_c)\, G_{array,Rx,A}(f_c)}\right] \cdot \frac{1}{\mathrm{PAE}(f_c)} \sim \left[\frac{f_c^{2}}{f_c^{2} \cdot f_c^{2}}\right] \cdot \frac{1}{(1/f_c)} \sim \frac{1}{f_c}. \tag{2.3}
\]
Table 2.1: Expressions and chosen values for various variables used in (2.1), Fig. 2.1, and
Fig. 2.2. Reprinted from [1].
Figure 2.3: (a) Scatter of best published data and fitted trendlines for PAE(fc) and
PTx,dc(fc), (b) required average Pout per element vs. fc for 64-QAM at different BWsig,
and (c) required average Pout per element vs. LOS range at 30GHz for QPSK and 64-QAM
at different BWsig. Reprinted from [1].
Relation (2.3) highlights the impact at mm-Wave frequencies of using directive arrays
in both UE and AP to avoid path loss incurred at lower fc values, where size constraints
dictate fewer antenna elements. That is, the combined transmitter and receiver antenna
array gains for 2.4–15GHz in Fig. 2.3(a) are insufficient to compensate for the negative
effects of increasing Lpath and decreasing PAE on PTx,dc. Also, the red shaded region in
Fig. 2.3(a) shows that LFE and NFRx in (2.1) may limit this benefit of arrays at higher
mm-Wave frequencies. Therefore, the above semi-empirical analysis shows the 28GHz
band is a viable choice for wideband, low-power 5G systems.
2.2.2 Output Power Requirements
The most demanding anticipated uplink scenario uses a 250MHz-wide 64-QAM OFDM
signal. As part of on-going 28GHz 5G developments, the early effort in [7] aimed to
understand practical transmission range and PTx,dc limits for this very challenging case.
Considering realistic size limitations and signal losses as in the above analysis, Fig. 2.3(b)
shows required average Pout at 30GHz per UE element in the URA of Fig. 2.2(b) to achieve
10–150m LOS range. A 20–30m LOS range requires an average Pout of 5–7.5dBm for a
250MHz-wide, 64-QAM signal. Alternatively, using a 16-element UE array as assumed
in [45] translates in the above analysis to about 50m LOS range at an average Pout of about
7dBm. Low-power 5G use cases requiring longer range can employ lower modulation orders
down to e.g. QPSK as shown in Fig. 2.2(b).
A point worth noting is that EVM, i.e. in-channel signal quality, is the main linearity
specification that needs to be met by PA design for 5G phased arrays. Adjacent channel
power ratio (ACPR) is less important due to higher spatial selectivity at mm-wave frequen-
cies relative to the low-GHz range [21]. In a sub-6GHz system, adjacent channel leakage
power of one user can block (degrade sensitivity to) the received signal from an adjacent-
channel user because the AP antenna cannot spatially separate users. On the other hand, in
the current context of mm-wave phased arrays, users with large enough spatial separation
are served by separate beams from an AP. Typically, there is enough side-lobe attenuation
in the AP antenna array (at least 10 dB) to prevent one user from blocking another. If the
coverage sector of two users coincides so they are both served by the same beam from an
AP, the interference can be mitigated by reducing bandwidth allocated to each user and/or
separating their carrier frequencies. Statistically, such a coincidence may be a rare event,
but this should be confirmed at the system level using, e.g. simulations.
The remainder of this chapter expands on concepts and experimental results in [7] based
on the PA specifications in Table 2.2.
Table 2.2: Summary of targeted PA circuit specifications from [7]. Reprinted from [1].
2.3 Output Stage Optimization Methodology
The output stage should dominate overall linearity and efficiency by design, so it is
carefully optimized. Using a parameterized output stage, the goal of this section is to
determine transistor size and bias point to maximize PAE for the Pout,req and EVMreq
requirements in Table 2.2.
2.3.1 Parameterized Output Stage
[Figure 2.4 labels: (a) power cell with Gate (UTM), Drain (UTM), Source (M6), and Body (M1); (b) ideal baluns, capacitors Cn, devices of width WnMOS, and bias voltages VGS and VDD]
Figure 2.4: Parameterized output stage circuit for optimization: (a) power cell layout, (b)
parameterized output stage circuit. Reprinted from [1].
The power cell layout in [15] was adapted to the 1P7M 28nm CMOS process; unit cell
size is Wunit/Lunit = 32 fingers × 1μm/28nm, and its layout is illustrated in Fig. 2.4(a).
Each scalable nMOS device in the neutralized push-pull stage of Fig. 2.4(b) is constructed
as m× the RC-extracted unit cell, so WnMOS = m × Wunit. Cn is chosen to maximize
reverse isolation (i.e. minimize s12) of the core stage for greatest stability as in e.g. [15].
Ideal baluns and single L-section LC-tuners present variable differential mode terminations
at the fundamental frequency. Higher odd-order harmonic terminations were unconstrained
during optimization for consistency of simulation with the limited degrees of freedom
in subsequent transformer-based implementation. That is, low-order networks were
acceptable in this mm-Wave design to minimize insertion loss at the fundamental. The
common mode at input (output) is terminated in an ideal voltage source for gate-to-source
(drain-to-source) d.c. bias VGS (VDD), i.e. an a.c. short in simulation. For termination
of the higher even-order harmonics in the implementation, large on- and off-chip bypass
capacitors are used to approximate this simulated a.c. short at the supply node. Gate
bias node termination is discussed further in Section 2.6.4. Matching network insertion
losses are expected to vary with (WnMOS, VGS) due to the varying impedance transformation
ratios needed. Correctly modeling matching network losses across the design space
is challenging, since electromagnetic (EM) simulations are needed, while simpler models
can be inaccurate or misleading. Therefore, to avoid ambiguous selection criteria for
(WnMOS, VGS), LC-tuner loss is omitted. Finally, bulk and source terminals are grounded.
2.3.2 Optimization Procedure
With WnMOS (i.e. m) and VGS as the two independent variables, overlaid contours
of average Pout and of PAE, plotted at fixed EVM, form the output stage transistor design
chart. The chart is created using two steps at each (WnMOS, VGS), corresponding to
Fig. 2.5(a) and (b), respectively:
1. Load-pull: Optimal termination ΓL,opt(WnMOS, VGS) is defined to maximize PAE
at Pout,req. A continuous wave (CW) signal at Pin(VGS) = Pout,req − Gmax(VGS)
makes Pout ≈ Pout,req, where Gmax(VGS) is the maximum available gain of the stage
(independent of m). This approximate enforcement of the desired output power
during load-pull simulation results in some degradation of PAE in the next step,
which is accepted to simplify the procedure.
2. EVM Simulation: A behavioral amplitude-to-amplitude (AM-AM) and amplitude-to-phase
(AM-PM) modulation conversion model for the terminated stage is extracted
from CW signal input power sweep simulation [61]. A sweep of 64-QAM
Figure 2.5: The two steps in the optimization procedure: (a) Step (1) - load-pull, (b) Step
(2) - EVM simulation. Reprinted from [1].
OFDM Pin to this model generates PAE/EVM versus Pout characteristics. Note that
this modeling approach ignores potentially relevant circuit-level memory effects as
explained in Section 2.6.
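Step (2) can be sketched in a few lines. The example below is a minimal illustration, not the simulation flow used in this work: the AM-AM curve is a Rapp-type compression and the AM-PM curve is a simple saturating phase lag, both hypothetical stand-ins for the tables that would be extracted from a CW power sweep. The OFDM signal uses 64 subcarriers without oversampling or cyclic prefix, and a single complex gain per symbol stands in for receiver equalization. All function names and parameter values are assumptions for illustration.

```python
import cmath
import math
import random

QAM_LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)  # 64-QAM: 8 levels per I/Q rail


def idft(X):
    """Naive inverse DFT (subcarriers -> time samples)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dft(x):
    """Naive forward DFT (time samples -> subcarriers)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def pa_model(v, a_sat=1.0, pm_max_deg=20.0):
    """Memoryless AM-AM/AM-PM model (hypothetical curves, not extracted data)."""
    r = abs(v) / a_sat
    gain = (1.0 + r ** 4) ** -0.25                              # Rapp-type AM-AM, p = 2
    phase = math.radians(pm_max_deg) * r ** 2 / (1.0 + r ** 2)  # lagging AM-PM
    return v * gain * cmath.exp(-1j * phase)


def evm_db(backoff_db, n_sym=10, n_sc=64, seed=1):
    """EVM (dB) of 64-QAM OFDM through pa_model at a given average input back-off."""
    rng = random.Random(seed)
    err = ref = 0.0
    for _ in range(n_sym):
        X = [complex(rng.choice(QAM_LEVELS), rng.choice(QAM_LEVELS)) for _ in range(n_sc)]
        x = idft(X)
        rms = math.sqrt(sum(abs(s) ** 2 for s in x) / n_sc)
        scale = 10.0 ** (-backoff_db / 20.0) / rms   # average drive relative to a_sat
        Y = dft([pa_model(s * scale) for s in x])
        # One complex gain per symbol removes average gain and phase rotation.
        g = sum(y * xk.conjugate() for y, xk in zip(Y, X)) / sum(abs(xk) ** 2 for xk in X)
        err += sum(abs(y - g * xk) ** 2 for y, xk in zip(Y, X))
        ref += sum(abs(g * xk) ** 2 for xk in X)
    return 10.0 * math.log10(err / ref)


for bo in (6, 9, 12):
    print(f"input back-off {bo:2d} dB -> EVM {evm_db(bo):6.1f} dB")
```

Sweeping the back-off and recording EVM against output power gives the kind of characteristic that is interpolated at a fixed EVM to build the design chart. Because the model is memoryless, it shares the limitation noted in step (2).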
By interpolating PAE/EVM characteristics, the design chart is plotted in Fig. 2.6 for
EVM = −27dBc (2dB margin from EVMreq). EVM versus Pout slope is 2dB/1dB if third-order
intermodulation (IM3) dominates; the 2dB margin maintains average Pout at EVMreq after
adding 1dB loss in realization of ΓL,opt. Balun measurements confirmed 1dB loss is
reasonable (Section 2.5.2).
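The size of the margin follows from the stated slope. As a sketch, assume third-order distortion dominates, so that distortion power rises 3dB for every 1dB rise in output power while the signal itself rises 1dB:

```latex
\[
\frac{d\,\mathrm{EVM}}{d P_{out}} \approx \frac{3\,\mathrm{dB} - 1\,\mathrm{dB}}{1\,\mathrm{dB}} = 2\,\mathrm{dB/dB},
\qquad
\Delta\mathrm{EVM} \approx 2\,\mathrm{dB/dB} \times 1\,\mathrm{dB} = 2\,\mathrm{dB}.
\]
```

Recovering 1dB lost in the output network requires the transistor to deliver 1dB more, which costs about 2dB of EVM under this assumption. Designing the lossless stage at −27dBc therefore leaves the delivered signal at roughly the −25dBc implied by the 2dB margin from EVMreq.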
Figure 2.6: Output stage transistor design chart for VDD = 1V: PAE (shaded) and average
64-QAM OFDM Pout (line) contours plotted at an EVM of −27dBc, i.e. at a 2dB margin
from EVMreq; design choice indicated by a circle, and inset shows correspondence
between VGS − Vt on x-axis and bias current density JPA. Reprinted from [1].
2.3.3 Optimization Results
From Fig. 2.6, PAE (shaded) decreases with increasing VGS − Vt as would be expected.
For Pout contours (lines), many effects interact to produce the behavior in Fig. 2.6. With
this complexity in mind, two limiting scenarios are identified, and their intuitive
interpretations are offered below:
• For fixed WnMOS, as VGS − Vt increases, Pout contours approach being horizontal.
Horizontal contours indicate Pout is limited by current clipping at constant
WnMOS. Given a limited impedance transformation ratio in the output match realization,
and without a guide like Fig. 2.6, one conventionally chooses minimum
WnMOS for Psat ≥ Pout,req + PAPR to avoid current clipping and indirectly satisfy
EVMreq. PAE is suboptimal over VGS − Vt = 100–200mV, where approximately
horizontal Pout contours justify the conventional choice.
• For fixed VGS − Vt, as WnMOS increases, Pout contours become increasingly vertical,
indicating Pout no longer increases. Larger WnMOS corresponds to larger (nonlinear)
intrinsic gate capacitance, and results in greater AM-PM conversion [13, 62], i.e.
lower Pout. Also, a small, class-C-like conduction angle limits Pout through limiting
Psat if VGS − Vt is small.
A reasonable compromise between Pout/PAE of 6.5dBm/25% is reached by setting
m = 12 and VGS − Vt = −150mV. The region surrounding this design point is highlighted
with a circle in Fig. 2.6. However, a device of WnMOS = 384μm in sub-threshold has a
large (≈330fF) and strongly nonlinear gate-to-source capacitance Cgs [9]. Furthermore,
neutralization increases unloaded quality factor of device input impedance Qu ≈ 40 [63],
reducing attainable inter-stage matching bandwidth BWint as the Bode-Fano limit dictates
[52]. Small BWint is undesirable for two reasons. First, it increases sensitivity to PVT
and to mm-Wave modeling accuracy in the cascaded amplifier (Fig. 2.7(a)), e.g. lower
gain through relative detuning among stages. Second, excessive AM-PM conversion and
driver stage load modulation both occur with increasing Pout as the center frequency of
inter-stage matching shifts to lower values due to Cgs nonlinearity (Fig. 2.7(b)). Proposed
in [7], inductive degeneration in the output stage mitigates the mentioned issues that result
from sub-threshold bias at a current density on the order of 10μA/μm using a relatively
small source inductance. A seemingly similar use of the technique by [64] differs
from this work, as the purpose there was to boost P1dB by countering AM-AM conversion
for class-A-like biasing at a much higher 250μA/μm current density.
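To give a feel for the Bode-Fano constraint cited above, the following estimate treats the device input as a series RC load with Qu = 1/(ω0 Rser Cgs) and assumes an ideal lossless matching network that holds a constant reflection coefficient |Γ| across a narrow band. The 13dB return-loss target is an illustrative choice for this sketch.

```latex
\[
\int_0^{\infty} \frac{1}{\omega^2} \ln\frac{1}{|\Gamma(\omega)|}\, d\omega \le \pi R_{ser} C_{gs}
\quad\Rightarrow\quad
\frac{BW_{int}}{f_0} \lesssim \frac{\pi}{Q_u \ln(1/|\Gamma|)}
\]
\[
Q_u \approx 40,\ |\Gamma| = 10^{-13/20} \approx 0.224:
\qquad
\frac{BW_{int}}{f_0} \lesssim \frac{\pi}{40 \times 1.50} \approx 0.052,
\quad\text{i.e. about } 1.5\,\mathrm{GHz} \text{ at } 28\,\mathrm{GHz}.
\]
```

Any practical low-order network achieves less than this bound, which is why lowering the effective Q of the load is attractive.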
2.4 Inter-stage Impedance Matching
In this section, inductive degeneration is shown to broaden BWint and thus enable em-
bedding of the optimized output stage of Section 2.3 in a cascaded mm-Wave transformer-
coupled PA.
Figure 2.7: Issues in using m = 12, VGS − Vt = −150mV optimization result: (a) Cascaded
amplifier frequency response overly sensitive to PVT and modeling accuracy, (b)
AM-PM conversion and DA stage load modulation due to Cgs nonlinearity. Reprinted
from [1].
2.4.1 Physical Circuit Operation
Single-tuned transformer-based inter-stage matching is analyzed in differential mode
starting from the single-ended equivalent in Fig. 2.8(a), and simplified further in Fig. 2.8(b)
and (c). The circuit in Fig. 2.8(c) is a T-model of two magnetically-coupled LC-resonators
[65], i.e. L1,2 with coupling coefficient km, and capacitors C1,2, having two unloaded
Figure 2.8: Single-ended model of differential-mode inter-stage matching: (a) circuit, (b)
1st simplification (c) simplified model. Reprinted from [1].
resonance frequencies [66]:
\[
\omega_{1,2} = \frac{\sqrt{2}}{\sqrt{(L_1 C_1 + L_2 C_2) \pm \sqrt{(L_1 C_1 - L_2 C_2)^2 + 4 C_1 C_2 L_m^2}}}, \tag{2.4}
\]
where km = Lm/√(L1L2) = k, Lm = Lmag, L1 = Lleak + Lmag, L2 = Lser + Lmag, C1 =
CD, and C2 = Cgs. |k| is the ratio of coupled to stored magnetic energy in the resonators,
and tighter coupling increases Δω ≜ |ω1 − ω2| [66]. For k ≳ 0.75, series Lleak and Lser are
relatively small, ω2 ≫ ω1, so in-band behavior resembles parallel resonance at ω0 ≈ ω1.
As will be shown mathematically in Section 2.4.2, in this range of k ≳ 0.75, BWint
can be increased by reducing k, or by increasing series resistive loading. Reducing k is
avoided because it only weakly increases BWint, and because it also increases transformer
insertion loss [67]. Emulating resistive loading by using Ldeg is analyzed in the next
section.
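The two unloaded resonances of (2.4) are easy to evaluate numerically. The sketch below does so for a few coupling coefficients; the inductor and capacitor values are hypothetical, picked only so that the lower resonance lands near 28GHz, and are not the design values of this work.

```python
import math


def resonances(L1, L2, C1, C2, k):
    """Unloaded resonance frequencies (rad/s) of two coupled LC resonators, per (2.4)."""
    Lm = k * math.sqrt(L1 * L2)
    a = L1 * C1 + L2 * C2
    s = math.sqrt((L1 * C1 - L2 * C2) ** 2 + 4.0 * C1 * C2 * Lm ** 2)
    w1 = math.sqrt(2.0 / (a + s))   # lower (in-band) resonance, "+" sign in (2.4)
    w2 = math.sqrt(2.0 / (a - s))   # upper resonance, "-" sign in (2.4)
    return w1, w2


# Hypothetical element values (assumptions for illustration only).
L1, L2 = 200e-12, 100e-12     # henries
C1, C2 = 100e-15, 165e-15     # farads

for k in (0.5, 0.75, 0.9):
    w1, w2 = resonances(L1, L2, C1, C2, k)
    f1 = w1 / (2.0 * math.pi * 1e9)
    f2 = w2 / (2.0 * math.pi * 1e9)
    print(f"k = {k:.2f}: f1 = {f1:5.1f} GHz, f2 = {f2:6.1f} GHz")
```

With these values the separation between the two resonances widens as k increases, consistent with the statement that tighter coupling pushes the upper resonance well out of band.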
2.4.2 Driver Load Impedance
The input impedance to the right of a point X (X = A–E) in Fig. 2.8(c) is denoted ZX,
e.g. the input impedance at the gate of the transistor is:
\[
Z_A \approx R_{ser} + \frac{g_m L_{deg}}{C_{gs}} + s L_{deg} + \frac{1}{s C_{gs}}, \tag{2.5}
\]
where Rser is the resistance in series with the gate. Inductive degeneration emulates a
resistor (gmLdeg/Cgs) without dissipating power, where gm and Ldeg are transconductance
and degeneration inductance, respectively [10]. The driver stage load ZE can be expressed
as a rational function, and its denominator as
\[
D(s) = \left[\frac{s^2}{\omega_1^2} + \frac{s}{\omega_1 Q_1} + 1\right] \times \left[\frac{s^2}{\omega_2^2} + \frac{s}{\omega_2 Q_2} + 1\right]
\]
to reflect the dual-resonance nature of the circuit. The resonant
frequencies ω1,2 are defined in (2.4), and Q1,2 are their associated quality factors. For the
relevant in-band resonance:
\[
Q_1 = \frac{(\omega_1^2/\omega_2^2 - 1)}{[\omega_1^2 C_D (L_{leak} + L_{mag}) - 1]} \times \left[\frac{1}{\omega_1 (C_{gs} R_{ser} + g_m L_{deg})}\right]. \tag{2.6}
\]
With insight from Section 2.4.1, and since Q1 ∝ 1/Ldeg in (2.6), Ldeg broadens BWint,
i.e. lowers sensitivity of ZE to component tolerances and nonlinear Cgs variation with
signal power.
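The effect of Ldeg on the last factor of (2.6) can be illustrated numerically. The sketch below evaluates 1/(ω(Cgs Rser + gm Ldeg)) at 28GHz for the single-ended values quoted in this chapter (Cgs of about 330fF, Qu of about 40, Ldeg = 14pH). The transconductance gm is not given in this section; the 50mS used here is a hypothetical value, chosen so that gm·ω0·Ldeg stays below the 0.2 bound quoted in Section 2.4.4.

```python
import math


def series_q_factor(f0, cgs, rser, gm, ldeg):
    """The 1/(w*(Cgs*Rser + gm*Ldeg)) factor of (2.6), evaluated at f0."""
    w = 2.0 * math.pi * f0
    return 1.0 / (w * (cgs * rser + gm * ldeg))


F0 = 28e9                                          # operating frequency, Hz
CGS = 330e-15                                      # single-ended Cgs, F
RSER = 1.0 / (2.0 * math.pi * F0 * CGS * 40.0)     # series resistance implied by Qu = 40
GM = 0.05                                          # hypothetical transconductance, S

q_no_deg = series_q_factor(F0, CGS, RSER, GM, 0.0)
q_deg = series_q_factor(F0, CGS, RSER, GM, 14e-12)
r_emulated = GM * 14e-12 / CGS                     # gm*Ldeg/Cgs term of (2.5), ohms

print(f"Rser = {RSER:.2f} ohm, emulated resistance = {r_emulated:.2f} ohm")
print(f"Q factor without Ldeg = {q_no_deg:.1f}, with 14pH = {q_deg:.1f}")
```

Even a few ohms of emulated resistance dominates the sub-ohm Rser, so the factor drops by several times for the assumed gm. The other factor in (2.6) depends on the transformer design and is not modeled here.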
As an example, starting from a differential load Cgs = 165fF (i.e. 330/2) of Qu ≈ 40
so differential Rser ≈ 0.9Ω, and targeting ZE = (100 + j0)Ω, the Smith chart matching
trajectory at resonance (28GHz) is shown in Fig. 2.9(a) for Ldeg = 0, and in Fig. 2.9(b) for
Ldeg = 28pH (i.e. 14pH × 2). In Fig. 2.9(a), greater insertion loss is expected (smaller Lmag),
and larger silicon area (much larger Lser), compared to Fig. 2.9(b). Using a complete
dual-resonance expression for ZE, independent 10% Gaussian variations in Cgs and CD
result in the ZE scatter plots in Fig. 2.9(c) and (d), corresponding to Fig. 2.9(a) and (b),
respectively. ZE is less sensitive for Ldeg = 28pH.
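The quoted differential series resistance can be checked from the series-RC definition of unloaded quality factor. Assuming Qu = 1/(ω0 Rser Cgs) at 28GHz:

```latex
\[
R_{ser} \approx \frac{1}{\omega_0 C_{gs} Q_u}
= \frac{1}{2\pi \times 28\,\mathrm{GHz} \times 165\,\mathrm{fF} \times 40}
\approx 0.86\,\Omega,
\]
```

which rounds to the value of about 0.9Ω used in the example.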
Figure 2.9: Smith chart trajectories for inter-stage matching to present (100 + j0)Ω
differentially to DA: (a) Ldeg = 0, (b) Ldeg = 28pH (i.e. 14pH single-ended), and scatter of ZE
due to independent Gaussian variations (3σ ≈ 10%) in Cgs and CD; ≈13dB return loss
region w.r.t. (100 + j0)Ω target indicated with circle: (c) Ldeg = 0, and (d) Ldeg = 28pH.
Reprinted from [1].
2.4.3 Effect of Ldeg on Power Capability
To verify that inductive degeneration does not adversely affect transistor power capability,
the two single-ended test transistors without and with Ldeg in the die photos of
Fig. 2.10(a) and (b), respectively, were fabricated on the same test chip as the PA presented
in this chapter. The source node of the single-ended device in Fig. 2.10(b) is connected
Figure 2.10: Micrographs of 12 × 32 × 1μm/28nm transistor test structures used in power
capability verification: (a) Ldeg = 0, (b) Ldeg = 14pH single-ended. Reprinted from [1].
Table 2.3: Comparison of 29GHz CW load-pull measurement results for single-ended
transistor test structures: (a) Ldeg = 0, and (b) Ldeg = 14pH single-ended. Reprinted
from [1].
to one terminal of the degeneration inductance (7μm-wide slab on the ultra-thick metal
layer), and a wide, stacked metal mesh is connected to the other terminal of the inductor.
The device in Fig. 2.10(a) does not have any source inductor, so its source node is
directly connected to the ground mesh. Load-pull measurements using a 29GHz CW signal
were carried out, and the results are summarized in Table 2.3 for 1dB compression.
A slightly more inductive load impedance is used for the Ldeg = 0 case. The chosen
14pH of inductive degeneration lowered device power gain at +12dBm Pout from 10dB
to 8dB, and reduced its PAE at the same +12dBm Pout from 48% to 44%. The observed
2dB drop in power gain contributes to this PAE degradation. Extra care was exercised to
minimize the ground path impedance for the test device of Fig. 2.10(b) by using a wide
and stacked metal mesh surrounding the device for grounding. However, it is still possi-
ble that unwanted loss resistance in series with the 14pH Ldeg also contributes to the PAE
degradation. The chosen Ldeg did not degrade power capability in any significant way.
2.4.4 Effect of Ldeg on Gain and Distortion
First, the AM-PM due to the inter-stage matching by itself is briefly studied. Transformer
voltage gain Av(ω) ≜ VA/VE = |A(ω)| e^{jφ(ω)} contributes AM-PM conversion
due to shifting ω0 with signal level (see Fig. 2.7(b), and [13, 62]). Mathematically, this
contribution is ∝ (dφ/dVA) ≈ (dφ/dω|ω0) × (dω0/dCgs) × (dCgs/dVA), where the chain
rule of derivatives was used. Resonator quality factor is defined as (ω0/2)|dφ/dω| evaluated
at ω0 [68], and therefore Ldeg reduces AM-PM by lowering the in-band quality factor of Av,
which is identical to Q1 in (2.6).
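The chain-rule argument can be made explicit in the simplest case. As a sketch, assume the in-band response behaves like a single resonator with ω0 = 1/√(L Cgs), which simplifies the dual-resonance network, and keep magnitudes only:

```latex
\[
\left|\frac{d\varphi}{d\omega}\right|_{\omega_0} = \frac{2Q}{\omega_0},
\qquad
\left|\frac{d\omega_0}{dC_{gs}}\right| = \frac{\omega_0}{2 C_{gs}}
\quad\Rightarrow\quad
\left|\frac{d\varphi}{dV_A}\right| \approx \frac{2Q}{\omega_0} \cdot \frac{\omega_0}{2 C_{gs}} \cdot \left|\frac{dC_{gs}}{dV_A}\right|
= \frac{Q}{C_{gs}} \left|\frac{dC_{gs}}{dV_A}\right|.
\]
```

Under these assumptions the phase shift produced by a given fractional change in Cgs scales directly with Q, so any reduction of the in-band quality factor reduces this AM-PM contribution in proportion.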
We now turn to discuss the effects of Ldeg on gain, and on the overall linearity of the
output stage, including the inter-stage matching. It is well-known that negative feedback
has the beneficial effect of reducing distortion, and the adverse effect of reducing gain, by
factors that increase with the associated loop gain [10]. Ldeg is therefore expected to con-
tribute to linearizing output stage AM-AM response. Additionally, Volterra series analysis
of a differential class-AB bipolar stage [69] suggests Ldeg can reduce AM-PM conversion
in the effective transconductance of the stage. On the other hand, the inverse relation be-
tween power gain and size of source degeneration inductance observed in Table 2.3 has
been analyzed in e.g. [21, 70]. Thus, the loop gain of the negative feedback by Ldeg in
this work cannot be chosen large to minimize distortion, since it is very important to
minimize gain degradation, and hence minimize back-off PAE degradation in the output stage
(gm · ω0Ldeg ≲ 0.2 for quiescent bias point).
To illustrate the improvement in linearity due to Ldeg, the AM-AM/AM-PM response
of the output stage is simulated for different Ldeg values in the set {0, 5, 15, 25, 50}pH
to cover zero, small, moderate, and heavy degeneration. The differential version of the
inter-stage matching network topology of Fig. 2.8 is used, with ideal/lossless components.
No losses were included in the matching components to avoid influencing effective Rser
and therefore Q1 in (2.6). The input source has 120Ω output impedance, and it models
the driver stage signal at the same 28GHz frequency. Note that the AM-AM/AM-PM conversion
characteristic is sensitive to any slight detuning between input CW frequency and
the matching center frequency for lossless components, so the matching network is
re-designed at each Ldeg to maintain ≈30dB return loss relative to 120Ω with a fixed 28GHz
center frequency. Loss of the output match, and RC-extracted transistor layout are included
in the simulation, and results are shown in Fig. 2.11.
It can be seen in Fig. 2.11 that even the smallest degeneration of 5pH contributes to
improvement in AM-AM and AM-PM: the gain expansion is reduced by 0.7dB and the
sharp increase in lagging AM-PM is slowed down by >10 degrees at the input power
for 3dB of compression relative to small-signal gain. AM-AM and AM-PM conversion are
also significantly reduced with larger Ldeg, until the improvement saturates at the largest
simulated value. At 50pH, the gain expansion is 0.2dB and the AM-PM is <1 degree up
to 3dB compression. In this sweep of Ldeg, the small-signal gain of the output stage drops
from 17.5dB at Ldeg = 0, to 5.8dB at Ldeg = 50pH, while at 15pH, the small-signal gain
of the stage is 10.3dB. The design value is 14pH, where the gain is 11dB; additional
gain degradation cannot be tolerated to further improve linearity.
2.5 Circuit Implementation
Using the developed concepts, a 28GHz two-stage transformer-coupled PA is designed
in a 1P7M 28nm CMOS technology, having ultra-thick metal (copper, UTM) and redis-
tribution (aluminum, RDL) layers. The circuit is shown in Fig. 2.12(a): both stages use
[Figure 2.11 plots: (a) AM-AM (dB) vs. Pin (dBm), (b) AM-PM (degrees) vs. Pin (dBm); curves for Ldeg = 0, 5, 15, 25, and 50pH]
Figure 2.11: Simulated AM-AM/AM-PM response of the output stage using re-designed
lossless input matching for each Ldeg ∈ {0, 5, 15, 25, 50}pH at WnMOS = 12 × 32 × 1μm
and JPA = 12μA/μm: (a) AM-AM response, and (b) AM-PM response. Reprinted
from [1].
[Figure 2.12 labels: (a) GSG pads, DA (WnMOS/2) and PA (WnMOS) stages, gate biases VG,DA and VG,PA, output balun XF, Cn, Ldeg, and differential ports vi+/vi− and io+/io−; (b) PA stage with Ldeg, Cn, ground plane, and ports vi+/vi− and io+/io−]
Figure 2.12: Schematic of PA circuit: (a) two-stage transformer-coupled topology, (b)
push-pull stage with neutralization capacitor Cn and single-ended source degeneration
inductor Ldeg. Reprinted from [1].
the same topology, but different criteria for parameter/element values. Matching network
design is based on single-tuned transformers. Table 2.4 gives a summary of the key design
values, while concepts unique to this implementation relative to published considerations
Table 2.4: Summary of design values for two-stage power amplifier. Reprinted from [1].
for high-efficiency mm-Wave PA layout [14, 15] are mentioned next.
2.5.1 Core Stages
Starting from the basic circuit of Fig. 2.4(b), with optimized loading, width, and bias
from simulations in Section 2.3, introducing Ldeg as in Section 2.4 completes the PA stage.
Prioritizing PAE, Ldeg = 14pH is just enough to effectively widen BWint of inter-stage
matching (Fig. 2.9) and to help reduce distortion. Adding Ldeg results in some minor shift
of the optimum WnMOS, VGS, and ΓL,opt relative to their chosen values in Section 2.3, but
a re-design was not attempted. To realize this small Ldeg = 14pH with minimal series
resistance, and to comply with current density rules, the structure is drawn as two
7μm-wide slabs. The two UTM slabs form the yellow V-shape in the 3D model of Fig. 2.12(b).
To account for added magnetic and capacitive coupling, electromagnetic (EM) simulations
for Ldeg design include both gate and source routing in close proximity (gray traces and
red ‘forks’ in Fig. 2.12(b), respectively). The two lowest metal layers M1 and M2 form a
stacked ground plane. Ldeg conducts the differential mode currents, and has its center tap
tied to M1–M2. Thus, the M1–M2 plane provides a predictable path/impedance for the
common mode, minimizes parasitic d.c. voltage drop, and grounds transistor bulks.
Metal-oxide-metal (MOM) capacitors with a nominal value that maximizes core reverse
isolation are used to implement Cn in Fig. 2.12(a). The layout uses 1μm-long fingers
tied to a tapered manifold, and reduced substrate doping density below the capacitor
(native layer), to help reduce the series and parallel resistive losses, respectively. The
measured capacitance and quality factor of Cn in the PA stage are shown vs. frequency in
Fig. 2.13(a). The nominal design value of Cn is 67fF, while the measured value is ≈64fF
from Fig. 2.13(a). Also, Fig. 2.13(b) shows that measured Q translates to >1.5kΩ of
shunt-equivalent resistance up to 40GHz. Therefore, the impact of the measured capacitor
loss on Pout/PAE is not major.
Driver amplifier (DA) design targets are sufficient power gain and minimal influence
on cascaded amplifier linearity. Accordingly, DA transistor width is half that in PA stage
to avoid DA-limited saturation. Since the PA stage is biased in sub-threshold, class-A-like
biasing must be avoided in the DA stage to maintain back-off PAE, which degrades DA gain.
Further degradation of gain results from a relatively large Ldeg required for input matching
to the 50Ω driving impedance dictated by on-wafer probe testing. On the other hand, two
measures help to partially recover DA gain. First, although still in sub-threshold, a bias
current density of JDA ≈ 2 × JPA is chosen to increase gain, where JPA (JDA) is the current
density in the PA (DA) stage. Second, the targeted shunt load resistance at resonance
is 120Ω differentially for the DA, i.e. > 2 × 49Ω, where 49Ω is the corresponding value for
the PA stage. Some additional gain improvement is possible if the amplifier is integrated
into a front-end. In this case, an on-chip circuit drives the DA, so a driving impedance
>50Ω could be chosen. Finally, layout considerations described for the PA stage apply to
the DA stage, but the larger Ldeg is realized by a differential two-turn spiral.
[Figure 2.13 plot: measured Cn (fF) and measured Q vs. frequency (12–40GHz), with the annotation "Design Value Nominally 67fF"]
Figure 2.13: Measured MOM neutralization capacitor characteristics: (a) capacitance and
quality factor, (b) shunt-equivalent loss resistance calculated from capacitance and quality
factor. Reprinted from [1].
2.5.2 Transformers
Transformers are implemented as vertical/broadside-coupled concentric spirals [65],
and a few design considerations are briefly mentioned here. First, a ground plane is included
in both transformer EM models and corresponding implementations to consistently
define the common mode signal path, i.e. improve predictability. Close proximity to a ground
plane lowers inductor self-resonance frequency, so a radial clearance of ≈30μm is allowed.
EM models are generated up to 4–6× the center frequency of the amplifier so they
can be reasonably used in linearity simulations. To facilitate choice of transformer radius
based on target inductance, it helps to de-embed the effect of parasitic capacitance by
using an equivalent lumped circuit extraction. De-embedding the parasitic capacitance is
accomplished by fitting the broadband EM simulation data to a lumped 2π-prototype like
that in [14] via numerical optimization. Maximum s-parameter fitting error is typically
1–5% (magnitude) and 3–5° (phase).
Minimizing losses, as well as deviation from ΓL,opt in loading of the PA stage, has a
critical effect on Pout/PAE. The output matching balun XF in Fig. 2.12(a) is fabricated as
a separate test structure (without center taps) as shown by its micrograph in Fig. 2.14(a).
Measured and EM simulated self inductances and quality factors of the primary/secondary
windings are shown in Fig. 2.14(b), with <10% error in inductances. The measured and
simulated maximum available gain (MAG) of XF in differential mode are also overlaid in
Fig. 2.14(c), showing good estimation of its loss: 0.58dB simulated and 0.72dB measured
minimum insertion loss at 30GHz.
2.6 Experimental Results
The PA is fabricated in 1P7M 28nm CMOS LP. The die micrograph is shown in
Fig. 2.15, and core dimensions are 0.62 × 0.25mm². A d.c. probe provides bias/supply
voltages to each of the two stages using separate pads for diagnostics. RF performance is
characterized using on-wafer probing at bias current densities {JPA, JDA} = {12, 22}μA/μm
(unless otherwise stated).
2.6.1 Measured Data
Measured small-signal s-parameters are shown in Fig. 2.16(a), with peak s21 of 15.7dB,
and a −3dB bandwidth of 3.85GHz (27.35–31.2GHz), centered around f0 = 29.25GHz.
Input return loss at f0 is better than 20dB and remains better than 10dB over 28–31.35GHz.
The PA is well-tuned to the target band overall.
CW signal Pin sweep results are measured across 27–31GHz with 1GHz step. Peak
CW signal performance is at 30GHz, and is shown in Fig. 2.16(b) with Gss = 15.7dB,
Psat = 14dBm, P1dB = 13dBm, PAEmax = 35.5%, and PAE at Psat − 9.6dB of 10%. Psat
is defined as Pout at 3dB compression. Key large-CW-signal power and PAE metrics
at saturation and back-off are plotted vs. frequency in Fig. 2.17(a). Psat is >13.5dBm
over 29–31GHz, and PAEmax is >32% over 28–31GHz. PAE at Psat − 9.6dB is >8%
over 27–31GHz. Saturated metrics at 30GHz are plotted vs. VDD over 1.0–1.15V in
Fig. 2.17(b); nominal VDD for this 28nm process is 1.05V. A lower 1V supply is used
for reliability concerns, mainly due to hot carrier injection (HCI). Other potential nMOS
transistor degradation mechanisms such as time-dependent dielectric breakdown (TDDB)
are less of a concern in the implemented nMOS-only circuit [15, 71].
The setup in Fig. 2.18 is used to measure average Pout, EVM, and PAE for a 64-QAM
OFDM signal of 9.6dB PAPR, across an fc range of 28–30GHz, for BWsig =
{150, 250}MHz, i.e. {0.9, 1.5}Gbps data rate. Careful manual tuning of I-Q channels
at each fc corrects for baseband digital swing imbalance/symbol clock errors (tuned in
M8190), and for RF I-Q amplitude/phase errors (tuned in E8267D). Using a thru element
(from impedance standard substrate) in place of the PA device under test (DUT), the measured
EVM floor of the setup is {−38, −36}dBc for BWsig = {150, 250}MHz, i.e. has
at least 11dB of margin from EVMreq across 28–30GHz.
Best measured output spectrum, adjacent channel power ratio (ACPR), and constellation
for 1V supply at BWsig of 250MHz and fc = 30GHz are shown in Fig. 2.19. Average
Pout/PAE of 4.2dBm/9% are achieved at EVMreq = −25dBc. Fixing all parameters, and
lowering BWsig to 150MHz, average Pout/PAE increase to 5.2dBm/11%. Summaries of
measured 64-QAM OFDM performance vs. fc for 1V supply, and vs. VDD at 30GHz, are
shown in Fig. 2.20(a) and in Fig. 2.20(b), respectively, all at constant EVM (−25dBc).
Average Pout is ≈1dB higher for BWsig of 150MHz than for 250MHz, independent of fc
(Fig. 2.20(a)), and of VDD (Fig. 2.20(b)).
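It can be useful to estimate how much of a measured −25dBc EVM is contributed by the measurement setup itself, given the EVM floor quoted earlier in this section. The sketch below assumes the PA and setup errors are uncorrelated and therefore add as powers; that assumption, and the function name, are illustrative and not part of the reported measurement procedure. It also reproduces the quoted data rates from 6 bits per 64-QAM symbol, ignoring any coding or OFDM overhead.

```python
import math


def deembed_evm_db(measured_db, floor_db):
    """PA-only EVM (dB), assuming PA and setup errors add as uncorrelated powers."""
    pa_power = 10.0 ** (measured_db / 10.0) - 10.0 ** (floor_db / 10.0)
    return 10.0 * math.log10(pa_power)


results = {}
for bw_mhz, floor_db in ((150, -38.0), (250, -36.0)):
    pa_only_db = deembed_evm_db(-25.0, floor_db)
    rate_gbps = 6 * bw_mhz / 1000.0          # 64-QAM: 6 bits per symbol
    results[bw_mhz] = (pa_only_db, rate_gbps)
    print(f"BWsig = {bw_mhz} MHz: rate = {rate_gbps:.1f} Gbps, "
          f"PA-only EVM = {pa_only_db:.2f} dB for a measured -25 dB")
```

Under this assumption the setup floor accounts for only a few tenths of a dB at −25dBc, so the reported Pout at EVMreq is only slightly pessimistic.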
Measured AM-AM/AM-PM characteristics at 30GHz are shown in Fig. 2.21 for three
example JPA values. Two key features of AM-AM/AM-PM data are P1dB, and maximum
|AM-PM| relative to small-signal for Pout ≤ P1dB; they are plotted vs. JPA in
Fig. 2.22(a) and (b), respectively. AM-AM is measured with −10dB/−20dB input/output
directional couplers. Coupled ports feed two power sensors so Pin/Pout are concurrently
measured by a two-channel power meter. Simultaneously, relative insertion phase is measured
using a network analyzer connected to the PA via coupler thru ports. AM-PM is
reported as network analyzer insertion phase reading vs. power meter reading. Instrument
noise affects the data due to weak coupled port outputs at low signal power. Smoothing
is used to reduce noise, and key data features are retained (somewhat conservatively for
P1dB) as is evident visually in Fig. 2.21, and quantitatively in Fig. 2.22.
2.6.2 Comparison with State-of-the-art
Table 2.5 shows a comparison with state-of-the-art silicon mm-Wave PAs. We first
compare to 60GHz bulk CMOS PAs [15, 17] in terms of well-reported CW benchmarks.
Due to the lower 30GHz frequency, this work achieves better power gain per cascaded
stage, despite a significantly lower d.c. bias current density. Also, comparable CW Psat
per combined PA path is achieved, which is attributed to similar supply voltages and 1:1
transformer output matching per path.
We now turn to compare measured 64-QAM signal performance with [17]. This work
achieves significantly greater PAE for the same BWsig/fc and EVM, and at slightly higher
average Pout per combined PA path. Note that [17] uses two-way combining and capacitance
linearization.
CW PAE at Psat − 9.6dB is used as previously in [7] to fairly compare back-off PAE at
−25dBc EVM for 64-QAM OFDM with publications that do not report this measurement.
The SiGe BiCMOS PA of [49] incurs the d.c. current of only its single stage, and the
SOI PA in [59] is a nonlinear class-E design with high PAEmax. This work achieves
comparable back-off PAE to them both. Also, despite lacking the additional bulk bias
control per transistor segment inherent to the FD-SOI technology used in [20], this work
achieves greater back-off PAE.
Overall, Table 2.5 shows that the implemented PA meets or exceeds the state-of-the-art.
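The figure of merit defined in the footnote of Table 2.5 is FOM = Psat [dBm] + Gain [dB] + 10 log10(PAEmax [%]) + 20 log10(Freq. [GHz]). As a quick illustration, the sketch below evaluates it for the peak CW numbers reported at 30GHz in Section 2.6.1 (Psat = 14dBm, gain = 15.7dB, PAEmax = 35.5%). The function name is illustrative, and the value tabulated in Table 2.5 may differ slightly through rounding or choice of operating point.

```python
import math


def pa_fom(psat_dbm, gain_db, pae_max_pct, freq_ghz):
    """FOM as defined in the Table 2.5 footnote (PAE in percent, frequency in GHz)."""
    return (psat_dbm + gain_db
            + 10.0 * math.log10(pae_max_pct)
            + 20.0 * math.log10(freq_ghz))


fom_this_work = pa_fom(psat_dbm=14.0, gain_db=15.7, pae_max_pct=35.5, freq_ghz=30.0)
print(f"FOM from the 30GHz CW data: {fom_this_work:.1f}")
```

The frequency term rewards operation at higher carrier frequencies, which should be kept in mind when comparing 30GHz and 60GHz designs on this metric.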
2.6.3 Comparison with 5G Requirements
Average Pout at 150/250MHz BWsig and EVMreq is 5.2/4.2dBm, supporting a range of
20–30m from Fig. 2.3(b). This short uplink range estimate is a result of the more practical
link budget constraints in Section 2.2 than in [45]. Also, at the wider 250MHz BWsig, average
Pout is ≈2dB less than the 6.5dBm predicted in Section 2.3.3, assuming the DA minimally
impacts linearity.
Although computationally efficient, extracting AM-AM/AM-PM characteristics from
CW signal power sweep simulations to model PA nonlinearity in EVM estimates as in
Section 2.3 ignores relevant circuit memory effects, e.g. short term effects of limited
bandwidth about f0 [72], and long term effects of low-frequency (bias network) impedance
termination [73]. These memory effects can manifest experimentally as dependence of
Pout at fixed EVM on BWsig.
2.6.4 Discussion
To evaluate the AM-AM/AM-PM modeling method for EVM estimation, any mismatch
between measured/simulated amplifier characteristics should be de-embedded. Accordingly,
the same code used for 64-QAM OFDM signal power sweeps through simulated
AM-AM/AM-PM models in Section 2.3 is applied to the measured AM-AM/AM-PM
data from Fig. 2.21 (after smoothing). EVM is plotted vs. average Pout from this
simulation in Fig. 2.23 for a JPA range encompassing 12μA/μm. Direct measurements
of EVM vs. average Pout using the setup of Fig. 2.18 are overlaid on the same axes for
BWsig = {150, 250}MHz at JPA = 12μA/μm. Intuitively, validity of an AM-AM/AM-PM
model extracted from CW power sweeps improves as the BWsig to amplifier bandwidth
ratio decreases. Therefore, Fig. 2.23 may be interpreted as follows: as BWsig falls
from 250MHz, the measured EVM vs. Pout curve shifts to the right and approaches its
most optimistic estimate from using measured CW AM-AM/AM-PM data in behavioral
simulation, i.e. BWsig → 0.
Table 2.5: Comparison with state-of-the-art silicon mm-Wave PAs. Reprinted from [1].
*Graphically estimated. **With pads. †FOM ≜ Psat [dBm] + Gain [dB] + 10 log10(PAEmax [%]) + 20 log10(Freq. [GHz])
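The figure of merit defined in the footnote of Table 2.5 is a plain sum of dB-domain terms, so it can be checked with a few lines of code. The sketch below is only illustrative: the function name and the input values are hypothetical and are not an entry from Table 2.5.

```python
import math

def pa_fom_db(psat_dbm, gain_db, pae_max_percent, freq_ghz):
    """FOM as written in the Table 2.5 footnote (all terms added in dB)."""
    return (psat_dbm + gain_db
            + 10 * math.log10(pae_max_percent)
            + 20 * math.log10(freq_ghz))

# Hypothetical inputs chosen for illustration only.
example_fom = pa_fom_db(psat_dbm=15.0, gain_db=20.0,
                        pae_max_percent=30.0, freq_ghz=28.0)
print(f"FOM = {example_fom:.1f}")
```

Because the frequency term enters as 20 log10, a design at twice the carrier frequency gains about 6 points of FOM for otherwise equal Psat, gain, and PAEmax.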
Transient simulation of EVM at transistor-level can correctly capture memory ef-
fects [74], but it is prohibitively complex. Instead, simple two-tone simulations are used
to investigate effect of signal bandwidth on third-order distortion in the PA. IM3 for tone
spacing Δf = {20, 75, 125, 150, 250}MHz is shown in Fig. 2.24(a) and (b) for lower and
upper sidebands, respectively. IM3 degrades for wider tone spacing, confirming the trend
in Fig. 2.23. One exception is improving IM3 for 150–250MHz. In [75], it is shown that
second-order distortion causes a low-frequency ‘beat’ that modulates the power supply,
and [75] concludes that large off-chip bypass capacitors are needed to lower the supply
node impedances over the beat frequency range. Similarly, [73] shows that second-order
distortion causes a low-frequency beat that modulates input/gate bias. For VDD nodes in
our design, three bypass capacitors are used: one on-chip (20pF), one at the d.c. prob-
ing needle tip (120pF), and one on the PCB of the d.c. probe (10nF). In the gate bias
networks, the same bypass capacitor values are used, but large on-chip blocking resistors
were also included. Therefore, the EVM degradation with BWsig in Fig. 2.23, and the
similar initial IM3 degradation between 20–150MHz in Fig. 2.24 may be the result of
excessively high gate bias network impedance. The unexpected improvement in IM3 be-
tween 150–250MHz in Fig. 2.24 is potentially due to limitations of using a two-tone test.
For a two-tone input, the sub-harmonic beat is a single tone that samples bias network
impedance at only one point, i.e. its spectrum is two impulses at ±Δf. On the other hand,
for 64-QAM OFDM input as in Fig. 2.23, the signal modulating the bias nodes has a
continuous spectrum extending over ±BWsig/2. Therefore, a more complicated multi-tone
signal simulation that modulates the bias nodes with tones spread across ±BWsig/2 may
provide greater resolving power, and therefore better consistency with 64-QAM OFDM
measurements at different BWsig than a two-tone test.
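As a quick reference for the two-tone discussion, the sketch below lists where the tones, the second-order beat, and the third-order products fall for a given tone spacing. The 30GHz center frequency and the function name are assumptions made for illustration; the text only states that the simulation is run at the amplifier center frequency.

```python
def two_tone_products(f0_ghz, delta_f_mhz):
    """Frequencies of a two-tone test centered at f0 with spacing delta_f."""
    df_ghz = delta_f_mhz / 1e3
    f1 = f0_ghz - df_ghz / 2          # lower tone
    f2 = f0_ghz + df_ghz / 2          # upper tone
    return {
        "tones_ghz": (f1, f2),
        "beat_mhz": delta_f_mhz,      # second-order product at f2 - f1
        "im3_lower_ghz": 2 * f1 - f2, # lower third-order product
        "im3_upper_ghz": 2 * f2 - f1, # upper third-order product
    }

# Tone spacings simulated in Fig. 2.24; 30 GHz center is an assumed value.
for spacing_mhz in (20, 75, 125, 150, 250):
    products = two_tone_products(30.0, spacing_mhz)
    print(spacing_mhz, products["im3_lower_ghz"], products["im3_upper_ghz"])
```

Each spacing probes the bias network at a single beat frequency, which is the limitation the paragraph above attributes to the two-tone test.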
Another notable point from Fig. 2.23 is that simulations predict non-monotonic EVM
behavior over a Pout range that depends on bias point, as seen for JPA = 10μA/μm. This
behavior is associated with inter-modulation nulls [8], but in the EVM context. Simulations
for the higher JPA values predict similar ‘cancellation’ at lower EVM levels than
the range shown in Fig. 2.23. On the other hand, only a minor reduction in the slope
ΔEVM/ΔPout is reliably observed in the direct EVM vs. Pout measurements down to
3dB above the EVM floor of the setup in Fig. 2.18. Less constrained by measurement
floor, narrowband two-tone measurements at a tone spacing Δf = 20MHz, and the relatively
high JPA = 23.8μA/μm where AM-PM is minimal in Fig. 2.22(b), are shown in
Fig. 2.25 across a 27–31GHz center frequency range. The two-tone results exhibit reduction
in slope vs. average Pout like direct EVM measurements in Fig. 2.23, but no IM3 nulls
down to 10–12dB lower distortion levels over the same average Pout range. Therefore,
the measured linearity performance of the implemented PA is not likely to be a result of
sensitive inter-modulation distortion null effects.
2.7 Conclusion
This paper showed that spectrum around 28GHz is a viable choice of carrier band for
low-power, broadband 5G wireless UEs. Output power requirements that consider realis-
tic size and RFIC- and integrated-antenna-module-related losses were derived for the most
challenging anticipated 5G uplink use scenario. Subsequently, a PA output stage optimiza-
tion methodology that tackles those demanding requirements was proposed. Introduced as
a perturbation, a small source degeneration inductor enabled embedding of the optimized
output stage into a two-stage PA, by broadening inter-stage bandwidth and helping to re-
duce distortion. Accordingly, a 28GHz band PA was fabricated in 28nm bulk CMOS
and validated the presented concepts at state-of-the-art performance. For on-going 5G
phased array PA developments, more broadband amplifier techniques, and efficient mod-
eling of circuit memory effects in EVM estimation, e.g. [74], may help increase uplink
transmission range, while aiming to maintain the demonstrated high power and spectral
efficiencies.
Figure 2.14: Output balun characterization: (a) test structure micrograph, (b) inductances
and quality factors, (c) differential-mode maximum available gain (i.e. transformer
efficiency). Reprinted from [1].
Figure 2.15: Die micrograph of fabricated two-stage PA. Reprinted from [1].
[Plot (a): s-parameters (dB), s11, s21, s12 and s22, vs. frequency (GHz) over 20–40GHz.]
Figure 2.16: Small- and large-signal CW signal measurement results for JPA = 12μA/μm,
JDA = 22μA/μm and 1V supply: (a) s-parameter results, (b) best measured CW
signal input power sweep at fc = 30GHz. Reprinted from [1].
Figure 2.17: Swept large-CW-signal measurement results summary for JPA = 12μA/μm,
JDA = 22μA/μm: (a) key performance metrics over 27–31GHz for 1V supply, and (b)
saturated performance metrics vs. supply voltage at 30GHz. Reprinted from [1].
Figure 2.18: EVM measurement setup for 64-QAM OFDM signal. Reprinted from [1].
Figure 2.19: Peak 64-QAM OFDM measured performance: output spectrum, ACPR, and
constellation for JPA = 12μA/μm, JDA = 22μA/μm and 1V supply at Pout = +4.2dBm,
BWsig = 250MHz (1.5Gbps), achieving 9% PAE at EVM = −25dBc. Reprinted from [1].
Figure 2.20: Swept 64-QAM OFDM signal measurement results summary for JPA =
12μA/μm, JDA = 22μA/μm: (a) average Pout and corresponding PAE vs. fc for 1V
supply, (b) average Pout and corresponding PAE vs. supply voltage at fc = 30GHz. Reprinted
from [1].
Figure 2.21: Measured AM-AM/AM-PM characteristics of two-stage PA at JDA = 22μA/μm
and corresponding Savitzky-Golay smoothed characteristics for three example JPA
values: (a) JPA = 10.0μA/μm, (b) JPA = 12.9μA/μm, and (c) JPA = 16.5μA/μm.
Reprinted from [1].
Figure 2.22: Key metrics of measured AM-AM/AM-PM characteristics at 30GHz for 1V
supply and JDA = 22μA/μm before and after Savitzky-Golay smoothing: (a) P1dB, and
(b) maximum AM-PM deviation w.r.t. small-signal for Pout ≤ P1dB. Reprinted from [1].
Figure 2.23: EVM vs. average 64-QAM OFDM Pout at 30GHz obtained using direct
measurement (setup in Fig. 2.18) for BWsig = {150, 250}MHz, and using behavioral
simulation with measured AM-AM/AM-PM characteristics of the two-stage PA (i.e.
BWsig → 0) for different JPA at JDA = 22μA/μm; some example AM-AM/AM-PM
characteristics are shown in Fig. 2.21. Reprinted from [1].
Figure 2.24: Simulated two-tone inter-modulation distortion at JDA = 22μA/μm, JPA =
12μA/μm for Δf = {20, 75, 125, 150, 250}MHz at the amplifier center frequency: (a)
lower IM3, (b) upper IM3. Reprinted from [1].
Figure 2.25: Measured two-tone inter-modulation distortion at JDA = 21μA/μm, JPA =
23.8μA/μm for Δf = 20MHz across 27–31GHz center frequency: (a) lower IM3, (b)
upper IM3, (c) lower IM5, and (d) upper IM5. Reprinted from [1].
3. A WIDEBAND LINEAR 28-GHZ POWER AMPLIFIER FOR
POWER-EFFICIENT 5G PHASED ARRAYS IN 40-NM CMOS*
3.1 Introduction
To meet rising demand, broadband cellular data providers are racing to deploy fifth
generation (5G) mm-Wave, e.g. rollout of some 28GHz-band services is intended in 2017
in the USA, with 5/1Gbps downlink/uplink targets. Even with 64QAM signaling, this
translates to RF bandwidth (RFBW) as large as 800MHz. With 100m cells and a dense
network of 5G access points (AP), potential manufacturing volumes make low-cost CMOS
technology attractive for both user equipment (UE) and AP devices. However, poor Pout
and linearity of CMOS power amplifiers (PA) are a bottleneck, as 10dB back-off is typical
to meet error vector magnitude (EVM) specifications. This limits range and power added
efficiency (PAE), and wider RFBW accentuates these issues. On the other hand, sufficient
element counts in the envisaged 5G phased array modules can overcome path loss despite
low Pout per PA, e.g. by combining RFICs in the AP. CMOS PAs with wideband linearity/PAE
can therefore enable economical UE/AP devices to deliver 5G data rates.
Silicon 28GHz-band PAs with state-of-the-art PAE were recently reported [1, 49, 76].
Despite these advances, linearity is not sufficiently broadband for 5G speeds i.e. maxi-
mum RFBW of 250MHz at 28GHz [1]. Relevant state-of-the-art CMOS PAs for 802.11ad
[15, 17, 20] are similar to their 28GHz counterparts in a normalized RFBW sense, i.e.
500MHz RFBW at 60GHz [17]. This paper reports a 28GHz CMOS PA supporting
RFBW 3× the state-of-the-art without degrading Pout, PAE, or EVM. The three-stage PA
uses dual-resonance transformer matching networks with bandwidths optimized for wideband
linearity. Digital gain control (9dB range) is integrated for phased array operation, a
needed functionality absent from existing high-performance mm-Wave PAs.
* Section 3 is reprinted with permission from S. Shakib, M. Elkholy, J. Dunworth, V. Aparin and K.
Entesari, "2.7 A wideband 28GHz power amplifier supporting 8×100MHz carrier aggregation for 5G in
40nm CMOS," 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017,
pp. 44-45. © 2017 IEEE.
3.2 Circuit Design
Fig. 3.1 shows the PA schematic. The 1x power transistor layout is similar to [15]
(W/L=32×1μm/40nm). Capacitive neutralization is used in stages 2 and 3 for reverse
isolation. Stage 1 is a current steering VGA with 4×2dB gain steps determined by width
ratios of a switched array of low-Vt cascode transistors. This topology has robust gain step
accuracy, and small input/output impedance and insertion phase variations across digital
states. An additional 1dB step is implemented in the biasing of stage 2. The stage scaling
indicated in Fig. 3.1 helps to avoid compression in stages 1 and 2.
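The combination of 4×2dB VGA steps and one 1dB bias step can be enumerated to see how it covers a 9dB range in 1dB increments. This is a minimal sketch; the names are hypothetical, and the steering-fraction helper assumes an idealized current-steering stage whose voltage gain scales linearly with the fraction of signal current routed to the output, which the text does not state explicitly.

```python
VGA_STEP_DB = 2.0    # stage 1: four 2 dB steps
VGA_STEPS = 4
BIAS_STEP_DB = 1.0   # stage 2: one additional 1 dB step

def gain_backoff_states_db():
    """All attenuation settings (dB below maximum gain), sorted."""
    states = set()
    for vga_code in range(VGA_STEPS + 1):
        for bias_code in (0, 1):
            states.add(vga_code * VGA_STEP_DB + bias_code * BIAS_STEP_DB)
    return sorted(states)

def steering_fraction(attenuation_db):
    """Idealized fraction of signal current steered to the output (assumption)."""
    return 10 ** (-attenuation_db / 20)

print(gain_backoff_states_db())
print([round(steering_fraction(2.0 * code), 3) for code in range(VGA_STEPS + 1)])
```

Under these assumptions the ten settings span 0 to 9dB of back-off, consistent with the 9dB digital gain control range quoted in Section 3.1.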
Back-off PAE of stage 3 is first optimized using a similar approach to [1]. Wideband
matching is additionally desired to improve linearity by avoiding memory effects,
e.g. sharp RF gain slope. Broadband transformer matching networks are realized using
loose magnetic coupling k to attain two in-band resonances separated by Δf. Using ideal
transformer models, and by simulating amplitude-to-amplitude/amplitude-to-phase modulation
conversion (AM-AM/AM-PM) of the PA at 28GHz, Fig. 3.2 illustrates the effect
of Δfin on linearity/PAE in stage 3 for constant bias and terminations. Explicit input shunt
resistance is used, ranging from 660Ω to 100Ω to increase Δfin from 1 to 7GHz at the cost of
power gain, which drops from 13 to 6dB. Fig. 3.2 shows AM-AM is insensitive, while
AM-PM decreases with increasing Δfin. Pout at constant EVM (for 64QAM OFDM) increases
with Δfin, and approaches the artificial case of setting AM-PM to zero. Δfin=3GHz is
chosen as a compromise between Pout and PAE/gain. To realize a desired Δfin, transformer
windings are offset to control k. The low Ropt,diff=45Ω target from load-pull simulation
enables broadband output matching (Δfout≈7GHz). Shunt input resistance and transformer
self-inductances scale inversely to Cin of each stage such that gain is 7-8dB/stage with
overall bandwidth limited by Cin of stage 3.
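To give a feel for how loose coupling sets the resonance spacing, the sketch below uses the textbook result for two identical, lossless, magnetically coupled LC tanks, whose resonances split to f0/√(1+k) and f0/√(1−k). This is a simplifying assumption and not the design procedure of this work: the actual networks are resistively loaded and have unequal primary and secondary tanks, and the function names are hypothetical.

```python
import math

def resonances_ghz(f0_ghz, k):
    """Split resonances of two identical lossless tanks coupled with factor k."""
    return f0_ghz / math.sqrt(1 + k), f0_ghz / math.sqrt(1 - k)

def coupling_for_spacing(f0_ghz, delta_f_ghz, tol=1e-9):
    """Bisection on k; the spacing grows monotonically with k on (0, 1)."""
    lo, hi = 0.0, 0.99
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_low, f_high = resonances_ghz(f0_ghz, mid)
        if f_high - f_low < delta_f_ghz:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k_needed = coupling_for_spacing(f0_ghz=28.0, delta_f_ghz=3.0)
print(f"k = {k_needed:.3f}", resonances_ghz(28.0, k_needed))
```

For small k the spacing is approximately k·f0, so a 3GHz spacing near 28GHz corresponds to k of roughly 0.1 in this idealized model, which is in line with the "loose coupling" described above.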
3.3 Experimental Results
The PA die micrograph is shown in Fig. 3.3. The PA is fabricated in 1P6M 40nm CMOS LP
with core dimensions of 0.90×0.25mm², and uses a 1.1V nominal supply. S-parameter
measurements across 20-40GHz and gain settings are shown in Fig. 3.4. Input return
loss is >10dB over 24.3-36.6GHz and varies negligibly across settings. Peak gain is
22.4dB/13.3dB for maximum/minimum setting at 28GHz. Expected skin effect in transformers
and transistor MAG roll-off, and unexpectedly small capacitance (w.r.t. simulation),
cause the observed gain slope. Also, Fig. 3.4 shows peak nonlinearity error <0.5dB in
gain step over 26-34.3GHz. Phase error is small (peak <9.3°), which mitigates complexity
of phased array calibration.
Measured continuous wave (CW) Pin sweeps up to Pin,max=-3.5dBm are reported in
Fig. 3.5 for highest gain setting and over 26-33GHz with 1GHz step (Pmax in Fig. 3.5 is
Pout at Pin,max). The PA is driven to at least 1dB compression across 26-33GHz, and to 2-
3dB compression only over 27-30GHz. Peak performance is at 27GHz, with Psat/PAEmax
of 15.1dBm/33.7%, where Psat is Pout at 3dB compression. Also, P1dB/PAE1dB remain
>13.4dBm/25%, while PAE at P1dB-5dB remains >13.2% across 26-33GHz.
Fig. 3.7 shows measurements using a 64QAM OFDM signal (2048-point FFT, 75kHz
tone spacing, 9.7dB PAPR at 0.01% CCDF). To test with 5G data rates, 1, 4, and 8
component carrier (CC) aggregation scenarios are measured, for 90MHz-wide CCs and
10MHz guard bands. The test setup and its characterization are shown in Fig. 3.6. CCs
are amplified concurrently with composite Pin divided evenly among them. PAE/EVM
are plotted vs. Pout at 27GHz for 1, 4, and 8 CCs. Pout/PAE for -25dBc or better EVM on each
CC are also summarized vs. center frequency. For 8CC, peak performance is at 27GHz:
Pout=6.7dBm at 11% PAE; a snapshot of the corresponding measured output spectrum shows
lower/upper adjacent channel leakage ratios (ACLR) of -34.4/-29.4dBc. Pout/PAE remain
>6.5dBm/9.6% across 27-32GHz for 8CC.
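The data rates attached to the carrier aggregation scenarios can be sanity-checked with simple arithmetic. The sketch below assumes a raw rate of one 64QAM symbol (6 bits) per second per hertz of occupied bandwidth, ignoring cyclic prefix, pilots, and coding; this assumption and the function names are illustrative and are not stated in the text.

```python
BITS_PER_SYMBOL_64QAM = 6

def raw_rate_gbps(n_cc, occupied_bw_mhz, bits_per_symbol=BITS_PER_SYMBOL_64QAM):
    """Raw bit rate assuming one symbol per second per hertz (assumption)."""
    return n_cc * occupied_bw_mhz * 1e6 * bits_per_symbol / 1e9

def occupied_subcarriers(occupied_bw_mhz, tone_spacing_khz):
    """Number of OFDM tones needed to fill the occupied bandwidth."""
    return round(occupied_bw_mhz * 1e3 / tone_spacing_khz)

for n_cc in (1, 4, 8):
    print(n_cc, "CC:", raw_rate_gbps(n_cc, 90.0), "Gbps")
print("tones per CC:", occupied_subcarriers(90.0, 75.0))
```

With 90MHz-wide CCs this gives 4.32Gbps for 8CC, the same figure quoted in the caption of Fig. 3.7, and 1200 occupied tones out of the 2048-point FFT.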
Table 3.1 shows a comparison with the state-of-the-art. This work extends RFBW by
3× over that in [1] while achieving higher Pout/PAE at equal EVM for the same signal
PAPR. Narrower RFBW and lower signal PAPR tested in [76] make comparison of linearity
difficult. Relative to [17], this PA produces almost the same Pout at the same EVM
for wider RFBW relative to center frequency and at 2× higher PAE from a lower supply
voltage. Normalizing to supply voltage and number of combined PA cores shows CW
Psat of this work is on par with the state-of-the-art. Back-off CW PAE of this work only
seems lower than the single-/two-stage designs of [76]/[1], but this is a natural result of
the 12dB/6dB higher gain achieved. For example, CW drain efficiency of stage 3 in this
work at P1dB-5dB is 25.6%, i.e. very close to 26.3% [76] for 1.1V supply. This work
simultaneously achieves higher back-off PAE and 7dB higher gain than [49]. In summary,
the implemented wideband CMOS PA can handle challenging 5G data rates at low cost
without sacrificing range or efficiency.
Figure 3.1: Three-stage PA: cascode VGA 1st stage (4×2dB digital gain steps), and
capacitively-neutralized common-source 2nd and 3rd stages; power transistor size scaling
indicated in units. Reprinted from [2].
Figure 3.2: Optimization of linearity and PAE in stage 3 using spacing Δfin of the two
resonance frequencies of inter-stage matching network (input of stage 3). Reprinted from
[2].
Figure 3.3: Die micrograph of 40 nm CMOS PA. Reprinted from [2].
Figure 3.4: Measured s-parameters across digital gain states as well as the associated
gain/phase errors vs. frequency. Reprinted from [2].
Figure 3.5: Measured large CW signal power sweep results over 27–30GHz (Pin,max =
−3.5dBm at all frequencies); CW Pout/PAE summaries at key power levels over 26–33GHz.
Reprinted from [2].
[Panel (a), EVM measurement setup: R&S SMW200A vector signal generator, DUT contacted with
G-S-G probes, R&S FSW43 signal and spectrum analyzer, R&S RTO2044 digital oscilloscope
(used if analysis BW>512MHz), two R&S NRP50SN power sensors, and a PC on a LAN connection;
-10dB and -20dB labels appear in the signal path. Panel (b): EVM floor plots labeled "No DUT".]
Figure 3.6: EVM measurement setup. (a) Block diagram. (b) Characterization of EVM
floor over center frequency for the tested carrier aggregation waveforms. Worst-case EVM
floor data measured by connecting SMW200A directly to FSW43 using only a cable
(2-2.5dB loss at 30GHz); i.e. without CMOS DUT. At each center frequency, Pout of
SMW200A is increased until EVM is no longer noise-limited; then highest/worst EVM
floor across component carriers is recorded. Reprinted from [2].
Figure 3.7: EVM/PAE vs. Pout at 27GHz for 64-QAM OFDM with 1,4,8CC and
Pout/PAE summaries vs. center frequency for −25dBc EVM; measured spectrum/ACLR
for peak 8CC performance at 27GHz: 4.32Gbps, +6.7dBm, 11% PAE. Reprinted from [2].
[Plots vs. center frequency (GHz), 26–33GHz, for 1CC, 4CC and 8CC; panel title "QPSK, OFDM
EVM < -16dBc on Each CC". (a) Pout @ EVM (dBm). (b) PAE @ EVM (%).]
Figure 3.8: Summary of QPSK OFDM carrier aggregation measurements versus carrier
frequency for −16 dBc EVM on each CC. (a) Average Pout. (b) Average PAE. Reprinted
from [2].
[Plots vs. center frequency (GHz), 26–33GHz, for 1CC, 4CC and 8CC; panel title "QPSK, OFDM
EVM < -25dBc on Each CC". (a) Pout @ EVM (dBm). (b) PAE @ EVM (%).]
Figure 3.9: Summary of QPSK OFDM carrier aggregation measurements versus carrier
frequency for −25 dBc EVM on each CC. (a) Average Pout. (b) Average PAE. Reprinted
from [2].
[Plots vs. carrier frequency (GHz), 26–33GHz, for single-CC RFBW of 100MHz, 300MHz, 500MHz
and 700MHz; panel title "64QAM OFDM, -25dBc EVM, 1CCxRFBW". (a) Pout @ EVM (dBm).
(b) PAE @ EVM (%).]
Figure 3.10: Summary of 64-QAM OFDM measurements versus carrier frequency for a
single CC having different contiguous RFBW values at −25 dBc EVM. (a) Summary of
average Pout. (b) Summary of PAE. Reprinted from [2].
Table 3.1: Comparison with state-of-the-art linear mm-wave silicon PAs for data
communication. Reprinted from [2].
*Graphically estimated. **With pads.
4. A 28-GHZ TRANSMIT-RECEIVE FRONT-END MODULE FOR 5G
HANDSET PHASED ARRAYS IN 40-NM CMOS
4.1 Introduction
Accelerated development of millimeter wave (mm-Wave) systems for fifth-generation
(5G) mobile is overtaking formal standardization and increasingly focusing on the 28 GHz
band. Integrating phased arrays in the hand-held user equipment (UE) and the access
points (AP) is widely regarded as the solution for path losses at mm-Wave frequencies [43].
However, supporting the requirements of even initial 5G pilot services, such as spectrally
efficient waveforms like 64-QAM OFDM and radio frequency bandwidths (RFBW) as
broad as 800 MHz [77], means UE phased array architecture and implementation technology
should be chosen carefully.
The RF phase shifting (RFPS) architecture is most suited to battery-powered UE de-
vices due to its low power consumption compared to local oscillator phase shifting (LOPS)
and baseband phase shifting (BBPS) [78]. RFPS also relaxes receiver linearity require-
ments due to the spatial rejection of interferers it offers; since beamforming occurs before
the RF mixer [38, 79]. However, phase-shifting resolution needs to be carefully chosen
based on application requirements, as both insertion loss and physical size of RF phase
shifter (PS) circuits typically grow with resolution [80, 81].
Starting almost a decade ago, Ka-band RFPS array developments in silicon focused
on relatively narrowband satellite communication or radar receiver (Rx) applications [82–
84], while typically assuming III-V HEMT LNAs drive the silicon Rx array chip inputs.
The majority of those works used SiGe BiCMOS, and only a limited number tackled
the challenges of integrating the transmitter (Tx) and the transmit-receive (TR) antenna
switch [85–87]. More recently, while the paradigm of using silicon along with III-V compounds
is still necessary for very high performance, e.g. see [79], impressive advances
have been achieved by all-silicon arrays. For example, a 400 MHz 16-QAM link has
been demonstrated over a 300 m range using 32-element Tx/Rx arrays, each built from
eight un-calibrated 2×2 SiGe chips [88]. Similarly, base station radio developments in
SiGe have recently demonstrated multiple-beam capabilities with excellent precision in
gain/phase control [89], and new highly-compact RFPS-based architectures [90].
Despite the inherent advantage of SiGe as a material, integration with high-performance
digital circuits and low cost in mass production make CMOS more attractive than SiGe for
UE devices. Recent CMOS RFPS array developments targeted 60 GHz 802.11ad, and
demonstrated a high level of integration, e.g. [4, 38]. However, 5G cellular has an inher-
ently more demanding link budget than 802.11ad, e.g. due to the required range. Also,
physical size of hand-held UE devices is more constraining at 28 GHz than at 60 GHz and
limits the number of antenna elements to 4–8 [1]. On the other hand, an access point (AP)
form factor permits up to 100s of elements, and AP manufacturing volume for the anticipated
high-density 5G cell network may be larger than its counterpart for sub-6 GHz. A
scalable design may enable an AP module to be built by bonding multiple RFICs to a single
larger AP antenna array, e.g. as in [88, 89]. The UE CMOS RFIC design must however
satisfy some of the even more stringent requirements of the AP, e.g. low-noise amplifier
(LNA) noise figure (NF), and power amplifier (PA) average Pout and power added effi-
ciency (PAE) at a desired error vector magnitude (EVM). Therefore, proving adequate NF
and Pout/PAE in CMOS may enable it to compete for both UE and AP RFICs.
This Chapter presents the first high-performance TR FEM in bulk CMOS targeting 5G
UE phased arrays, and is organized as follows. Section 4.2 discusses TR front-end module
(FEM) considerations, including derivation of the circuit-level requirements for the PA
[Chapter 3], LNA, and PS. Then, Section 4.4 provides a comparison of the strategies
considered for integrating the circuit blocks at antenna and PS interfaces, followed by a
detailed step-by-step explanation of the trade-offs in the presented design. The Tx and Rx-
mode experimental results of the complete FEM are provided in Section 4.5, and finally
the paper is concluded in Section 2.7.
Table 4.1: Key specifications for circuit components of UE FEM and the corresponding
measured performances achieved by stand-alone test circuits in this work.
Block-level Specification | Circuit Block | Required Block-level Performance | Achieved Block-level Performance | Comment
--- | --- | --- | --- | ---
Gain [dB] | PA, LNA | PA: 25; LNA: 25 | PA: 22.4; LNA: 27.1 | Required linearity more limiting for PA gain (load-line impedances lower in PA stages)
Gain Control | PA, LNA | 8×1 dB (pk. error ≤0.5 dB) | PA: 9×1 dB; LNA: 8×1 dB | For e.g. array tapering or channel-to-channel gain mismatch correction.
Waveform, RFBW, EVM [dBc] | PA | 64-QAM OFDM, up to 8CC×100 MHz, −25 | 64-QAM OFDM, up to 8CC×100 MHz, −25 | Most challenging/highest throughput 5G scenario in e.g. [2]; ≈½ of total EVM budget given to PA
Avg. Pout [dBm] | PA | 7 | Peak performance: 6.7; ≥6 @ 27–31 GHz | Spec. based on 8-element UE array and link budget analysis in [18]
Avg. PAE @ Tx Pout [%] | PA | Maximum achievable in technology | Peak performance: 11; ≥8.8 @ 27–31 GHz | For UE battery life and thermal dissipation @ cell-edge scenario
NF [dB] | LNA | 3.7 | 3.3 @ max gain | From link budget analysis in [18]; assuming UE Rx achieves similar LNA NF as AP Rx
IIP3 [dBm] | LNA | −6.4 @ min gain | Across 26–33 GHz: ≥ −12.6 @ max gain | Sin,max ~ −25 dBm for 150–300 m typical link range, see e.g. [14]
PS Insertion Loss [dB] | PS | ≤6 for Rx NF; ≤8 for Tx EVM | 5.9 | Max tolerable loss limited by Rx NF and Tx EVM [Figs. 2 and 3]
PS Resolution [bits] | PS | ≥3 | 3 | Spec from [Fig. 1(a)] but limited to 3-bit total to meet ~6 dB IL; IL @ Ka-band is 2.5 dB/bit in [6], reduced to 2.0 dB/bit in this work [7]
Per-element Random Phase Error [deg r.m.s.] | PS | ≤10 | 5 | 10° < LSB/4 for 3-bit PS and for beam-pointing accuracy (conservative, [Fig. 1(b)])
4.2 Transmit-receive Module Considerations
This section explains the key specifications in Table 4.1 for FEM component circuits,
while leveraging the uplink RF system budget investigation in [1]. Values of parameters
from the analysis of [1] to be used in this paper are given, and the analysis is augmented
where needed.
4.2.1 Link Budget
4.2.1.1 5G Uplink
The UE FEM is in Tx-mode. Anticipating a 5G-standardized single-channel RFBW
of 100 MHz, the required Pout is 7 dBm for 64-quadrature amplitude modulation (64-QAM)
at −25 dBc EVM and 40 m link range (higher throughput scenario), or 6.5 dBm
for quadrature phase shift keying (QPSK) at −14 dBc EVM and 150 m link range (cell-edge
scenario) [1]. These Pout requirements include margin for UE-side front-end losses
of LFE,Tx = 3.9 dB: 1.4 dB for the TR switch, 0.6 dB for the chip-to-package transition
combined with the via to the antenna layer of the circuit board, and 1.9 dB for the antenna
feed-line. For longer UE battery life, the Tx back-off PAE should be the maximum permitted
by the technology.
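The front-end loss budget quoted above is a simple sum, shown here as a short sketch. The dictionary keys and function names are hypothetical, and the power-at-antenna figure assumes that the 7 dBm requirement is referred to the PA output, which is one reading of "include margin for UE-side front-end losses".

```python
# Loss contributions quoted in the text, in dB.
TX_FRONT_END_LOSSES_DB = {
    "tr_switch": 1.4,
    "package_transition_and_via": 0.6,
    "antenna_feed_line": 1.9,
}

def total_loss_db(losses):
    """Sum of series losses in dB."""
    return sum(losses.values())

def power_at_antenna_dbm(pa_pout_dbm, losses):
    """Power left at the antenna if pa_pout_dbm is referred to the PA output."""
    return pa_pout_dbm - total_loss_db(losses)

l_fe_tx_db = total_loss_db(TX_FRONT_END_LOSSES_DB)
print(f"L_FE,Tx = {l_fe_tx_db:.1f} dB")
print(f"Antenna power for 7 dBm PA output = "
      f"{power_at_antenna_dbm(7.0, TX_FRONT_END_LOSSES_DB):.1f} dBm")
```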
4.2.1.2 5G Downlink
The UE FEM is in Rx-mode. In the uplink system budget analysis of [1], an AP-side
Rx-element noise figure NFRx of 1 + 0.5√fGHz = 3.7 dB at 30 GHz was assumed. Also, high
LNA gain was assumed so the noise figure of the stand-alone LNA, NFLNA, dominated
NFRx. Here, the same 3.7 dB value is targeted for the UE-side LNA. Considering a
1.4 dB TR switch loss in Rx-mode, this translates to a FEM noise figure NFFEM = 5.1 dB.
Additionally, for a maximum received signal power of Sin,max = −25 dBm, the effective
input-referred 1 dB compression point of the FEM, IP1dB,FEM, needs to be ≥ −15 dBm.
That is, for Rx nonlinearity to result in a negligible EVM degradation, IP1dB,FEM should
be at least Sin,max + signal PAPR, where this PAPR ≈ 10 dB. Therefore, IIP3 = −6.4 dBm
for the LNA if the 1.4 dB loss of the switch mentioned above is accounted for, and IIP3 ≈
IP1dB + 10 dB holds [10].
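The chain of numbers in the downlink budget can be reproduced step by step. The sketch below follows the arithmetic described in the text; the variable names are hypothetical, and the 10 dB offset between IIP3 and the 1 dB compression point is the rule of thumb cited from [10].

```python
import math

def ap_rx_nf_db(f_ghz):
    """Noise figure model assumed in [1]: 1 + 0.5*sqrt(f in GHz), in dB."""
    return 1 + 0.5 * math.sqrt(f_ghz)

TR_SWITCH_LOSS_DB = 1.4
S_IN_MAX_DBM = -25.0
PAPR_DB = 10.0
IIP3_MINUS_IP1DB_DB = 10.0  # rule of thumb cited from [10]

nf_lna_db = ap_rx_nf_db(30.0)                     # about 3.7 dB
nf_fem_db = nf_lna_db + TR_SWITCH_LOSS_DB         # switch loss adds directly
ip1db_fem_dbm = S_IN_MAX_DBM + PAPR_DB            # referred to FEM input
ip1db_lna_dbm = ip1db_fem_dbm - TR_SWITCH_LOSS_DB # signal is attenuated before LNA
lna_iip3_dbm = ip1db_lna_dbm + IIP3_MINUS_IP1DB_DB

print(f"NF_LNA = {nf_lna_db:.2f} dB, NF_FEM = {nf_fem_db:.2f} dB")
print(f"IP1dB,FEM = {ip1db_fem_dbm:.1f} dBm, LNA IIP3 = {lna_iip3_dbm:.1f} dBm")
```

The unrounded noise figure is about 3.74 dB, which the text rounds to 3.7 dB before adding the switch loss.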
4.2.2 Beam Steering
Accommodating up to 8× carrier aggregation, i.e. supporting even the downlink data
rate targets as in [2], the fractional signal bandwidth anticipated for 5G standardization in
the 28 GHz band remains <3%. Thus, a PS with a linear phase profile over only a limited
bandwidth suffices to avoid significant distortion from array-induced intersymbol interference
(ISI) [78]. That is, a broadband true-time delay (TTD) element is not necessary.
For the PS, digital control is desirable for robustness of the beam steering, but the
associated quantization error results in beam misalignment that degrades the array gain.
The theoretical peak EVM degradation from PS quantization was reported for a uniform
linear array (ULA) [78]:
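\[
\max_{\theta_{\mathrm{inc}}} \left\{ \frac{\mathrm{EVM}}{\mathrm{EVM}_0} \right\}
= N \cdot \left[ \frac{\sin\left( \dfrac{2\pi}{4 \cdot 2^{N_{\mathrm{PS}}}} \right)}
{\sin\left( N \cdot \dfrac{2\pi}{4 \cdot 2^{N_{\mathrm{PS}}}} \right)} \right], \tag{4.1}
\]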
where EVM0 is the EVM for continuous phase tuning (i.e. NPS → ∞), and the maximization
is performed over the signal’s spatial angle of incidence θinc relative to the Rx
ULA’s broadside direction. Figure 4.1(a) is a plot of the maximum of EVM/EVM0 over θinc
from (4.1) versus PS resolution NPS in bits for different N. The EVM degradation increases with
N because the beamwidth progressively gets narrower, thereby increasing the impact of
a given misalignment error. Beam misalignment is allowed to degrade EVM by a margin
of up to 3 dB in [1]. Also, from Fig. 4.1(a), NPS ≥ 3 yields a peak EVM degradation of
<4 dB in a ULA with N = 8. The adopted N = 4×2 URA case (not plotted; (4.1) only
applies to ULAs) is expected to lie between the shown N = 4 and the N = 8 ULA cases.
Therefore, for NPS ≥ 3, EVM degradation is within the budget in [1].
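The worst-case degradation in (4.1) is easy to evaluate numerically. The sketch below assumes the argument of the sine terms is 2π/(4·2^NPS) and that the EVM ratio is converted to dB as 20 log10, since EVM is an amplitude ratio; both are readings of the expression and are not stated explicitly here. The function name is hypothetical.

```python
import math

def max_evm_degradation_db(n_elements, n_ps_bits):
    """Peak EVM degradation from phase quantization per (4.1), in dB."""
    x = 2 * math.pi / (4 * 2 ** n_ps_bits)
    if n_elements * x >= math.pi:
        # Misalignment reaches the first array null; (4.1) diverges here.
        return math.inf
    ratio = n_elements * math.sin(x) / math.sin(n_elements * x)
    return 20 * math.log10(ratio)

for n in (2, 4, 8, 16, 32):
    print(n, [round(max_evm_degradation_db(n, bits), 2) for bits in range(3, 8)])
```

Under these assumptions, N = 8 with a 3-bit PS gives roughly 3.9 dB, in line with the <4 dB quoted above, while N = 4 gives under 1 dB.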
Additionally, random phase errors/mismatches with a standard deviation σPS in each
array element super-impose nonlinearity errors on the quantized phase steps. These subsequently
map to nonlinearity errors in the quantized steps of the beam pointing angle,
with a statistical variance σ²beam which degrades with wider scan angles θinc, but improves
with number of elements N [78]:
\[
\sigma_{\mathrm{beam}}^2 = \frac{12\,\sigma_{\mathrm{PS}}^2}
{\pi^2 \cos^2(\theta_{\mathrm{inc}}) \cdot N \cdot (N^2 - 1)}. \tag{4.2}
\]
Figure 4.1(b) is a plot of σbeam versus σPS for broadside incidence using (4.2) (i.e. ‘normalized’
by taking θinc = 0°). Random phase error σPS should be controlled so that its
associated beam-pointing angle nonlinearity error remains ≤ ½ × the smallest possible
beam-pointing angle step size dictated by PS resolution, sin⁻¹(2/2^NPS) ≈ 14.5°,
even at the maximum scan angle θinc,max. Figure 4.1(b) shows that even for N = 4,
and assuming a very large θinc,max = 75°, σPS ≤ 10° results in σbeam ≤ 5.5°, which is
≤ ½ × 14.5° = 7.25°. Considering that other limitations, such as nulls of element factor
(e.g. patch antenna element), will certainly dominate the radiation pattern well before this
extremely wide θinc,max = 75° is achieved, σPS of 10° r.m.s. seems highly conservative as
far as accuracy of UE array beam pointing angle is concerned.
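The numbers in this argument can be reproduced from (4.2) and the step-size expression. The sketch below is a minimal check; the function names are hypothetical, and angles are kept in degrees on both sides of (4.2), which is consistent because the expression is a ratio of variances.

```python
import math

def beam_pointing_sigma_deg(sigma_ps_deg, n_elements, theta_inc_deg=0.0):
    """R.m.s. beam pointing error from (4.2); same angular unit in and out."""
    cos_inc = math.cos(math.radians(theta_inc_deg))
    variance = (12 * sigma_ps_deg ** 2
                / (math.pi ** 2 * cos_inc ** 2
                   * n_elements * (n_elements ** 2 - 1)))
    return math.sqrt(variance)

def smallest_beam_step_deg(n_ps_bits):
    """Smallest beam-pointing step for a given PS resolution: asin(2 / 2^NPS)."""
    return math.degrees(math.asin(2 / 2 ** n_ps_bits))

sigma_beam = beam_pointing_sigma_deg(sigma_ps_deg=10.0, n_elements=4,
                                     theta_inc_deg=75.0)
step = smallest_beam_step_deg(3)
print(f"sigma_beam = {sigma_beam:.2f} deg, half step = {step / 2:.2f} deg")
```

For N = 4, a 75° scan angle, and a 10° r.m.s. phase error, this gives about 5.5°, below half of the roughly 14.5° step of a 3-bit PS.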
Due to battery life limitations in UE devices, a passive PS is desirable. A passive PS
is also leveraged here as it enables a bidirectional implementation, which helps to reduce
silicon area. The key obstacle is insertion loss (IL), i.e. in terms of overall front-end
power gain, linearity, and noise. The impact of PS IL is explained for the two modes of
operation next.
4.2.2.1 Transmit Mode
Besides lowering total gain, increasing PS IL forces the RF pre-driver of the Tx distribution
network to generate an equally higher Pout, given the per-element PA power gain
is limited by achievable gain per stage and available silicon area. Figure 4.2(a) shows this
scenario for N = 8 and 25 dB total power gain for the three-stage Tx-element PA. In addition
to its inherent 3 dB splitting ‘loss’, 1 dB IL is budgeted per Wilkinson splitter stage.
For simplicity, the pre-driver re-uses the same circuit design as the output stage of the PA,
with 8 dB gain. Therefore, the two stages highlighted in yellow in Fig. 4.2(a) are modeled
to have the same amplitude-to-amplitude/amplitude-to-phase conversion (AM-AM/AM-PM)
behavior. All other blocks are modeled as perfectly linear. Figure 4.2(b) shows the
simulated Tx EVM versus IL of the passive PS. Targeting −28 dBc, and to limit EVM
degradation to <0.5–1 dB, this IL cannot exceed 8 dB.
[Plots: (a) "Worst-case Impact of Phase Quantization": max. EVM degradation (dB) vs. PS digital
phase resolution (bits), 3–7 bits, for N = 2, 4, 8, 16, 32. (b) "Random Beam Pointing Error":
pointing error (deg r.m.s.) vs. per-element phase error (deg r.m.s.), 0–25, for N = 2, 4, 8, 16, 32.]
Figure 4.1: Impact of quantization/random phase step errors on array performance. (a)
Worst-case EVM degradation as a result of phase quantization error in digital PS. (b)
Impact of random per-element phase errors on beam pointing angle accuracy.
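The dB-for-dB relation between PS insertion loss and pre-driver output power can be sketched as a level budget. The example below assumes the 7 dBm average Pout requirement from Section 4.2.1.1 as the PA operating point and a binary tree of Wilkinson splitters for N = 8; the function and parameter names are hypothetical, and the sketch does not model the AM-AM/AM-PM behavior used for the EVM simulation of Fig. 4.2(b).

```python
import math

def predriver_pout_dbm(pa_pout_dbm, pa_gain_db, ps_il_db,
                       n_elements=8, split_loss_db=3.0, splitter_il_db=1.0):
    """Pre-driver output power needed to reach pa_pout_dbm at each element."""
    splitter_stages = int(round(math.log2(n_elements)))
    distribution_loss_db = splitter_stages * (split_loss_db + splitter_il_db)
    pa_input_dbm = pa_pout_dbm - pa_gain_db
    return pa_input_dbm + ps_il_db + distribution_loss_db

for il_db in (4.0, 6.0, 8.0):
    level = predriver_pout_dbm(pa_pout_dbm=7.0, pa_gain_db=25.0, ps_il_db=il_db)
    print(f"PS IL = {il_db:.0f} dB -> pre-driver Pout = {level:.1f} dBm")
```

Under these assumptions an 8 dB PS loss requires about 2 dBm from the pre-driver, so its back-off from compression shrinks as the loss grows, which is the mechanism behind the EVM degradation described above.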
4.2.2.2 Receive Mode
[Figure 4.2 panels: (a) block diagram of the 8-element array: pre-driver stage, distribution
network of 3 dB Wilkinson splitters, and, per element, a digitally-controlled passive phase
shifter (Δφ) followed by the three-stage Tx-element PA. (b) Tx EVM versus PS insertion loss.]
Figure 4.2: Effect of passive phase shifter insertion loss on overall transmitter EVM. (a)
Simulation scenario illustration. (b) 8-element phased array transmitter EVM versus phase
shifter insertion loss.
A higher PS IL here reduces the total front-end gain; hence the noise of the chain
beyond the PS grows when referred to the Rx input, and the overall NFRx degrades.
Figure 4.3(a) illustrates this scenario, with an LNA having a total gain of 25 dB,
an IIP3 of −6 dBm, and a noise figure NF = 5 dB. The back-end (after the Wilkinson
combining network) is assumed to have a noise figure of 16 dB and an IIP3 of 10 dBm.
Figure 4.3(b) plots both NFRx and IIP3 versus PS IL. Overall Rx IIP3 is seen to improve
due to the reduction in total gain of the LNA-PS composite. To limit overall NFRx
degradation to 0.5 dB above the 5 dB of the LNA, the IL cannot exceed 6 dB.
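The receive-mode trade-off above follows from the standard Friis noise and IIP3 cascade
formulas. The sketch below is a simplified single-chain model (LNA, passive PS, back-end)
using the block values quoted in the text. It deliberately ignores the N-element combining
network, so its absolute numbers are an assumption-laden illustration and are not expected
to reproduce the simulated curves of this section exactly. All function names are
hypothetical.

```python
import math

def db_to_lin(x_db):
    return 10.0 ** (x_db / 10.0)

def lin_to_db(x):
    return 10.0 * math.log10(x)

def rx_chain(ps_il_db, lna_gain_db=25.0, lna_nf_db=5.0, lna_iip3_dbm=-6.0,
             backend_nf_db=16.0, backend_iip3_dbm=10.0):
    """Return (NF in dB, IIP3 in dBm) of LNA -> passive PS -> back-end."""
    g_lna = db_to_lin(lna_gain_db)
    loss = db_to_lin(ps_il_db)  # PS power loss factor (>= 1)
    # Friis: a matched passive attenuator has a noise factor equal to its loss
    f_total = (db_to_lin(lna_nf_db)
               + (loss - 1.0) / g_lna
               + (db_to_lin(backend_nf_db) - 1.0) * loss / g_lna)
    # IIP3 cascade in mW; the passive PS is assumed perfectly linear
    inv_iip3_mw = (1.0 / db_to_lin(lna_iip3_dbm)
                   + (g_lna / loss) / db_to_lin(backend_iip3_dbm))
    return lin_to_db(f_total), lin_to_db(1.0 / inv_iip3_mw)
```

Sweeping `ps_il_db` shows the two opposing trends: NF rises with PS loss because the
back-end noise is divided by less gain, while IIP3 improves because the back-end sees a
smaller signal.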
[Figure 4.3 panels: (a) block diagram of the 8-element receiver, with labels: Rx-element
3-stage LNA, digitally-controlled passive phase shifter (Δφ), 3 dB Wilkinson combiner,
combining network, LNA stage, driving RF mixer. (b) "Rx-Mode Performance vs. Passive PS
Loss": NFRx (dB) at LNA maximum gain and IIP3Rx (dBm) at LNA minimum gain versus insertion
loss of the RF phase shifter (dB).]
Figure 4.3: Effect of passive phase shifter insertion loss on overall receiver noise figure
and linearity. (a) Simulation scenario illustration. (b) 8-element phased array receiver NF
and IIP3 versus phase shifter insertion loss.
4.3 Circuit Blocks
To experimentally validate the presented strategy for integrating FEM components,
the PA, LNA, and PS are fabricated as stand-alone test circuits; their die photos are shown
in Figs. 4.4(a)–(c), respectively. Stand-alone measured performances are summarized in
Table 4.1 as a reference for before-versus-after comparison with measurements reported
in Section 4.5 after integration. For detailed design considerations/characterizations of the
stand-alone blocks, see [2, 81, 91].
[Figure 4.4 micrograph labels: (a) stages St1–St3; dimensions 900 μm and 250 μm. (b) stages
St1–St3, GSG pads; dimensions 790 μm and 250 μm. (c) I/O baluns, 90°, 180°, and 45° cells,
GSG pads; dimensions 825 μm, 250 μm, and 140 μm (twice).]
Figure 4.4: Die micrographs of stand-alone front-end module component test circuits. (a)
Power amplifier. (b) Low-noise amplifier. (c) Phase shifter.
4.3.1 Power Amplifier
Figure 4.5 shows the schematic, and Fig. 4.4(a) shows the die micrograph of the three-stage
PA of [2]. It consists of a current-steering 4×2 dB variable gain amplifier (VGA)
input stage [stage 1, Fig. 4.5(b)], followed by two capacitively neutralized differential
stages [stages 2 and 3, Fig. 4.5(c)]. A 1 dB least significant bit (LSB) is implemented in
the bias mirror of stage 2.
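One way to see how the coarse VGA steps and the 1 dB LSB combine is to enumerate the
settings. The sketch below assumes that stage 1 provides four 2 dB steps (five settings) and
that the stage-2 bias mirror toggles a single 1 dB step; this reading is an assumption, though
it is consistent with the 10 PA gain states and 9×1 dB steps reported in Section 4.5. The
names are illustrative only.

```python
from itertools import product

COARSE_STEP_DB = 2.0  # assumed stage-1 current-steering VGA step
COARSE_STEPS = 4      # four steps, i.e. five coarse settings
LSB_DB = 1.0          # assumed single 1 dB toggle in the stage-2 bias mirror

def pa_gain_states():
    """Sorted relative gain settings in dB (0 = reference setting)."""
    coarse = [i * COARSE_STEP_DB for i in range(COARSE_STEPS + 1)]
    fine = [0.0, LSB_DB]
    return sorted(c + f for c, f in product(coarse, fine))
```

Under these assumptions the ten combinations land on a uniform 1 dB grid spanning 9 dB.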
The current-steering VGA topology of stage 1 is chosen because its input impedance, output
impedance, and insertion phase are relatively insensitive to the digital gain setting. The
linear-in-dB steps of the stage can therefore have correspondingly small gain step
nonlinearity errors over a wide frequency range. This is achieved by drawing each transistor
in the switched array of low-Vt cascode devices as an integer multiple of 1 μm/40 nm fingers.
Layout details are in [2].
Size and bias point of stage 3 are chosen based on an optimization methodology similar
to [1], beginning from a W/L = 32×1 μm/40 nm unit power cell similar to the layout in [1,
15]. Overcoming the linearity limitation of the design in [1] for wide signal bandwidths
≥250 MHz is a main driver for the wideband PA in [2]; i.e. for the integrated front-end
to subsequently maintain the wideband transmit linearity achieved. Thus, the bandwidth
of the interstage matching between stages 2 and 3 was optimally selected by controlling the
spacing between the network’s two in-band resonances as described in [2]. Finally, driving
transistors in each stage are scaled as indicated in Fig. 4.5(a), and the remaining interstage
and input matching transformer inductances are inversely scaled with their respective driven
stage sizing as given in detail in [2].
4.3.2 Low-noise Amplifier
Figure 4.5: Schematic of three-stage power amplifier. (a) Top level block diagram and
relative stage scaling. (b) Stage 1: 4×2 dB cascode VGA. (c) Stages 2 and 3: common
source stages with capacitive neutralization.
Figure 4.6 shows the schematic, and Fig. 4.4(b) shows the die micrograph of the three-stage
LNA of [33]. It consists of a single-ended cascode input stage [stage 1, Fig. 4.6(b)],
followed by two variable-gain, capacitively neutralized differential stages [stages 2 and 3,
Figs. 4.6(c) and (d)].
Stage 1 uses a single-ended rather than a differential design to minimize NF given the
limited d.c. power budget. Inductive source degeneration is used for 50 Ω input matching
[10], and the source and gate inductors of the input network are magnetically coupled
to increase the effective inductance without adding series resistance, thereby reducing
their direct contribution to NF. Cascode device M2 improves reverse isolation, and a
series inductor between the drain of M1 and the source of M2 boosts gain and reduces the
noise contribution of M2 by countering capacitive parasitics at the ‘cascode node’ [92]. A
transformer balun converts the output of stage 1 to a differential signal at the gates of M3
and M4, and is designed with loose magnetic coupling k ≈ 0.27 for wideband impedance
matching. Similarly, loose coupling is also chosen for transformers in the other
interstage/output matching networks.
Doubling available voltage swing using differential designs for stages 2 and 3 improves
output third order intercept point (OIP3). Capacitive neutralization enhances stability and
gain. Differential operation also improves immunity to errors in modeling ground-path
impedance and thereby makes stability more predictable than in single-ended mm-Wave
amplifier stages. For gain control, a bank of resistors with series MOS switches sets the
resistive load at the output of stage 2 as well as at both the input and output of stage 3.
A configurable current mirror provides biasing for stage 3, and its digital switches are set
concurrently with the setting of the resistor bank at the output of stage 2. This arrangement
progressively saves d.c. power for lower gain settings without degrading linearity. Overall,
the 8×1 dB gain control using this distribution of switched resistors/biasing aims to reduce
the dependences of NF and OIP3 on gain setting.
4.3.3 Phase Shifter
[Figure 4.6 block labels: Stage1, Stage2, Stage3.]
Figure 4.6: Schematic of three-stage LNA. (a) Top level block diagram. (b) Stage 1. (c)
Stage 2. (d) Stage 3.
Figure 4.7 shows the schematic, and Fig. 4.4(c) shows the die micrograph of the passive,
bidirectional, 3-bit, differential PS of [33]. The design is fundamentally based on
lumped-element approximation of transmission lines (i.e. true time delay) [37, 93]. As
shown in Fig. 4.7(a), the PS consists of a cascade of passive, switched delay cells. Each of
the 45° and 90° cells introduces a relative steady-state insertion phase delay between its two
possible switch states, while the 180° cell inverts the polarity of the differential signal.
The differential 80 Ω input and output are each matched to 50 Ω single-ended ports using
on-chip baluns, only to simplify on-die probing.
The 45° cell [Fig. 4.7(b)] can add significant delay if configured as a lowpass π-network
(LP-state: S1 off, S2 on), or instead a MOS switch can bypass the π-network so
the cell adds only minimal delay (BP-state: S1 on, S2 off). Unlike the design in [37],
implementing S1 as a triple-well device and tying its gate to the LP-state common-mode
node reduces insertion loss [33].
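The lumped-element approximation behind the lowpass state can be illustrated with the
textbook C-L-C network that mimics a matched line of electrical length θ at one frequency.
The sketch below is not the sizing procedure of this design: it assumes ideal lossless
elements, treats the reference impedance as a single-ended equivalent, and uses hypothetical
function names. The transmission coefficient is computed from ABCD parameters to confirm the
phase.

```python
import cmath
import math

def lowpass_pi_elements(theta_deg, f0_hz, z0_ohm):
    """Series L and each shunt C of a C-L-C network that emulates a
    matched transmission line of electrical length theta at f0."""
    theta = math.radians(theta_deg)
    w0 = 2.0 * math.pi * f0_hz
    l_series = z0_ohm * math.sin(theta) / w0
    c_shunt = math.tan(theta / 2.0) / (w0 * z0_ohm)
    return l_series, c_shunt

def s21_pi(l_series, c_shunt, f_hz, z0_ohm):
    """S21 of the lossless shunt-C / series-L / shunt-C network via ABCD."""
    w = 2.0 * math.pi * f_hz
    z = 1j * w * l_series
    y = 1j * w * c_shunt
    a = 1.0 + z * y
    b = z
    c = y * (2.0 + z * y)
    d = a
    return 2.0 / (a + b / z0_ohm + c * z0_ohm + d)
```

For example, `lowpass_pi_elements(45, 28e9, 80)` gives roughly 0.32 nH and 29 fF, and
`cmath.phase(s21_pi(...))` at 28 GHz evaluates to −45°. Away from the design frequency the
phase departs from a true delay line, which is why the choice of cell states matters for
broadband phase accuracy.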
The 90° cell [Fig. 4.7(c)] can be configured to either add phase delay as a lowpass
π-network (LP-state), or add phase advance as a highpass π-network (HP-state). The
insertion phase difference between the LP and HP states varies less with frequency than its
counterpart between the LP and BP states [38]. This helps to reduce phase step nonlinearity
errors across a broad frequency range in comparison to an identical PS having an LP-BP
instead of an LP-HP 90° cell [33].
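The cascade of three switched cells maps naturally onto a 3-bit control word. The sketch
below assumes a binary weighting in which bit 0 drives the 45° cell, bit 1 the 90° cell, and
bit 2 the 180° cell. This bit assignment is an assumption for illustration (the actual
control encoding is not given here), although it agrees with the later observation that
states 0 and 4 differ only by the 180° inversion.

```python
def ps_cell_settings(state):
    """Map a 3-bit PS state (0..7) to per-cell settings and nominal phase."""
    if not 0 <= state <= 7:
        raise ValueError("state must be in 0..7")
    cells = {
        "45deg": bool(state & 0b001),   # assumed: delay state when set
        "90deg": bool(state & 0b010),   # assumed: delay state when set
        "180deg": bool(state & 0b100),  # assumed: polarity inversion when set
    }
    nominal_phase_deg = 45 * state
    return cells, nominal_phase_deg
```

With this weighting, the state pairs (0, 4), (1, 5), (2, 6), and (3, 7) share identical 45°
and 90° cell settings and differ only in signal polarity.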
Figure 4.7: Schematic of three-bit phase shifter. (a) Block diagram. (b) 45° cell. (c) 90°
cell. (d) 180° cell.
4.4 Module Integration
This section explains topology choices and considerations for integrating the three
major components of the FEM (PA, LNA, and PS of Section 4.3) with TR switches at
antenna and PS interfaces.
4.4.1 Antenna Interface
[Figure 4.8 schematics (a)–(d): each connects the GSG antenna pad (ANT I/O) to the PA port
and to the LNA port (LNA matching network, bias Vb,LNA) using T/R-controlled switches. Labels
recovered per panel: (a) PA matching network, two λ/4 lines, VDD; (b) PA matching network,
CANT, VDD; (c) CANT, C1, C2, VDD; (d) CANT, Cc, VDD.]
Figure 4.8: Candidate topologies for antenna matching and transmit-receive switch. (a)
Concept of λ/4 transformer topology used in [3, 4]. (b) Transformer-based multiplexer
topology of [5]. (c) Transformer-based topology in [6]. (d) Proposed topology.
Table 4.2: Comparison of candidate topologies in Fig. 4.8 for antenna matching and
transmit-receive switch.

Topology      Tx Insertion Loss   Rx Insertion Loss   Matching Bandwidth   Silicon Area
Fig. 4.8(a)   High                High                Wideband             Bulky
Fig. 4.8(b)   High                High                Narrowband           Moderate
Fig. 4.8(c)   Low                 Low                 Narrowband           Compact
Fig. 4.8(d)   Low                 Moderate            Wideband             Compact

The candidate topologies of Figs. 4.8(a)–(d) are compared in Table 4.2 based on the
following:
• The topology in Fig. 4.8(a) is reported for 60 GHz arrays, e.g. [3, 4]. Its distributed
nature offers wideband matching, but its size is accordingly large at 28 GHz:
λ/4 ≈ 1.3 mm in SiO2. Electromagnetic (EM) simulations of on-chip 50 Ω shielded
coplanar waveguide (CPW) show insertion loss (IL) ≈ 0.7 dB/mm, while the shunt
switches must be very wide for their on-resistance to be ≲2.5–5 Ω, i.e. to limit their IL
contribution. Also, losses are too severe in high- and low-impedance λ/4 lines, so
additional matching networks are needed for unequal PA/LNA terminations, further
increasing IL. Finally, the Tx-side switch is exposed to the full PA output swing
and must be designed for reliability instead of Rx-mode IL.
• The topology in Fig. 4.8(b) showed wideband performance in [5], where broadside-coupled
transformers enabled it to be relatively compact. However, it is similar to
Fig. 4.8(a) on overall IL due to cascading of matching networks, and also on Tx-side
shunt switch reliability.
• The circuit in Fig. 4.8(c) was reported for an 802.11ac transceiver [6]. It can achieve
lower IL than Figs. 4.8(a)–(b) by avoiding cascaded networks. It also embeds the
TR switch into co-designed Tx/Rx matching, making it very compact, while also
avoiding a shunt switch at the PA output. However, the co-design requires the Tx
balun to present optimal loading (high impedance) to the PA (antenna) in Tx-mode
(Rx-mode). These two requirements are difficult to satisfy across a wide bandwidth
centered on 28 GHz like [2], because explicit CANT ≈ 50–80 fF must be used for
dual-resonance matching. CANT increases by C1C2/(C1 + C2) ≈ C1 in Tx-mode,
which detunes the balun if not compensated. C1 ≪ C2 by design to reduce Tx-mode
PA-to-LNA coupling ∝ C1/(C2 + C1), and hence reduce Tx IL and protect
the LNA. However, C1 cannot be arbitrarily small, to avoid degrading Rx-mode NF
due to weak antenna-to-LNA a.c. coupling. One solution is to control CANT and
C1 with more switches, degrading both Tx and Rx IL. Another is to design the
Tx balun to have a single in-band resonance so that explicit CANT = 0, sacrificing Tx
bandwidth.
• In this work, the topology in Fig. 4.8(d) is proposed to benefit from the advantages
of Fig. 4.8(c) while relaxing its bandwidth limitation mentioned above. Also, the PA
gain devices replace the Tx-side shunt switch, eliminating its parasitics and reliability
concerns. Connecting the PA transistors’ gates to VDD shorts the Tx port, while
grounding the balun center tap avoids ‘crowbar’ current and biases the PA devices
in deep triode to minimize Ron,PA [Fig. 4.9(a)].
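Two of the numerical arguments in the comparison above are easy to reproduce. The sketch
below estimates the quarter-wavelength in a homogeneous dielectric, assuming a relative
permittivity of about 3.9 for SiO2, and evaluates the series combination and divider ratio of
the two capacitors discussed for the topology of [6]. The capacitor values in the usage note
are hypothetical, chosen only to satisfy the small-C1 condition; function names are
illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def quarter_wave_mm(f_hz, eps_r):
    """Quarter wavelength in a homogeneous dielectric, in millimetres."""
    return C0 / (f_hz * math.sqrt(eps_r)) / 4.0 * 1e3

def tx_mode_cap_loading(c1_f, c2_f):
    """Extra antenna-node capacitance (C1 in series with C2) and the
    capacitive-divider ratio C1/(C1 + C2) that sets PA-to-LNA coupling."""
    c_series = c1_f * c2_f / (c1_f + c2_f)
    divider = c1_f / (c1_f + c2_f)
    return c_series, divider
```

`quarter_wave_mm(28e9, 3.9)` evaluates to about 1.36 mm, in line with the figure quoted for
the distributed topology; at roughly 0.7 dB/mm a single such line would already cost on the
order of 0.9 dB. With a hypothetical C1 = 20 fF and C2 = 200 fF, the added Tx-mode loading is
about 18 fF (close to C1) and the divider ratio is about 0.09.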
Note that the topology of Fig. 4.8(d) trades wider Tx-mode bandwidth for slightly
higher Rx-mode IL. This may be understood using the Rx-mode π-equivalent model
shown in Fig. 4.9(b):
• CP1: explicit CANT plus parasitic capacitance of Tx balun at antenna node A.
• LP1: models leakage, i.e. un-coupled inductance of balun on antenna side.
• RP1: models balun losses plus ‘reflected’ Ron,PA in series with antenna.
• CP2: balun parasitic capacitance + off-capacitance of switch Coff,SW at node B.
Wider Tx bandwidth implies greater separation between the Tx balun’s two in-band
resonances, in turn requiring tighter magnetic coupling k [1, 66], i.e. smaller
LP1 ∝ (1 − k²) [65]. On the other hand, low Rx-mode NF requires larger LP1 to separate CP1
and CP2 and counter their step-down effect on Re{Zout,B}. Larger LP1 (i.e. lower k)
therefore reduces the impedance transformation required in the LNA matching network,
rLNA = Re{Zin,LNA}/Re{Zout,B}, thereby reducing Rx-mode IL [10, 67]. To analyze
LP1’s effect on rLNA in Fig. 4.9(b) (neglecting RP1):
[Figure 4.9 panels: (a) Rx-mode schematic with the PA gain devices (Ron,PA) shorting the Tx
balun; nodes A and B marked. (b) π-equivalent circuit with 50 Ω antenna, CP1, LP1, RP1, CP2,
Cc, and LNA matching network. (c) Smith chart of Zout,B for couplings k1 and k2 > k1, with
Tx-mode s11 and s21 traces.]
Figure 4.9: Illustration of trade-off between Tx-mode bandwidth and Rx-mode NF in
design of PA balun for circuit of Fig. 4.8(d). (a) Configuring PA gain devices to replace
shunt switch in Rx-mode; VDD center-tap is pulled down to ground to minimize gain
device’s on-resistance Ron,PA. (b) Simplified π-equivalent circuit in Rx-mode. (c) Smith
chart trajectory of output impedance ‘looking back’ at antenna from point B in Rx-mode;
red arrow indicates effect of tighter magnetic coupling in Tx balun.
\[
Z_{out,B}(s) = \frac{R_{ANT} + sL_{P1}\left(1 + sC_{P1}R_{ANT}\right)}
{sC_{P2}R_{ANT} + \left(1 + sC_{P1}R_{ANT}\right)\left(1 + s^{2}L_{P1}C_{P2}\right)},
\tag{4.3}
\]
where RANT is the 50 Ω antenna resistance. Accordingly, RB ≜ Re{Zout,B} is approximately:
\[
R_{B} \approx \frac{R_{ANT}}
{\left[1 - 2\omega^{2}L_{P1}C_{P2} - 2\omega^{4}R_{ANT}^{2}C_{P1}C_{P2}L_{P1}\left(C_{P1} + C_{P2}\right)\right]
+ \omega^{2}\left(C_{P1} + C_{P2}\right)^{2}R_{ANT}^{2}}.
\tag{4.4}
\]
Equation (4.4) reduces to RANT/[1 + ω²(CP1 + CP2)²RANT²] at LP1 = 0, corresponding
to k = 1; i.e. perfect Tx balun coupling with LP1 being a short circuit. CP1 and CP2
then sum and appear directly in parallel with RANT, such that (4.3) becomes
Zout,B(s) = [1/(s(CP1 + CP2))] ∥ RANT. Increasing either CP1 or CP2 reduces RB in this
limiting case, and this logic may be extended to cases where LP1 ≠ 0 but remains small,
i.e. wide Tx bandwidth (k ≳ 0.65–0.8). The Smith chart trajectory of Zout,B at 30 GHz in
Fig. 4.9(c) shows that RB drops as k increases.
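The behaviour of (4.3) and (4.4) can be checked numerically. The sketch below evaluates the
exact output impedance from (4.3) and the first-order approximation (4.4), with RP1 neglected
as in the text. The component values in the usage note (CP1 = 70 fF, CP2 = 140 fF, LP1 of a
few tens of pH) are hypothetical and serve only to illustrate the trend.

```python
import math

def zout_b(f_hz, r_ant, c_p1, l_p1, c_p2):
    """Z_out,B from (4.3), with R_P1 neglected."""
    s = 1j * 2.0 * math.pi * f_hz
    num = r_ant + s * l_p1 * (1.0 + s * c_p1 * r_ant)
    den = (s * c_p2 * r_ant
           + (1.0 + s * c_p1 * r_ant) * (1.0 + s * s * l_p1 * c_p2))
    return num / den

def r_b_approx(f_hz, r_ant, c_p1, l_p1, c_p2):
    """Approximation (4.4) of Re{Z_out,B}, first order in L_P1."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    c_sum = c_p1 + c_p2
    bracket = (1.0 - 2.0 * w2 * l_p1 * c_p2
               - 2.0 * w2 * w2 * r_ant ** 2 * c_p1 * c_p2 * l_p1 * c_sum)
    return r_ant / (bracket + w2 * c_sum ** 2 * r_ant ** 2)
```

At 30 GHz with the hypothetical values above, the real part of `zout_b` is about 10 Ω for
LP1 = 0 and rises to about 11 Ω at 20 pH and 17 Ω at 100 pH, so a smaller leakage inductance
(tighter coupling) lowers RB. The approximation tracks the exact value to within a fraction
of a percent at 20 pH and degrades as LP1 grows, as expected for a first-order expansion.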
With the above insight, the circuit is designed for the ZPA and ZLNA termination
targets annotated on the more detailed schematic in Fig. 4.10(a), with its physical layout
shown in Fig. 4.10(b).
• The Tx-mode signal path through the circuit of Figs. 4.10(a) and (b) is highlighted in
Figs. 4.11(a) and (b), respectively. TR switch M1 and head-switch M2 are on, while
pull-down switch M3 is off. Coupled inductors L1 and L2 form the wideband
(dual-resonance) Tx balun.
• The Rx-mode signal path is highlighted in Figs. 4.11(c) and (d). The PA gain devices
(in deep triode) short the Tx balun, with M3 grounding the center tap of L1, while M1
and M2 are off. A.c. coupling capacitor Cc and inductors Lp and Ls together form
the LNA matching network.
[Figure 4.10 annotations: ZPA = 40 Ω ∥ 130 fF at the PA port; ZLNA = 23 Ω + 87 fF at the LNA
port; switches M1, M2, M3; Tx balun L1/L2 with coupling k; bypass capacitors CB; CANT; Cc;
Ls; Lp; 5 kΩ bulk resistors; T/R control; VDD; Vb,LNA; GSG antenna pad.]
Figure 4.10: Antenna port matching and transmit-receive switch. (a) Schematic. (b) 3D
illustration of physical layout.
The Tx balun and LNA matching network are re-designed relative to their stand-alone
circuit counterparts in [2] and [91], respectively. The LNA matching network is brought
close to the antenna port to help reduce Rx-mode IL. The Tx balun comprises a
tightly-coupled core transformer (45°-rotated, with kcore ≈ 0.75) in addition to a controlled
series leakage contribution to L2; i.e. routing between the rotated core transformer and the
antenna port. Including routing, an effective k ≈ 0.65 is chosen based on the trade-off
explained in Fig. 4.9 and (4.4).
[Figure 4.11 panels: (a) and (b) Tx-mode, with PA stage 3 driving the L1/L2 balun through
M1 and M2 to the antenna pad; (c) and (d) Rx-mode, with PA stage 3 acting as a short circuit
and the antenna signal routed through Cc, Ls, and Lp to LNA stage 1 (with Ldeg).]
Figure 4.11: Illustration of signal path in the circuit of Fig. 4.10 for the Tx/Rx modes. (a)
Schematic in Tx-mode. (b) 3D structure in Tx-mode. (c) Schematic in Rx-mode. (d) 3D
structure in Rx-mode.
M1 is sized (W/L)1 = 4×28×2.4 μm/40 nm as a compromise between Tx-mode IL and
Rx-mode NF (via its off-capacitance Coff,SW [Fig. 4.9 and (4.4)]). A deep n-well
(DNW) device is used, with a 5 kΩ resistor biasing its local bulk to float it at RF and
hence reduce coupling through its junction capacitances to the otherwise low-impedance
shared bulk [55]. Simulations show M1 contributes IL ≤ 0.73 dB in Tx-mode for
23–32 GHz, and has Coff,SW = 140 fF after RC-extraction. M2 conducts the PA supply current
in Tx-mode, so it must be very wide to reduce its d.c. on-resistance, i.e. the voltage drop
across it. Simulations show that (W/L)2 = 64×16×3.42 μm/40 nm (≈300 mΩ) limits
Psat degradation to 0.1 dB. M3 is drawn with (W/L)3 = 4×16×3.42 μm/40 nm, and
uses a thick-oxide device to minimize drain-to-source leakage in Tx-mode. Finally, two
bypass capacitors CB = 20 pF connect symmetrically to L1’s VDD center tap.
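The switch geometries and the head-switch voltage drop can be sanity-checked with simple
arithmetic. The sketch below reads each geometry as multiplier × fingers × finger width
(e.g. 64 × 16 × 3.42 μm for M2) and estimates the output-power penalty of a series supply
resistance under the simplifying assumption that saturated power scales with the square of
the supply voltage. The supply voltage and current in the usage note are hypothetical and
are not taken from this work.

```python
import math

def total_width_um(multiplier, fingers, finger_width_um):
    """Total gate width of a device drawn as multiplier x fingers x width."""
    return multiplier * fingers * finger_width_um

def psat_drop_db(r_on_ohm, i_supply_a, vdd_v):
    """Power loss in dB if saturated power scales with supply voltage squared."""
    return -20.0 * math.log10((vdd_v - r_on_ohm * i_supply_a) / vdd_v)
```

`total_width_um(64, 16, 3.42)` gives about 3.5 mm of total gate width for M2, against about
0.22 mm for M3 with `total_width_um(4, 16, 3.42)`. With a hypothetical 1.1 V supply and
40 mA of PA current, a 300 mΩ head switch drops 12 mV, and `psat_drop_db(0.3, 0.04, 1.1)`
evaluates to roughly 0.1 dB.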
Using EM simulation, the entire structure in Fig. 4.10(b) is optimized, i.e. core Tx
balun radius with added output routing length, as well as capacitance Cc and dimensions of
Lp and Ls. EM modeling of the complete structure is necessary to capture various current
return paths that significantly impact parameters of interest, e.g. balance in impedances
loading each of the two PA output stage transistors, effective k of Tx balun, and effective
Lp. Correct ground-referencing of internal ports for CB, M1, Cc, and Vb,LNA is similarly
important for tuning accuracy.
The EM simulation is experimentally verified using Tx- and Rx-mode equivalent passive
test structures, whose die photos are shown in Figs. 4.12(a) and (b), respectively.
Thick metal connections (open circuits) replace on-state (off-state) switches and bypass
capacitors in the implemented test structures. Correspondingly, ideal shorts/opens are
applied across the respective ports of the same EM model to simulate each mode.
Figures 4.13(a) and (b) show the measured and simulated insertion and return losses in both
modes, with good agreement between measurement and simulation except for an extra
0.5 dB vertical offset and visible ripples in the Rx-mode IL response. This discrepancy
was expected: due to area limitations, only differential (GSGSG-pattern) open/short
impedance standards could be included on the same CMOS chip with the passive antenna
interface test circuits. Hence, the open-short de-embedding applied [94], which re-used
the measured impedances of the differential standards, is only approximate for the
single-ended LNA port. Note that the single-ended antenna port did not require
de-embedding, since its I/O pad capacitance of 20 fF is included as part of the design (it
contributes to the total CANT).
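For readers unfamiliar with open-short de-embedding, the sketch below shows the idea in its
simplest one-port form. It assumes the fixture can be modeled as a shunt pad admittance at
the probe followed by a series interconnect impedance. This is a generic illustration of the
technique, not the exact procedure of [94] nor the two-port processing applied to the
measurements here, and all names and values are hypothetical.

```python
def open_short_deembed(y_meas, y_open, y_short):
    """One-port open-short de-embedding.

    Model: pad admittance Yp in shunt at the probe, then a series
    interconnect impedance Zs, then the device under test (DUT).
    y_open is measured with the DUT replaced by an open (gives Yp);
    y_short with the DUT replaced by a short (gives Yp + 1/Zs).
    """
    z_series = 1.0 / (y_short - y_open)         # series parasitic Zs
    z_dut = 1.0 / (y_meas - y_open) - z_series  # strip shunt, then series
    return 1.0 / z_dut
```

If the open and short standards do not share the pad and routing geometry of the port being
corrected, the extracted Yp and Zs are only approximate, which leaves a residual error of the
kind noted above for the single-ended LNA port.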
[Figure 4.12 micrograph labels: (a) ANT port (GSG pads) and PA port (GSGSG pads). (b) ANT
port (GSG pads) and LNA port (GSG pads).]
Figure 4.12: Die micrographs of passive test structures for antenna interface. (a) Tx-mode:
LNA port and TR switch shorted. (b) Rx-mode: PA port shorted, TR switch gate tied to
ground.
4.4.2 Phase Shifter Interface
The schematic and 3D layout illustration of the PS interface are shown in Figs. 4.14(a)
and (b), respectively. The PS-side TR switch uses a shunt-series topology [5, 55]. A
DNW is used for series switch Ms to eliminate bulk losses [Section 4.4.1]. The shunt
switch is split into bulk devices Md and Mc, to short differential-mode and common-mode
(including d.c.), respectively.
The PA scales up the impedance level moving backwards from output to input for enhanced
back-off PAE [2]. Similarly, the LNA uses large load impedances at the transistor
drains in each stage, limited by IP3 [91]. Thus, the PS interface may match the 80 Ω
input/output impedance of the PS to a relatively high impedance in both Tx and Rx paths;
i.e. the design can be symmetrical. Transformers are preferred for compactness, but their
parasitic capacitances for ≥1:2.5 turn ratios limit the driving (load) parallel-equivalent
resistance for the PA (LNA) path. A 1:2 ratio is therefore used (smaller winding on PS
side), with equal Tx and Rx termination resistance ≈250 Ω. The concept of the Tx balun
in Section 4.4.1 is re-used, i.e. a tightly-coupled core XF with intentional series leakage
Lkt to set the effective coupling to ≈0.55, limited by the relatively high termination
impedances. The switches terminate the routing as indicated in Fig. 4.14(b) to absorb their
parasitics as tuning capacitances, and EM simulation of the complete structure is used to
optimize dimensions as in Section 4.4.1.
[Figure 4.13 trace labels: (a) RLPA, RLANT, ILPA-ANT. (b) RLLNA, RLANT, ILANT-LNA.]
Figure 4.13: Simulated and measured IL and RL of antenna interface passive test
structures. (a) Tx-mode. (b) Rx-mode.
[Figure 4.14 annotations: ZPS = 80 Ω at the PS I/O; ZPA = 250 Ω ∥ 60 fF at the PA port;
ZLNA = 250 Ω ∥ 60 fF at the LNA port; series switches Ms with 5 kΩ bulk resistors; shunt
switches Md and Mc; core transformers XF with coupling kxf; inductances Lkt, Lkr, and Lc;
T/R control; Vbias,PA; VDD. In (b), the Ms, Mc, and Md switches are marked at the end of the
routing.]
Figure 4.14: Phase shifter port matching and transmit-receive switch. (a) Schematic. (b)
3D illustration of physical layout.
4.5 Experimental Results
The FEM is fabricated in 1P6M 40 nm CMOS LP technology; its die micrograph
is shown in Fig. 4.15, measuring 1.55 mm × 0.7 mm. All mm-Wave characterization is
performed using on-die probing, and the matching balun indicated in Fig. 4.15 enables
simpler GSG probing at the RFIC I/O port, as for the PS in Fig. 4.4(c). This balun
contributes 1.2 dB of IL, which is not de-embedded in the results shown in this section
(unless stated). Two multi-contact wedges land on the two rows of d.c. pads on either side
of the FEM to supply bias currents, separate power connections for individual amplifier
stages (e.g. 3 VDD pads for PA), and digital lines for gain/phase control.
[Figure 4.15 micrograph labels: LNA, PA, PS, PS I/O balun, ANT I/O and RFIC I/O GSG pads;
die dimensions 1550 μm × 700 μm.]
Figure 4.15: Die micrograph of fully-integrated front-end module.
4.5.1 Transmit Mode
4.5.1.1 Small-signal S-parameters
Figure 4.16(a) shows the measured small-signal S-parameters of the front-end versus
frequency across the 10 available PA gain states, with the data for two PS phase states
{0, 4} overlaid on the same axes. The plot shows that the return loss (RL) on the PS input
port (s11) is better than 10 dB across 26.9–35.5 GHz, with a peak gain s21 of 11.2 dB, and
reverse isolation (s12) better than 40 dB up to 37 GHz. Recall that another 1.2 dB should be
added to all reported gain values to compensate for the IL of the matching balun at the RFIC
I/O port. Also, the gain roll-off with frequency is due to skin effect in transformers (e.g.
see Fig. 4.13(a)), and to transistor maximum available gain (MAG) roll-off with frequency
as explained in [2]. The data for the two overlaid PS states in Fig. 4.16(a) are practically
identical at each PA gain state. This is a result of the input/output matching to the PS
and antenna interfaces being insensitive to the digital gain setting, and of the fact that
states 0 and 4 differ only by a signal inversion in the 180° cell of the PS [81]. Similarly,
Figs. 4.16(b)–(d) plot the corresponding data for the remaining PS states {1, 5, 2, 6, 3, 7},
showing similar behavior.
Figures 4.17(a)–(d) show the measured errors in the 9×1 dB PA gain steps that correspond
to the data in Fig. 4.16 for the PS states {0, 1, 2, 3}; the remaining 4 PS states
exhibit similar broadband gain step accuracy (not shown due to space limitations). Across
all 8 PS states, the peak gain step nonlinearity error remains < ½ LSB (i.e. <0.5 dB)
across 23.2–37.4 GHz.
Figures 4.18(a)–(c) show the measured Tx-mode insertion phase versus frequency
across the 7×45° PS phase steps, and Figs. 4.19(a)–(c) show the corresponding phase
step nonlinearity errors. An r.m.s. phase error <10° is achieved across 21.3–33.8 GHz.
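Gain and phase step errors of the kind reported above can be computed from per-state
measurements at a single frequency. The sketch below uses one common definition, namely
the deviation of each state from an ideal uniform staircase referenced to state 0, with the
r.m.s. taken over the non-reference states. The exact definition used for the reported
figures is not restated here, so this choice is an assumption, as are the sample numbers in
the usage note.

```python
import math

def step_errors(values, nominal_step):
    """Deviation of each state from an ideal uniform staircase.

    values[i] is the measured gain (dB) or phase (deg) at state i; state 0
    is taken as the reference. Returns (per-state errors, r.m.s. error).
    """
    errors = [(v - values[0]) - i * nominal_step for i, v in enumerate(values)]
    rms = math.sqrt(sum(e * e for e in errors[1:]) / (len(errors) - 1))
    return errors, rms
```

For hypothetical phase readings of 0°, 44°, 92°, and 135° with a 45° nominal step, the
per-state errors are 0°, −1°, 2°, and 0°, giving an r.m.s. error of about 1.3°. The same
function applies to gain states with `nominal_step=1.0`.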
[Figure 4.16 plots (a)–(d): |s-parameters| (dB) versus frequency (GHz), 20–40 GHz; traces
s21, s22, s11, s12; panel titles "Tx 9x1dB Gain Steps @ PS States 0,4", "1,5", "2,6", and
"3,7".]
Figure 4.16: Measured s-parameters in Tx-mode across 9×1 dB PA gain steps for different
PS phase state pairs. (a) States {0, 4}. (b) States {1, 5}. (c) States {2, 6}. (d) States {3, 7}.
[Figure 4.17 plots (a)–(d): gain error (dB), −0.5 to 0.5, versus frequency (GHz), 20–40 GHz;
panel titles "Tx 9x1dB Step Errors @ PS State 0" through "PS State 3".]
Figure 4.17: Measured gain step nonlinearity errors in Tx-mode across 9×1 dB PA gain
steps for different PS phase states; r.m.s. error indicated with thick black line in each case.
(a) State 0. (b) State 1. (c) State 2. (d) State 3.
[Figure 4.18 plots (a)–(c): s21 phase (deg), −1260 to 0, versus frequency (GHz), 20–40 GHz;
panel titles "Tx 7x45° Phase Steps @ Gain State 0", "Gain State 4", and "Gain State 9".]
Figure 4.18: Measured s-parameters in Tx-mode across 7×45° PS phase steps for different
PA gain settings. (a) PA gain state 0 (min. gain). (b) PA gain state 4. (c) PA gain state 9
(max. gain).
[Figure 4.19 plots (a)–(c): phase error (deg), −25 to 25, versus frequency (GHz), 20–40 GHz;
panel titles "Error in 7x45° PS Steps @ PA Gain 0", "PA Gain 4", and "PA Gain 9".]
Figure 4.19: Measured errors in 7×45° PS phase steps in Tx-mode for different PA gain
settings; r.m.s. error indicated with thick black line in each case. (a) PA gain state 0 (min.
gain). (b) PA gain state 4. (c) PA gain state 9 (max. gain).
4.5.1.2 Large CW Signal Performance
Figures 4.20(a)–(d) show the measured Tx-mode large continuous wave (CW) signal
power sweep results at {27, 28, 29, 30} GHz. The sweeps are performed up to
Pin,max = +8 dBm, at the highest PA gain setting. The FEM is driven to at least 1 dB
compression across 26–33 GHz, and to 2–3 dB compression only over 27–30 GHz.
Figure 4.21(a) shows a summary of measured CW signal Pout at key back-off levels across
26–33 GHz, while Fig. 4.21(b) shows the corresponding measured PAE. Note that Pmax
(PAEmax) in Fig. 4.21(a) (Fig. 4.21(b)) is Pout (PAE) at Pin,max. The input balun’s 1.2 dB
IL is not de-embedded.
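For reference, the PAE reported in these sweeps follows the usual definition
(Pout − Pin)/Pdc. A minimal sketch of the calculation is given below; the helper names and
the numbers in the usage note are illustrative assumptions and do not correspond to any
measured operating point.

```python
def dbm_to_w(p_dbm):
    """Convert a power level from dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

def pae_percent(pout_dbm, pin_dbm, pdc_w):
    """Power-added efficiency, (Pout - Pin) / Pdc, in percent."""
    return 100.0 * (dbm_to_w(pout_dbm) - dbm_to_w(pin_dbm)) / pdc_w

def deembedded_gain_db(measured_gain_db, input_balun_il_db=1.2):
    """Gain with the input matching balun loss added back."""
    return measured_gain_db + input_balun_il_db
```

As a hypothetical example, 10 dBm out for 0 dBm in at 90 mW of d.c. power gives a PAE of
10%. Because the input balun loss is not de-embedded, the reported gain understates the gain
referred to the PS port by the balun IL, which `deembedded_gain_db` adds back.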
[Figure 4.20 plots (a)–(d): gain (dB) and PAE (%) versus CW Pout (dBm), −4 to 16 dBm; panel
titles "Tx-Mode CW Power Sweep @ 27GHz", "28GHz", "29GHz", and "30GHz".]
Figure 4.20: Measured CW power sweep results at maximum PA gain setting and PS phase
state 0 at different CW frequencies. (a) 27 GHz. (b) 28 GHz. (c) 29 GHz. (d) 30 GHz.
4.5.1.3 Modulated Signal Performance
[Figure 4.21 plots: (a) "Pout @ Key Power Levels": CW Pout (dBm) versus CW frequency (GHz);
traces Pmax, P1dB, and Pout at P1dB − 5 dB. (b) "PAE @ Key Power Levels": CW PAE (%) versus
CW frequency (GHz); traces PAEmax, PAE1dB, and PAE at P1dB − 5 dB.]
Figure 4.21: Summary of measured CW power sweep results at maximum PA gain setting
and PS phase state 0 versus CW frequency at key power back-off levels. (a) Pout. (b) PAE.
This subsection demonstrates that the implemented UE front-end amplifies, with high
fidelity, even the extremely broadband, high-PAPR signals anticipated for 5G downlinks.
Measurements are performed for carrier aggregation scenarios across center frequencies
27–32 GHz in 1 GHz steps. Each component carrier (CC) is 90 MHz wide, with 10 MHz
guard bands. Each CC carries OFDM with 2048 fast Fourier transform (FFT) points, a 75 kHz
tone spacing, and 64-QAM modulation on each tone. The Pout and PAE for EVM ≤ −25 dBc
are reported for a single CC in Figs. 4.22(a) and (b), respectively. Similarly, Pout and
PAE for EVM ≤ −25 dBc on each and every CC for eight CCs are shown in Figs. 4.22(c) and
(d). Peak performance of Pout/PAE = 6.5 dBm/8.8% is demonstrated at 27 GHz for the
extremely broadband 8×100 MHz waveform. Note that the different traces in each of
Figs. 4.22(a)–(d) correspond to 4 different digital PS phase states (the other 4 states have
identical linearity, as they correspond to an inversion in the symmetric 180° cell [81]).
These overlaid curves highlight that the excellent Tx linearity performance is practically
independent of PS phase state (as desired).
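The waveform parameters quoted above can be related to each other with a few lines of
arithmetic. The sketch below assumes that EVM in dBc is 20·log10 of the r.m.s. error ratio
and that the active tones fill the stated CC bandwidth; both are assumptions for
illustration, and the function names are hypothetical.

```python
def evm_dbc_to_percent(evm_dbc):
    """Convert EVM from dBc (20*log10 of the r.m.s. error ratio) to percent."""
    return 100.0 * 10.0 ** (evm_dbc / 20.0)

def ofdm_numbers(fft_points=2048, tone_spacing_hz=75e3, cc_bw_hz=90e6,
                 guard_hz=10e6, n_cc=8):
    """Return (FFT span in Hz, tones per CC, aggregate bandwidth in Hz)."""
    sample_rate_hz = fft_points * tone_spacing_hz       # FFT span
    occupied_tones = round(cc_bw_hz / tone_spacing_hz)  # tones filling one CC
    aggregate_bw_hz = n_cc * (cc_bw_hz + guard_hz)
    return sample_rate_hz, occupied_tones, aggregate_bw_hz
```

Under these assumptions, an EVM of −25 dBc corresponds to about 5.6%, the 2048-point FFT at
75 kHz spacing spans 153.6 MHz with about 1200 tones filling a 90 MHz CC, and eight CCs with
their guard bands occupy 800 MHz.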
4.5.2 Receive Mode
4.5.2.1 Small-signal S-parameters and Noise Figure
The measured small-signal S-parameters of the Rx-mode versus frequency for the 9
gain states of the LNA are shown in Fig. 4.23(a). The input RL at the antenna port (s11) is
better than 9 dB across 23.5–37.9 GHz, while the peak gain (s21) is 16.8 dB, and the reverse
isolation is 40 dB as in the Tx-mode. Mirroring the plots for Tx-mode, the data for the
two PS states f0; 4g are overlaid on the same axes in Fig. 4.23(a), while Figs. 4.23(b)–(d)
show the measured S-parameters for the remaining PS states. Figures 4.24(a)–(d) show
[Figure 4.22 panels (a)–(d): Pout @ EVM (dBm) and PAE @ EVM (%) versus center frequency (26–33 GHz); 64-QAM OFDM, EVM per CC < −25 dBc, 1CC and 8CC; traces for PS states 4, 5, 6 and 7.]
Figure 4.22: Summary of measured Pout and PAE for carrier aggregation scenarios versus
center frequency for EVM < −25 dBc on each CC for different PS digital states. (a) Pout
for 1CC. (b) PAE for 1CC. (c) Pout for 8CC. (d) PAE for 8CC.
the nonlinearity errors in the 8×1 dB LNA gain steps for the four PS states {0, 1, 2, 3}.
The r.m.s. gain step error is < 0.53 dB across 22–38 GHz.
Figures 4.25(a)–(c) show the measured insertion phase through the front-end module
in Rx-mode versus frequency, and across the 7×45° PS phase steps. The corresponding
phase step nonlinearity errors are shown in Figs. 4.26(a)–(c), demonstrating that the r.m.s.
phase error remains < 10° across 21.3–33.2 GHz.
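The step-error metrics quoted above can be illustrated with a minimal sketch. It assumes one common definition: each state is compared with an ideal ladder anchored at state 0, and the r.m.s. is taken over the remaining states. The exact definition used for the measured curves is not spelled out in this section, and the data values below are hypothetical, not measured.

```python
import math

def step_errors(values, ideal_step):
    """Error of each state relative to an ideal ladder anchored at state 0.

    The reference state (index 0) is excluded because its error is zero
    by construction.
    """
    ref = values[0]
    return [(v - ref) - k * ideal_step for k, v in enumerate(values) if k > 0]

def rms(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical data at a single frequency point (not measured values).
gains_db = [8.0, 9.1, 9.9, 11.2, 12.0, 12.8, 14.1, 15.0, 16.3]      # 9 LNA gain states
phases_deg = [0.0, 47.0, 88.0, 139.0, 180.0, 227.0, 268.0, 319.0]   # 8 PS states, unwrapped

gain_rms_db = rms(step_errors(gains_db, ideal_step=1.0))
phase_rms_deg = rms(step_errors(phases_deg, ideal_step=45.0))
print(f"r.m.s. gain step error:  {gain_rms_db:.2f} dB")
print(f"r.m.s. phase step error: {phase_rms_deg:.2f} deg")
```

Repeating this calculation at every frequency point gives an r.m.s. curve of the kind drawn as the thick black line in the step-error figures.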
The Rx-mode noise figure of the front-end is measured across the 9 LNA gain states
at PS state 0, and reported in Fig. 4.27 over the 20–40 GHz frequency range. Figure 4.27
shows that the minimum NFRx achieved is 5.5 dB, and it remains below 6.5 dB across
26.4–32.0 GHz. Also, NFRx is relatively insensitive to the LNA gain setting: it increases by
at most 0.5 dB at the minimum gain setting relative to its value at maximum
gain over all measured frequencies.
4.5.2.2 Receive-mode Linearity Performance
CW signal input power sweeps are performed to extract the input-referred 1 dB gain
compression point IP1 dB for the minimum and maximum LNA gain states {0, 8}.
Similarly, two-tone input power sweeps for a tone spacing of Δf = 100 MHz are also
performed at LNA gain states {0, 1, 7, 8} to extract the IIP3 of the Rx-mode front-end.
Both sets of large-signal linearity measurements are made for PS phase state 0.
Figure 4.28(a) plots a summary of Rx-mode IP1 dB for CW frequencies 26–33 GHz, showing
a worst-case IP1 dB of −15.9 dBm (−22.7 dBm) at minimum (maximum) LNA gain.
Figure 4.28(b) shows a summary of the IIP3 across center frequencies of the two-tone signal
over the same 26–33 GHz range, with a worst-case value of −8.5 dBm (−12.9 dBm) at
minimum (maximum) LNA gain.
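The two-tone extraction described above can be sketched as follows. The code assumes the standard weakly nonlinear relation IIP3 = Pin + ΔP/2, where ΔP is the gap between the fundamental and the third-order intermodulation product at the output. The power levels are hypothetical numbers chosen for illustration, not measured data. The second helper compares the gap between IIP3 and IP1 dB with the roughly 9.6 dB expected of a memoryless third-order nonlinearity, using the worst-case minimum-gain and maximum-gain values summarized above.

```python
def iip3_from_two_tone(pin_dbm, p_fund_dbm, p_im3_dbm):
    """IIP3 (dBm) from a single two-tone point in the weakly nonlinear region.

    pin_dbm    : input power per tone
    p_fund_dbm : output power of one fundamental tone
    p_im3_dbm  : output power of one third-order intermodulation product
    """
    delta_db = p_fund_dbm - p_im3_dbm
    return pin_dbm + delta_db / 2.0

def iip3_to_ip1db_gap(iip3_dbm, ip1db_dbm):
    """Gap between IIP3 and input P1dB; about 9.6 dB for an ideal cubic nonlinearity."""
    return iip3_dbm - ip1db_dbm

# Hypothetical two-tone reading (illustrative only).
iip3 = iip3_from_two_tone(pin_dbm=-35.0, p_fund_dbm=-18.2, p_im3_dbm=-71.2)
print(f"Extracted IIP3: {iip3:.1f} dBm")

# Worst-case values as summarized in the text: (IIP3, IP1dB) in dBm.
gap_min_gain = iip3_to_ip1db_gap(-8.5, -15.9)
gap_max_gain = iip3_to_ip1db_gap(-12.9, -22.7)
print(f"Gap at minimum gain: {gap_min_gain:.1f} dB")
print(f"Gap at maximum gain: {gap_max_gain:.1f} dB")
```

A gap close to 9.6 dB suggests that compression is dominated by third-order behavior; a smaller gap hints at higher-order terms or at a different stage limiting compression.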
[Figure 4.23 panels (a)–(d): |s-parameters| (dB) versus frequency (20–40 GHz); traces s21, s22, s11 and s12 for the Rx 8×1 dB gain steps at PS states {0, 4}, {1, 5}, {2, 6} and {3, 7}.]
Figure 4.23: Measured s-parameters in Rx-mode across 8×1 dB LNA gain steps for
different PS phase state pairs. (a) States {0, 4}. (b) States {1, 5}. (c) States {2, 6}. (d) States
{3, 7}.
[Figure 4.24 panels (a)–(d): gain error (dB, −1 to +1) versus frequency (20–40 GHz) for the Rx 8×1 dB step errors at PS states 0, 1, 2 and 3.]
Figure 4.24: Measured gain step nonlinearity errors in Rx-mode across 8×1 dB LNA gain
steps for different PS phase states; r.m.s. error indicated with thick black line in each case.
(a) State 0. (b) State 1. (c) State 2. (d) State 3.
[Figure 4.25 panels (a)–(c): s21 phase (deg, −1260 to 0) versus frequency (20–40 GHz) for the Rx 7×45° phase steps at gain states 0, 3 and 8.]
Figure 4.25: Measured s-parameters in Rx-mode across 7×45° PS phase steps for different
LNA gain settings. (a) LNA gain state 0 (min. gain). (b) LNA gain state 3. (c) LNA gain
state 8 (max. gain).
4.5.3 Performance Comparison
Table 4.3 shows a comparison with state-of-the-art front-ends for 5G in the 28 GHz
band. We note that references [88, 89, 95] report a mixture of per-channel performance in a
conducted environment on the one hand, and over-the-air performance of their complete
packaged antenna arrays on the other hand. All other values in the table used on-die probing,
including this work. References [88, 89, 95] are included for their strong relevance, but
for fair comparison, their self-reported per-channel conducted-test performances are used
[Figure 4.26 panels (a)–(c): phase error (deg, −25 to +25) versus frequency (20–40 GHz) for the 7×45° PS steps at LNA gain states 0, 3 and 8.]
Figure 4.26: Measured errors in 7×45° PS phase steps in Rx-mode for different LNA gain
settings; r.m.s. error indicated with thick black line in each case. (a) LNA gain state 0
(min. gain). (b) LNA gain state 3. (c) LNA gain state 8 (max. gain).
Figure 4.27: Measured Rx-mode noise figure versus frequency across all LNA gain
settings at PS phase state 0.
except if otherwise stated.
Except for the very high precision needed for multiple-beam-capable base station
arrays as demonstrated in [96], the r.m.s. gain and phase step nonlinearity errors achieved
in this work are comparable to those in all references in Table 4.3. However, this work
deviates significantly from the overall trend for base station developments in that a lower
PS resolution of 3 bits is implemented, which is tailored closely to UE requirements and
limited by PS insertion loss [Section 4.2.2]. Other metrics specific to Tx-/Rx-mode are
separately compared.
Figure 4.28: Summary of measured Rx-mode linearity performance versus frequency. (a)
CW input P1dB results at minimum and maximum LNA gain settings and PS phase state 0
versus CW frequency. (b) Two-tone IIP3 results at LNA gain settings {0, 1, 7, 8} and PS
phase state 0 versus center frequency of two-tone signal.
Table 4.3: Comparison with state-of-the-art front-ends for 28 GHz 5G communications.
Metric | This Work | UCSD RFIC'17–(1) | ADI/NCSU RFIC'17–(2) | IBM/Ericsson ISSCC'17 | Anokiwave Product'16 | ADI/NCSU JSSC'17 | UCSD RFIC'16 | Samsung IMS'16
Technology | 40 nm Bulk CMOS LP | Jazz SBC18H SiGe BiCMOS | 130 nm SiGe BiCMOS | 130 nm SiGe BiCMOS | SiGe BiCMOS | 120 nm SiGe BiCMOS | 40 nm SOI CMOS | 0.15 μm GaAs pHEMT
Supply Voltage [V] | 1.1 | 2.2 | 3.3 | Not stated | 1.8 | 2.5 | 1.5 | 5.0
Integration | 1-Ch (LNA + PA + PS + TRSW) | 4-Ch/Chip (LNA + PA + PS + TRSW) | 4-Ch (LNA + PA + PS; separate Tx/Rx ports) | 32-Ch (LNA + PA + PS + TRSW) | 4-Ch (LNA + PA + PS + TRSW) | 4-Ch (LNA + PS) | 1-Ch (LNA + PS) | 1-Ch (LNA + PA + TRSW)
5G Application | UE TRx | UE / BS TRx | BS TRx | BS TRx | BS TRx | BS Rx | UE / BS Rx | UE TRx
Operating Freq. [GHz] | 27–30 | 28–32 | 24–28 | 27–29 | 27.5–30 | 28–32 | 26–28 | 27.5–28.35
Tx Peak Gain [dB] / Rx Peak Gain [dB] | 12.4† / 16.8† | 13 (PA-Only) / 20 | 14.3 / 11.5 | 32 @ 25°C / 34 @ 25°C | 24 / 19 | — / 11 | — / 12.2 | 22 / 18
Tx Gain Control / Rx Gain Control | 9×1 dB / 8×1 dB | 14×1 dB / 14×1 dB | None | 8×1 dB / 8×1 dB | 31×1 dB / 31×1 dB | None | — / 16×0.4 dB | None
Tx OPmax/Ch [dBm] / Tx PAEmax/Ch [%] | 15.8 @ 27 GHz / 27.1 @ 27 GHz | 13 (PA-Only @ 28 GHz) / 18 (PA-Only @ 28 GHz) | 12.5 to 17.5 / — | 16.4 / 22.1 | — | | | — / 32
Tx OP1dB/Ch [dBm] / Tx PAE1dB/Ch [%] | ≥14.6 @ 27–30 GHz / ≥21.9 @ 27–30 GHz | 10.5 / 13 (PA-Only @ 28 GHz) | 5.5 to 10.6 / — | 14 / — | 9 / — | | | 24 / —
Tx Waveform | 8CC-64QAM-OFDM | 1CC-16QAM | | | | | |
RFBW [MHz] | 800 | 400 | | | | | |
Tx EVM [dBc] | −25 @ 27 GHz | −22.5 @ 29 GHz β | | | | | |
Tx Pout/Ch [dBm] | 6.5 | 3.5 (@ OP1dB − 7 dB) | | | | | |
PAE @ Tx Pout [%] | 8.8 | — | | | | | |
Rx NF [dB] | 5.5 | 4.6 | 4.5 to 6.9 | 6.0 | 5.0 | 5.1 (no TRSW) | 4.0 to 4.7 (no TRSW) | 3.0
Rx IP1dB [dBm] / Rx IIP3 [dBm] | −16 / −8.5 | | −25 to −18.4 / — | −22.5 / — | | −16.4 to −12.9 / −10.4 to −6.8 | −8 / −0.5 |
PS IL [dB] | 5.9† | n/a | n/a | 9.3α | n/a | n/a | — | No PS
PS Resolution [bits] | 3 | 6 | 5 | 6 | 5 | 4 | 5 | No PS
TRX Phase Error [deg r.m.s.] | ≤10 @ 21.3–33.2 GHz | 6 | 4.2 | 0.6α | 5 | 5.4 | 4 | No PS
TRX Gain Error [dB r.m.s.] | ≤0.50 dB pk @ 23–37 GHz (Tx); ≤0.53 dB rms @ 22–38 GHz (Rx) | 0.8 | 0.5 | 0.7 | 0.5 | 0.6 | 0.6 | n/a
Area / Ch [mm²] | 0.99†† | 2.16* | 0.56* | 3.06* | Not stated | 0.45 | 1.24* | 12.6**
References: RFIC'17–(1): [88], RFIC'17–(2): [90], ISSCC'17: [89], RFIC'16: [97], JSSC'17: [98], Anokiwave'16: [95], IMS'16: [99].
† After de-embedding 1.2 dB I/O balun loss, †† without I/O balun, * Estimated from die photo, ** With pads, α from [100], β includes EVM contribution of driving up-/down-converter chip.
4.5.3.1 Transmit Mode
Note that all of the silicon-based works that integrate the Tx in Table 4.3 use SiGe, and
the majority target base stations. Despite the significantly lower 1.1 V supply voltage used
by this work, the achieved per-channel P1 dB = 14.6 dBm is comparable to [89] and higher
than [88, 90, 95]. We note that the Tx in [90] benefits from separation of the Tx output and Rx
input into separate ports/pins (no TR switch), but also suffers significantly from mis-tuning
of the on-chip matching. Also, [99] benefits from the higher GaAs breakdown field to produce
P1 dB > 20 dBm, which may be needed only for high performance base stations [79].
Finally, the peak 8×100 MHz-wide 64-QAM OFDM signal Pout demonstrated at 27 GHz
in this work is 3 dB greater than the peak per-channel 1×400 MHz-wide 16-QAM Pout
reported in [88], which can be attributed to the significantly higher per-channel P1 dB, and
the wideband PA [Section 4.3] and antenna interface [Section 4.4.1] designs in this work.
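As a worked check of this comparison, the arithmetic below uses only the per-channel values listed in the comparison table. The back-off for this work is computed from the ≥14.6 dBm lower bound on P1 dB across 27–30 GHz, so it should be read as a lower bound on the back-off rather than an exact figure at 27 GHz.

```latex
\begin{align}
\text{Back-off}_{\text{this work}} &= OP_{1\,\mathrm{dB}} - P_{out} \ge 14.6 - 6.5 = 8.1~\mathrm{dB} \\
P_{out,[88]} &= OP_{1\,\mathrm{dB},[88]} - 7~\mathrm{dB} = 10.5 - 7 = 3.5~\mathrm{dBm} \\
\Delta P_{out} &= 6.5 - 3.5 = 3.0~\mathrm{dB}
\end{align}
```

The 3 dB advantage is therefore obtained even though this work is operated at a slightly deeper back-off from its own P1 dB, with a higher-order modulation and twice the RF bandwidth.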
4.5.3.2 Receive Mode
We start with noise figure comparison. References [90, 97, 98] do not integrate the TR
switch, so their noise figure performances should be compared to the 3.3 dB measured
LNA-only noise figure in this work [Table 4.1]. Also, this work achieves comparable
noise figure to [89, 95], i.e. 5–6 dB including insertion loss of integrated TR switch. With
this 5–6 dB mean value in mind, reference [88] achieves an outstanding 4.6 dB. Since
Rx IP1 dB/IIP3 are not reported in [88], it is difficult to conclude if linearity is simul-
taneously upheld by the use of a 2.2 V supply for the back-end VGA, since the active
vector modulator typically exhibits poor linearity. Finally, the GaAs LNA and TR antenna
switch in [99] achieve a lower NF due to the fundamental advantages of higher substrate
resistivity and lower switch losses in compound semiconductors. However, note that no
gain or phase control is integrated into that solution, which implies that the d.c. power
consumption in [99] is likely to increase if the gain and noise figure are to be maintained after
adding gain/phase control functionality in the final solution.
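The effect of an integrated TR switch on noise figure can be illustrated with the Friis cascade formula. The sketch below is only indicative: the stage ordering (input switch, then LNA, then passive PS), the 2.0 dB switch loss and the 25 dB LNA gain are assumptions made for illustration. Only the 3.3 dB LNA-only noise figure and the 5.9 dB PS insertion loss are taken from this chapter.

```python
import math

def db_to_lin(x_db):
    """Convert a power ratio from dB to linear."""
    return 10.0 ** (x_db / 10.0)

def lin_to_db(x):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(x)

def passive(loss_db):
    """A matched passive stage at reference temperature: gain = -loss, NF = loss."""
    return (-loss_db, loss_db)

def cascade_nf_db(stages):
    """Friis formula. stages is a list of (gain_db, nf_db), ordered input to output."""
    f_total = 0.0
    g_running = 1.0  # linear gain of all preceding stages
    for index, (gain_db, nf_db) in enumerate(stages):
        f = db_to_lin(nf_db)
        if index == 0:
            f_total = f
        else:
            f_total += (f - 1.0) / g_running
        g_running *= db_to_lin(gain_db)
    return lin_to_db(f_total)

# Assumed chain: TR switch loss (hypothetical 2.0 dB), LNA (NF 3.3 dB from the text,
# hypothetical 25 dB gain), passive PS (5.9 dB insertion loss from the comparison table).
chain = [passive(2.0), (25.0, 3.3), passive(5.9)]
nf_total_db = cascade_nf_db(chain)
print(f"Cascaded NF: {nf_total_db:.2f} dB")
```

Loss placed ahead of the LNA adds to the noise figure dB for dB, whereas loss placed after the LNA is suppressed by the LNA gain. This is why front-ends without a TR switch are better compared against the LNA-only noise figure.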
As for linearity (i.e. IP1 dB) comparison, performance of this work is similar to [98],
and better than [89]. The significantly higher supply voltage in [98] enables the higher
IP1 dB therein. Reference [97] significantly boosts IP1 dB by using a combination of
slightly higher 1.5 V supply voltage as well as alternating amplifier stages with phase-
shifter cells. Note that alternating the phase shift cells with LNA stages means that, unlike
e.g. [79], the PS is no longer bidirectional although its cells are passive. Therefore, larger
area would be needed to integrate a second passive PS for the Tx-mode in a TRx built
from the concept of [97].
Considering the above comparison with the most recent advances in SiGe RFIC design
for base stations, one concludes that this work achieves state-of-the-art performance.
4.6 Conclusion
This chapter presented the first fully-integrated high-performance TR front-end in bulk
CMOS targeting 5G handset phased arrays. Block-level specifications were first derived
based on system design considerations. The PA, LNA, and PS circuits are integrated with
TR switches in a compact area by using a wideband, all-lumped, and co-designed TR an-
tenna interface topology that was verified experimentally using dedicated test structures.
The presented TR front-end achieves state-of-the-art Tx Pout, linearity, and efficiency, as
well as Rx noise/linearity performances in comparison to recent base station radio devel-
opments by academia/industry in SiGe BiCMOS.
5. CONCLUSION
In this dissertation, design methodologies and circuit techniques were demonstrated
to address the integration of key phased array front-end circuits in scaled CMOS. For
proof-of-concept, two PA prototypes were implemented in 28-nm and 40-nm nodes, and
achieved state-of-the-art performance. A low-power fully-integrated TR front-end module
was also implemented and maintained the excellent broadband performance of its con-
stituent circuits through careful signal integrity analysis and design.
The 28 nm PA prototype in this dissertation is the first reported linear, bulk CMOS PA
targeting low-power 5G mobile UE integrated phased array transceivers. The proposed
optimization methodology, and its circuit-level enabler using inductive source degener-
ation were demonstrated to be effective through theoretical as well as very detailed ex-
perimental verification using continuous wave, two-tone, and complex modulated signals.
The prototype was designed and fabricated in 1P7M 28 nm bulk CMOS and achieved
+4.2 dBm/9% measured Pout/PAE at −25 dBc EVM for a 250 MHz-wide, 64-QAM
OFDM signal with 9.6 dB PAPR. At the time of its publication, the 28 nm PA set
the state-of-the-art for high efficiency, linear, broadband 28 GHz-band PAs.
To drastically extend RFBW over that achieved in the first design, and to explore the
use of CMOS technology for the even more challenging downlink data rates, the second
PA was designed for wideband linearity and implemented in a slower 40 nm process.
The 40 nm design extended the supportable RFBW by a factor of three over the
state-of-the-art without degrading output power range, battery life, or amplifier fidelity.
The implemented PA used double-tuned transformers that were optimized for linearity,
and integrated 9×1 dB digital gain control for the first time in any comparable (reported)
high-performance CMOS PA. The prototype was fabricated in a 1P6M 40 nm CMOS LP
technology and achieved Pout/PAE of +6.7 dBm/11% for an 8×100 MHz carrier
aggregation 64-QAM OFDM signal with 9.7 dB PAPR, demonstrating the viability of CMOS
technology to address even the very difficult 5G AP bandwidth requirements.
Finally, leveraging the developed PA design methodologies and circuits, a low power
transmit-receive phased array front-end module was fully integrated in 40 nm technology.
In transmit-mode, the front-end maintains the excellent performance of the 40 nm PA,
achieving +5.5 dBm/9% for the same 8×100 MHz carrier aggregation signal above. In
receive-mode, a 5.5 dB noise figure (NF) and a minimum third-order input intercept point
(IIP3) of −13 dBm are achieved. The performance of the implemented CMOS front-end
is comparable to state-of-the-art publications and commercial products that were very
recently developed in silicon germanium (SiGe) technologies for 5G communication.
5.1 Future Work
Future efforts to follow up on this research are recommended in two main areas:
• Investigating on-chip versus array-based transmit power combining techniques in
the 28 GHz band. The purpose is to enhance the achievable range for high-throughput
scenarios such as 64-QAM OFDM, without needing to migrate to a more
expensive technology such as SiGe or GaAs. Similar studies of power combining
techniques have been performed in the 60 GHz band, but to the best of the author’s
knowledge none have been reported in the 28 GHz band as of yet. The study may
benefit greatly from the detailed link-budget analysis in Chapter 2 as a starting point;
note, however, that some of the works cited in Chapter 2 already included two- or
four-way on-chip power combining in the higher frequency part of the considered
range. A key point in such a study is that the overall cost of implementation must be
taken into consideration, since the silicon RFIC is expected to grow in
size. A cost and performance comparison with architectures that integrate multiple
phased array RFICs with a single antenna module is also advised.
• Leveraging the developed PA design methodologies to implement more complex
PA circuit topologies for further back-off PAE enhancement above the limits set by
the simple common-source topology employed so far. For example, Doherty PAs
typically suffer from AM-PM conversion and narrowband performance.
Since the work in Chapter 3 actually tackles AM-PM conversion by using
wideband matching techniques, a Doherty PA using carrier and peaking amplifier
cells based on the techniques in Chapter 3 could result in higher PAE performance
without the AM-PM and narrowband issues of existing Doherty PAs.
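The trade-off raised in the first recommendation can be framed with idealized scaling laws. The sketch below assumes textbook relations: an N-way on-chip combiner adds 10·log10(N) dB minus its insertion loss, while an N-element coherent array adds 20·log10(N) dB to EIRP (total radiated power plus array gain). The combiner loss, element gain and element counts are hypothetical. Only the 6.5 dBm per-channel output power is taken from the measured results of this work.

```python
import math

def onchip_combined_pout_dbm(p_cell_dbm, n_ways, combiner_loss_db):
    """Output power of an idealized N-way on-chip combiner with a lumped insertion loss."""
    return p_cell_dbm + 10.0 * math.log10(n_ways) - combiner_loss_db

def array_eirp_dbm(p_elem_dbm, n_elems, elem_gain_dbi):
    """EIRP of an idealized coherent array of identical, uniformly excited elements."""
    return p_elem_dbm + 20.0 * math.log10(n_elems) + elem_gain_dbi

p_channel_dbm = 6.5        # per-channel modulated Pout reported for this front-end
combiner_loss_db = 1.0     # hypothetical on-chip combiner loss
elem_gain_dbi = 5.0        # hypothetical antenna element gain

# Option A: 4 antenna elements, each driven by a 2-way on-chip combined PA.
p_combined = onchip_combined_pout_dbm(p_channel_dbm, 2, combiner_loss_db)
eirp_a = array_eirp_dbm(p_combined, 4, elem_gain_dbi)

# Option B: the same 8 PA cells spread over 8 antenna elements, no on-chip combiner.
eirp_b = array_eirp_dbm(p_channel_dbm, 8, elem_gain_dbi)

print(f"Option A EIRP: {eirp_a:.1f} dBm")
print(f"Option B EIRP: {eirp_b:.1f} dBm")
```

Under these idealized assumptions the array-based option yields a higher EIRP for the same number of PA cells, but it needs more antenna elements and module area. This is the cost comparison the recommendation calls for.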
REFERENCES
[1] S. Shakib, H. C. Park, J. Dunworth, V. Aparin, and K. Entesari, “A highly efficient
and linear power amplifier for 28-ghz 5g phased array radios in 28-nm cmos,” IEEE
Journal of Solid-State Circuits, vol. 51, pp. 1–17, Dec 2016.
[2] S. Shakib, M. Elkholy, J. Dunworth, V. Aparin, and K. Entesari, “2.7 a wideband
28ghz power amplifier supporting 8×100mhz carrier aggregation for 5g in
40nm cmos,” in 2017 IEEE International Solid-State Circuits Conference (ISSCC),
pp. 44–45, Feb 2017.
[3] M. Uzunkol and G. Rebeiz, “A low-loss 50–70 ghz spdt switch in 90 nm cmos,”
IEEE Journal of Solid-State Circuits, vol. 45, pp. 2003–2007, Oct 2010.
[4] M. Boers, B. Afshar, I. Vassiliou, S. Sarkar, S. T. Nicolson, E. Adabi, B. G. Pe-
rumana, T. Chalvatzis, S. Kavvadias, P. Sen, W. L. Chan, A. H. T. Yu, A. Parsa,
M. Nariman, S. Yoon, A. G. Besoli, C. A. Kyriazidou, G. Zochios, J. A. Castaneda,
T. Sowlati, M. Rofougaran, and A. Rofougaran, “A 16tx/16rx 60 ghz 802.11ad
chipset with single coaxial interface and polarization diversity,” IEEE Journal of
Solid-State Circuits, vol. 49, pp. 3031–3045, Dec 2014.
[5] Y. Wang, H. Wang, C. Hull, and S. Ravid, “A transformer-based broadband front-
end combo in standard cmos,” IEEE Journal of Solid-State Circuits, vol. 47,
pp. 1810–1819, Aug 2012.
[6] T. M. Chen, W. C. Chan, C. C. Lin, J. L. Hsu, W. K. Li, P. A. Wu, Y. L. Huang, Y. C.
Huang, T. Tsai, P. Y. Chang, C. L. Chen, C. H. Tsai, T. Y. Chang, I. C. Huang, W. H.
Chiu, C. H. Liao, C. H. Wu, and G. Chien, “A 2×2 mimo 802.11 abgn/ac wlan
soc with integrated t/r switch and on-chip pa delivering vht80 256qam 17.5dbm
in 55nm cmos,” in 2014 IEEE Radio Frequency Integrated Circuits Symposium,
pp. 225–228, June 2014.
[7] S. Shakib, H. C. Park, J. Dunworth, V. Aparin, and K. Entesari, “20.6 a 28ghz effi-
cient linear power amplifier for 5g phased arrays in 28nm bulk cmos,” in 2016 IEEE
International Solid-State Circuits Conference (ISSCC), pp. 352–353, Jan 2016.
[8] S. Cripps, RF Power Amplifiers for Wireless Communications. Artech House mi-
crowave library, Artech House, 2006.
[9] Y. Tsividis and C. McAndrew, Operation and Modeling of the MOS Transistor.
Oxford Series in Electrical and Computer Engineering, Oxford University Press,
2011.
[10] B. Razavi, RF Microelectronics. Prentice Hall Communications Engineering and
Emerging Technologies, Prentice Hall, 2012.
[11] E. Cohen, S. Ravid, and D. Ritter, “60ghz 45nm pa for linear ofdm signal with
predistortion correction achieving 6.1% pae and −28db evm,” in 2009 IEEE Radio
Frequency Integrated Circuits Symposium, pp. 35–38, June 2009.
[12] B. François and P. Reynaert, “Highly linear fully integrated wideband rf pa for lte-
advanced in 180-nm soi,” IEEE Transactions on Microwave Theory and Techniques,
vol. 63, pp. 649–658, Feb 2015.
[13] M. Elmala, J. Paramesh, and K. Soumyanath, “A 90-nm cmos doherty power ampli-
fier with minimum am-pm distortion,” IEEE Journal of Solid-State Circuits, vol. 41,
pp. 1323–1332, June 2006.
[14] D. Chowdhury, P. Reynaert, and A. M. Niknejad, “Design considerations for 60 ghz
transformer-coupled cmos power amplifiers,” IEEE Journal of Solid-State Circuits,
vol. 44, pp. 2733–2744, Oct 2009.
[15] D. Zhao and P. Reynaert, “A 60-ghz dual-mode class ab power amplifier in 40-nm
cmos,” IEEE Journal of Solid-State Circuits, vol. 48, pp. 2323–2337, Oct 2013.
[16] S. V. Thyagarajan, A. M. Niknejad, and C. D. Hull, “A 60 ghz drain-source neu-
tralized wideband linear power amplifier in 28 nm cmos,” IEEE Transactions on
Circuits and Systems I: Regular Papers, vol. 61, pp. 2253–2262, Aug 2014.
[17] S. Kulkarni and P. Reynaert, “14.3 a push-pull mm-wave power amplifier with
<0.8° am-pm distortion in 40nm cmos,” in Solid-State Circuits Conference Digest of
Technical Papers (ISSCC), 2014 IEEE International, pp. 252–253, Feb 2014.
[18] D. Zhao and P. Reynaert, “An e-band power amplifier with broadband parallel-series
power combiner in 40-nm cmos,” IEEE Transactions on Microwave Theory and
Techniques, vol. 63, pp. 683–690, Feb 2015.
[19] W. Ye, K. Ma, and K. S. Yeo, “A 2-to-6ghz class-ab power amplifier with 28.4pct
pae in 65nm cmos supporting 256qam,” in 2015 IEEE International Solid-State
Circuits Conference - (ISSCC) Digest of Technical Papers, pp. 1–3, Feb 2015.
[20] A. Larie, E. Kerhervé, B. Martineau, L. Vogt, and D. Belot, “2.10 a 60ghz 28nm
utbb fd-soi cmos reconfigurable power amplifier with 21% pae, 18.2dbm p1db and
74mw pdc,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2015 IEEE International, pp. 1–3, Feb 2015.
[21] H. Hashemi and S. Raman, mm-Wave Silicon Power Amplifiers and Transmit-
ters. The Cambridge RF and Microwave Engineering Series, Cambridge University
Press, 2016.
[22] B.-W. Min and G. M. Rebeiz, “5–6 ghz spdt switchable balun using cmos transis-
tors,” in 2008 IEEE Radio Frequency Integrated Circuits Symposium, pp. 321–324,
June 2008.
[23] H. S. Lee, K. Kim, and B. W. Min, “On-chip t/r switchable balun for 5- to 6-ghz
wlan applications,” IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 62, pp. 6–10, Jan 2015.
[24] P. Park, D. H. Shin, and C. P. Yue, “High-linearity cmos t/r switch design above
20 ghz using asymmetrical topology and ac-floating bias,” IEEE Transactions on
Microwave Theory and Techniques, vol. 57, pp. 948–956, April 2009.
[25] Z. Li, H. Yoon, F.-J. Huang, and K. K. O, “5.8-ghz cmos t/r switches with high
and low substrate resistances in a 0.18-μm cmos process,” IEEE Microwave and
Wireless Components Letters, vol. 13, pp. 1–3, Jan 2003.
[26] B. W. Min and G. M. Rebeiz, “Ka-band low-loss and high-isolation switch
design in 0.13-μm cmos,” IEEE Transactions on Microwave Theory and Techniques,
vol. 56, pp. 1364–1371, June 2008.
[27] Y. P. Zhang, J. J. Wang, Q. Li, and X. J. Li, “Antenna-in-package and transmit–
receive switch for single-chip radio transceivers of differential architecture,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 55, pp. 3564–3570,
Dec 2008.
[28] N. A. Talwalkar, C. P. Yue, H. Gan, and S. S. Wong, “Integrated cmos transmit-
receive switch using lc-tuned substrate bias for 2.4-ghz and 5.2-ghz applications,”
IEEE Journal of Solid-State Circuits, vol. 39, pp. 863–870, June 2004.
[29] T. Ohnakado, S. Yamakawa, T. Murakami, A. Furukawa, E. Taniguchi, H. Ueda,
N. Suematsu, and T. Oomori, “21.5-dbm power-handling 5-ghz transmit/receive
cmos switch realized by voltage division effect of stacked transistor configuration
with depletion-layer-extended transistors (dets),” IEEE Journal of Solid-State Cir-
cuits, vol. 39, pp. 577–584, April 2004.
[30] Y. Jin and C. Nguyen, “Ultra-compact high-linearity high-power fully integrated
dc–20-ghz 0.18-μm cmos t/r switch,” IEEE Transactions on Microwave Theory and
Techniques, vol. 55, pp. 30–36, Jan 2007.
[31] Q. Li, Y. P. Zhang, K. S. Yeo, and W. M. Lim, “16.6- and 28-ghz fully integrated
cmos rf switches with improved body floating,” IEEE Transactions on Microwave
Theory and Techniques, vol. 56, pp. 339–345, Feb 2008.
[32] Z. Li and K. K. O, “15-ghz fully integrated nmos switches in a 0.13-μm cmos
process,” IEEE Journal of Solid-State Circuits, vol. 40, pp. 2323–2328, Nov 2005.
[33] M. Elkholy, S. Shakib, J. Dunworth, V. Aparin, and K. Entesari, “A 26–33ghz,
3.3db nf variable gain lna and 6db insertion loss passive 3-bit phase shifter for 5g in
40nm cmos,” in to be submitted to 2017 IEEE Radio Frequency Integrated Circuits
Symposium, pp. 1–4, June 2017.
[34] C. F. Campbell and S. A. Brown, “A compact 5-bit phase-shifter mmic for k-band
satellite communication systems,” IEEE Transactions on Microwave Theory and
Techniques, vol. 48, pp. 2652–2656, Dec 2000.
[35] T. M. Hancock and G. M. Rebeiz, “A 12-ghz sige phase shifter with integrated lna,”
IEEE Transactions on Microwave Theory and Techniques, vol. 53, pp. 977–983,
March 2005.
[36] D.-W. Kang, H. D. Lee, C.-H. Kim, and S. Hong, “Ku-band mmic phase shifter
using a parallel resonator with 0.18-μm cmos technology,” IEEE Transactions on
Microwave Theory and Techniques, vol. 54, pp. 294–301, Jan 2006.
[37] B. W. Min and G. M. Rebeiz, “Single-ended and differential ka-band bicmos phased
array front-ends,” IEEE Journal of Solid-State Circuits, vol. 43, pp. 2239–2250, Oct
2008.
[38] E. Cohen, C. Jakobson, S. Ravid, and D. Ritter, “A bidirectional tx/rx four-element
phased array at 60 ghz with rf-if conversion block in 90-nm cmos process,” IEEE
Transactions on Microwave Theory and Techniques, vol. 58, pp. 1438–1446, May
2010.
[39] K. Gharibdoust, N. Mousavi, M. Kalantari, M. Moezzi, and A. Medi, “A fully
integrated 0.18-μm cmos transceiver chip for x-band phased-array systems,” IEEE
Transactions on Microwave Theory and Techniques, vol. 60, pp. 2192–2202, July
2012.
[40] W. T. Li, Y. C. Chiang, J. H. Tsai, H. Y. Yang, J. H. Cheng, and T. W. Huang,
“60-ghz 5-bit phase shifter with integrated vga phase-error compensation,” IEEE
Transactions on Microwave Theory and Techniques, vol. 61, pp. 1224–1235, March
2013.
[41] S. Shakib, M. Elkholy, J. Dunworth, V. Aparin, and K. Entesari, “A wideband linear
28-ghz power amplifier for power-efficient 5g phased arrays in 40-nm cmos,” to be
submitted to IEEE Transactions on Microwave Theory and Techniques, pp. 1–12.
[42] S. Onoe, “Evolution of 5g mobile technology toward 2020 and beyond,” in 2016
IEEE International Solid-State Circuits Conference (ISSCC), pp. 23–28, Jan 2016.
[43] Z. Pi and F. Khan, “An introduction to millimeter-wave mobile broadband systems,”
IEEE Communications Magazine, vol. 49, pp. 101–107, June 2011.
[44] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K.
Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for
5g cellular: It will work!,” IEEE Access, vol. 1, pp. 335–349, 2013.
[45] S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter-wave cellular wireless
networks: Potentials and challenges,” Proceedings of the IEEE, vol. 102, pp. 366–385,
March 2014.
[46] W. Roh, J. Y. Seol, J. Park, B. Lee, J. Lee, Y. Kim, J. Cho, K. Cheun, and F. Aryan-
far, “Millimeter-wave beamforming as an enabling technology for 5g cellular com-
munications: theoretical feasibility and prototype results,” IEEE Communications
Magazine, vol. 52, pp. 106–113, February 2014.
[47] F. Aryanfar, J. Pi, H. Zhou, T. Henige, G. Xu, S. Abu-Surra, D. Psychoudakis, and
F. Khan, “Millimeter-wave base station for mobile broadband communication,” in
2015 IEEE MTT-S International Microwave Symposium, pp. 1–3, May 2015.
[48] Federal Communications Commission, Notice of Proposed Rulemaking, “Use of
spectrum bands above 24 ghz for mobile radio services,” GN Docket No. 14-177, October 2015.
[49] A. Sarkar and B. Floyd, “A 28-ghz class-j power amplifier with 18-dbm output
power and 35% peak pae in 120-nm sige bicmos,” in Silicon Monolithic Integrated
Circuits in Rf Systems (SiRF), 2014 IEEE 14th Topical Meeting on, pp. 71–73, Jan
2014.
[50] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and
E. Erkip, “Millimeter wave channel modeling and cellular capacity evaluation,”
IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1164–1179, June
2014.
[51] W. Heinrich, “The flip-chip approach for millimeter wave packaging,” IEEE Mi-
crowave Magazine, vol. 6, pp. 36–45, Sept 2005.
[52] D. Pozar, Microwave Engineering, 4th Edition. Wiley, 2011.
[53] T. Rappaport, R. Heath, R. Daniels, and J. Murdock, Millimeter Wave Wireless
Communications. Prentice Hall Communications Engineering and Emerging Tech-
nologies Series from Ted Rappaport, Pearson Education, 2014.
[54] O. Inac, M. Uzunkol, and G. M. Rebeiz, “45-nm cmos soi technology characteri-
zation for millimeter-wave applications,” IEEE Transactions on Microwave Theory
and Techniques, vol. 62, pp. 1301–1311, June 2014.
[55] X. J. Li and Y. P. Zhang, “Flipping the cmos switch,” IEEE Microwave Magazine,
vol. 11, pp. 86–96, Feb 2010.
[56] S. Monayakul, S. Sinha, C. T. Wang, N. Weimann, F. J. Schmückle, M. Hrobak,
V. Krozer, W. John, L. Weixelbaum, P. Wolter, O. Krüger, and W. Heinrich, “Flip-
chip interconnects for 250 ghz modules,” IEEE Microwave and Wireless Compo-
nents Letters, vol. 25, pp. 358–360, June 2015.
[57] J. Kang, D. Yu, K. Min, and B. Kim, “A ultra-high pae doherty amplifier based
on 0.13-μm cmos process,” IEEE Microwave and Wireless Components Letters,
vol. 16, pp. 505–507, Sept 2006.
[58] S. H. L. Tu and S. C. H. Chen, “A 5.25-ghz cmos cascode class-ab power amplifier
for wireless communication,” in Electron Devices and Solid-State Circuits, 2007.
EDSSC 2007. IEEE Conference on, pp. 421–424, Dec 2007.
[59] A. Chakrabarti and H. Krishnaswamy, “High-power high-efficiency class-e-like
stacked mmwave pas in soi and bulk cmos: Theory and implementation,” IEEE
Transactions on Microwave Theory and Techniques, vol. 62, pp. 1686–1704, Aug
2014.
[60] E. Kaymaksut, D. Zhao, and P. Reynaert, “Transformer-based doherty power ampli-
fiers for mm-wave applications in 40-nm cmos,” IEEE Transactions on Microwave
Theory and Techniques, vol. 63, pp. 1186–1192, April 2015.
[61] S. W. Chen, W. Panton, and R. Gilmore, “Effects of nonlinear distortion on cdma
communication systems,” IEEE Transactions on Microwave Theory and Techniques,
vol. 44, pp. 2743–2750, Dec 1996.
[62] Y. Palaskas, S. S. Taylor, S. Pellerano, I. Rippke, R. Bishop, A. Ravi, H. Lakdawala,
and K. Soumyanath, “A 5-ghz 20-dbm power amplifier with digitally assisted am-
pm correction in a 90-nm cmos process,” IEEE Journal of Solid-State Circuits,
vol. 41, pp. 1757–1763, Aug 2006.
[63] T. Heller, E. Cohen, and E. Socher, “Analysis of cross-coupled common-source
cores for w-band lna design at 28nm cmos,” in Microwaves, Communications, An-
tennas and Electronics Systems (COMCAS), 2013 IEEE International Conference
on, pp. 1–5, Oct 2013.
[64] Y. He, L. Li, and P. Reynaert, “60ghz power amplifier with distributed active trans-
former and local feedback,” in ESSCIRC, 2010 Proceedings of the, pp. 314–317,
Sept 2010.
[65] J. R. Long, “Monolithic transformers for silicon rf ic design,” IEEE Journal of
Solid-State Circuits, vol. 35, pp. 1368–1382, Sept 2000.
[66] J. Hong and M. Lancaster, Microstrip Filters for RF / Microwave Applications.
Wiley Series in Microwave and Optical Engineering, Wiley, 2004.
[67] I. Aoki, S. D. Kee, D. B. Rutledge, and A. Hajimiri, “Distributed active transformer-
a new power-combining and impedance-transformation technique,” IEEE Transac-
tions on Microwave Theory and Techniques, vol. 50, pp. 316–331, Jan 2002.
[68] B. Razavi, “A study of phase noise in cmos oscillators,” IEEE Journal of Solid-State
Circuits, vol. 31, pp. 331–343, Mar 1996.
[69] K. L. Fong and R. G. Meyer, “High-frequency nonlinearity analysis of common-
emitter and differential-pair transconductance stages,” IEEE Journal of Solid-State
Circuits, vol. 33, pp. 548–555, Apr 1998.
[70] J. Deng, P. S. Gudem, L. E. Larson, and P. M. Asbeck, “A high average-efficiency
sige hbt power amplifier for wcdma handset applications,” IEEE Transactions on
Microwave Theory and Techniques, vol. 53, pp. 529–537, Feb 2005.
[71] D. Stephens, T. Vanhoucke, and J. J. T. M. Donkers, “Rf reliability of short chan-
nel nmos devices,” in 2009 IEEE Radio Frequency Integrated Circuits Symposium,
pp. 343–346, June 2009.
[72] D. Schreurs, RF Power Amplifier Behavioral Modeling. The Cambridge RF and
Microwave Engineering Series, Cambridge University Press, 2008.
[73] V. Aparin and C. Persico, “Effect of out-of-band terminations on intermodulation
distortion in common-emitter circuits,” in Microwave Symposium Digest, 1999
IEEE MTT-S International, vol. 3, pp. 977–980, June 1999.
[74] V. Aparin, “Analysis of cdma signal spectral regrowth and waveform quality,” IEEE
Transactions on Microwave Theory and Techniques, vol. 49, pp. 2306–2314, Dec
2001.
[75] P. Haldi, D. Chowdhury, P. Reynaert, G. Liu, and A. M. Niknejad, “A 5.8 ghz
1 v linear power amplifier using a novel on-chip transformer power combiner in
standard 90 nm cmos,” IEEE Journal of Solid-State Circuits, vol. 43, pp. 1054–
1063, May 2008.
[76] B. Park, D. Jeong, J. Kim, Y. Cho, K. Moon, and B. Kim, “Highly linear cmos
power amplifier for mm-wave applications,” in 2016 IEEE MTT-S International
Microwave Symposium (IMS), pp. 1–3, May 2016.
[77] Verizon 5G TF Air Interface Working Group, “Verizon 5th generation radio access; test plan for air
interface (release 1).” http://www.5gtf.org/5GTF_Test_Plan_AI_v1p1.pdf, 2016.
[78] A. Niknejad and H. Hashemi, mm-Wave Silicon Technology: 60 GHz and Beyond.
Integrated Circuits and Systems, Springer US, 2008.
[79] U. Kodak and G. M. Rebeiz, “Bi-directional flip-chip 28 ghz phased-array core-
chip in 45nm cmos soi for high-efficiency high-linearity 5g systems,” in 2017 IEEE
Radio Frequency Integrated Circuits Symposium (RFIC), pp. 61–64, 2017.
[80] B. W. Min and G. M. Rebeiz, “Single-ended and differential ka-band bicmos phased
array front-ends,” IEEE Journal of Solid-State Circuits, vol. 43, pp. 2239–2250, Oct
2008.
[81] M. Elkholy, S. Shakib, J. Dunworth, V. Aparin, and K. Entesari, “Low loss highly
linear integrated passive phase shifters for 5g on bulk cmos,” submitted to IEEE
Transactions on Microwave Theory and Techniques, pp. 1–9.
[82] K. J. Koh and G. M. Rebeiz, “A q-band four-element phased-array front-end receiver
with integrated wilkinson power combiners in 0.18-μm sige bicmos technology,”
IEEE Transactions on Microwave Theory and Techniques, vol. 56, pp. 2046–2053,
Sept 2008.
[83] T. Yu and G. M. Rebeiz, “A 22–24 ghz 4-element cmos phased array with on-chip
coupling characterization,” IEEE Journal of Solid-State Circuits, vol. 43, pp. 2134–
2143, Sept 2008.
[84] S. Y. Kim and G. M. Rebeiz, “A low-power bicmos 4-element phased array receiver
for 76–84 ghz radars and communication systems,” IEEE Journal of Solid-State
Circuits, vol. 47, pp. 359–367, Feb 2012.
[85] B. W. Min, M. Chang, and G. M. Rebeiz, “Sige t/r modules for ka-band phased
arrays,” in 2007 IEEE Compound Semiconductor Integrated Circuits Symposium,
pp. 1–4, Oct 2007.
[86] D. W. Kang, J. G. Kim, B. W. Min, and G. M. Rebeiz, “Single and four-element ka-
band transmit/receive phased-array silicon rfics with 5-bit amplitude and phase con-
trol,” IEEE Transactions on Microwave Theory and Techniques, vol. 57, pp. 3534–
3543, Dec 2009.
[87] C. Y. Kim, D. W. Kang, and G. M. Rebeiz, “A 44–46-ghz 16-element sige bicmos
high-linearity transmit/receive phased array,” IEEE Transactions on Microwave
Theory and Techniques, vol. 60, pp. 730–742, March 2012.
[88] K. Kibaroglu, M. Sayginer, and G. M. Rebeiz, “An ultra low-cost 32-element 28
ghz phased-array transceiver with 41 dbm eirp and 1.0-1.6 gbps 16-qam link at 300
meters,” in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC),
pp. 73–77, 2017.
[89] B. Sadhu, Y. Tousi, J. Hallin, S. Sahl, S. Reynolds, Ö. Renström, K. Sjögren,
O. Haapalahti, N. Mazor, B. Bokinge, G. Weibull, H. Bengtsson, A. Carlinger,
E. Westesson, J. E. Thillberg, L. Rexberg, M. Yeck, X. Gu, D. Friedman, and
A. Valdes-Garcia, “7.2 a 28ghz 32-element phased-array transceiver ic with concurrent dual
polarized beams and 1.4 degree beam-steering resolution for 5g communication,”
in 2017 IEEE International Solid-State Circuits Conference (ISSCC), pp. 128–129,
Feb 2017.
[90] Y. S. Yeh, B. Walker, E. Balboni, and B. A. Floyd, “A 28-ghz phased-array
transceiver with series-fed dual-vector distributed beamforming,” in 2017 IEEE Ra-
dio Frequency Integrated Circuits Symposium (RFIC), pp. 65–68, 2017.
[91] M. Elkholy, S. Shakib, J. Dunworth, V. Aparin, and K. Entesari, “A highly linear
wideband variable gain lna with 1db gain steps for 5g using 40nm cmos,” submitted
to IEEE Microwave and Wireless Components Letters, pp. 1–3, 2017.
[92] T. Yao, M. Q. Gordon, K. K. W. Tang, K. H. K. Yau, M. T. Yang, P. Schvan, and
S. P. Voinigescu, “Algorithmic design of cmos lnas and pas for 60-ghz radio,” IEEE
Journal of Solid-State Circuits, vol. 42, pp. 1044–1057, May 2007.
[93] C. F. Campbell and S. A. Brown, “A compact 5-bit phase-shifter mmic for k-band
satellite communication systems,” IEEE Transactions on Microwave Theory and
Techniques, vol. 48, pp. 2652–2656, Dec 2000.
[94] H. Cho and D. E. Burk, “A three-step method for the de-embedding of high-
frequency s-parameter measurements,” IEEE Transactions on Electron Devices,
vol. 38, pp. 1371–1375, Jun 1991.
[95] Anokiwave, “Awmf-0108.” http://www.anokiwave.com/specifications/
AWMF-0108.pdf, 2016.
[96] B. Sadhu, Y. Tousi, J. Hallin, S. Sahl, S. Reynolds, Ö. Renström, K. Sjögren,
O. Haapalahti, N. Mazor, B. Bokinge, G. Weibull, H. Bengtsson, A. Carlinger,
E. Westesson, J. E. Thillberg, L. Rexberg, M. Yeck, X. Gu, D. Friedman, and
A. Valdes-Garcia, “7.2 a 28ghz 32-element phased-array transceiver ic with concurrent dual
polarized beams and 1.4 degree beam-steering resolution for 5g communication,”
in 2017 IEEE International Solid-State Circuits Conference (ISSCC), pp. 128–129,
Feb 2017.
[97] U. Kodak and G. M. Rebeiz, “A 42mw 26–28 ghz phased-array receive channel
with 12 db gain, 4 db nf and 0 dbm iip3 in 45nm cmos soi,” in 2016 IEEE Radio
Frequency Integrated Circuits Symposium (RFIC), pp. 348–351, May 2016.
[98] Y. S. Yeh, B. Walker, E. Balboni, and B. Floyd, “A 28-ghz phased-array receiver
front end with dual-vector distributed beamforming,” IEEE Journal of Solid-State
Circuits, vol. PP, no. 99, pp. 1–15, 2017.
[99] J. Curtis, H. Zhou, and F. Aryanfar, “A fully integrated ka-band front end for
5g transceiver,” in 2016 IEEE MTT-S International Microwave Symposium (IMS),
pp. 1–3, May 2016.
[100] Y. Tousi and A. Valdes-Garcia, “A ka-band digitally-controlled phase shifter with
sub-degree phase precision,” in 2016 IEEE Radio Frequency Integrated Circuits
Symposium (RFIC), pp. 356–359, May 2016.
