Algorithms and Circuits for Analog-Digital Hybrid Multibeam Arrays by Beruwawela Pathiranage, Paboda Viduneth A
Florida International University 
FIU Digital Commons 
FIU Electronic Theses and Dissertations University Graduate School 
11-12-2019 
Algorithms and Circuits for Analog-Digital Hybrid Multibeam 
Arrays 
Paboda Viduneth A. Beruwawela Pathiranage 
pberu002@fiu.edu 
 Part of the Digital Circuits Commons, Hardware Systems Commons, Signal Processing Commons, 
Systems and Communications Commons, and the VLSI and Circuits, Embedded and Hardware Systems 
Commons 
Recommended Citation 
Beruwawela Pathiranage, Paboda Viduneth A., "Algorithms and Circuits for Analog-Digital Hybrid 
Multibeam Arrays" (2019). FIU Electronic Theses and Dissertations. 4321. 
https://digitalcommons.fiu.edu/etd/4321 
FLORIDA INTERNATIONAL UNIVERSITY
Miami, Florida
ALGORITHMS AND CIRCUITS FOR ANALOG-DIGITAL HYBRID
MULTIBEAM ARRAYS
A dissertation submitted in partial fulfillment of the
requirements for the degree of
DOCTOR OF PHILOSOPHY
in
ELECTRICAL AND COMPUTER ENGINEERING
by
Paboda Viduneth Ariyarathna, Beruwawela Pathiranage
2019
To: Dean John Volakis
College of Engineering and Computing
This dissertation, written by Paboda Viduneth Ariyarathna, Beruwawela Pathiranage,
and entitled Algorithms and Circuits for Analog-Digital Hybrid Multibeam
Arrays, having been approved in respect to style and intellectual content, is referred
to you for judgment.
We have read this dissertation and recommend that it be approved.
Jean H. Andrian
Elias Alwan
Pezhman Mardanpour
Arjuna Madanayake, Major Professor
Date of Defense: November 12, 2019
The dissertation of Paboda Viduneth Ariyarathna, Beruwawela Pathiranage is
approved.
Dean John Volakis
College of Engineering and Computing
Florida International University, 2019
Andres G. Gil
Vice President for Research and Economic Development and 
Dean of the University Graduate School
© Copyright 2019 by Paboda Viduneth Ariyarathna, Beruwawela Pathiranage
All rights reserved.
DEDICATION
To my dearest parents and loving wife...
ACKNOWLEDGMENTS
First, I would like to convey my sincere gratitude to my doctoral advisor, Dr.
Arjuna Madanayake, for his continuous support, encouragement, and motivation
throughout my graduate studies. I was always inspired by his out-of-the-box
thinking and analytical skills.
My gratitude extends to my dissertation committee, Dr. Jean H. Andrian, Dr.
Elias Alwan and Dr. Pezhman Mardanpour for their unreserved help and support.
Also, I would like to thank Dr. Renato Cintra, Dr. Leonid Belostotski, Dr.
Soumyajit Mandal, Dr. Sirani Perera, and Dr. Theodore Rappaport for their
valuable collaboration on my research, which also led to several
collaborative papers.
Moreover, I am deeply grateful to the professors at the Department of
Electrical and Computer Engineering at Florida International University. With
great pleasure I would like to acknowledge all the faculty members at The
University of Akron, Ohio, and the University of Moratuwa, Sri Lanka, as well as the
other teachers in my life who have contributed to my upbringing.
Special recognition needs to be given to all past and present fellow researchers at
the RF, Analog, and Digital (RAND) laboratory for the support and collaboration
rendered throughout. I would like to acknowledge the funding support provided
by the National Science Foundation (NSF) under award number 1902283
and the Defense Advanced Research Projects Agency (DARPA) under agreement
number FA8650-16-1-7629 for my whole tenure as a graduate student.
Above all, my heartfelt gratitude goes to my beloved parents for their unfaltering
efforts to raise me to the position I am in today. I am also thankful to my wife, two
sisters, and my brother for their unparalleled support and encouragement.
Without my family, I would never have been able to pursue doctoral studies.
Last but not least, I am indebted to the general public of Sri Lanka for bearing the
burden of sponsoring free education all the way from grade one up to the
bachelor’s degree, which truly paved the way to graduate studies.
ABSTRACT OF THE DISSERTATION
ALGORITHMS AND CIRCUITS FOR ANALOG-DIGITAL HYBRID
MULTIBEAM ARRAYS
by
Paboda Viduneth Ariyarathna, Beruwawela Pathiranage
Florida International University, 2019
Miami, Florida
Professor Arjuna Madanayake, Major Professor
Fifth generation (5G) and beyond wireless communication systems rely heavily on
large antenna arrays combined with beamforming to mitigate the high free-space
path loss that prevails at millimeter-wave (mmW) and higher frequencies. Sharp
beams that can support wide bandwidths are desired at both the transmitter and
the receiver to leverage the glut of bandwidth available in these frequency bands.
Further, multiple simultaneous sharp beams are imperative for such systems to
exploit mmW/sub-THz wireless channels using multiple reflected paths
simultaneously. Therefore, multibeam antenna arrays that can support wider
bandwidths are a key enabler for 5G and beyond systems.
In general, N-beam systems using N-element antenna arrays involve circuit
complexities of the order of N². This dissertation investigates new analog, digital,
and hybrid low-complexity multibeam beamforming algorithms and circuits for
reducing the high size, weight, and power (SWaP) associated with larger multibeam
arrays. The research efforts on the digital beamforming aspect propose a new class
of multibeam algorithms based on discrete Fourier transform (DFT) approximations
that eliminate the need for digital multipliers in the beamforming circuitry. To this
end, 8-, 16-, and 32-beam multiplierless multibeam algorithms have been proposed for
uniform linear array applications. A 2.4 GHz 16-element array receiver setup and a
5.8 GHz 32-element array receiver system, both of which use field programmable gate
arrays (FPGAs) as the digital back-end, have been built for real-time experimental
verification of the digital multiplierless algorithms. The multiplierless algorithms
have been experimentally verified by digitally measuring the beams. It has been shown
that the measured beams from the multiplierless algorithms agree well with those of
the exact counterpart algorithms.
Analog realizations of the proposed approximate DFTs have also been
investigated, leading to low-complexity, high-bandwidth circuits in CMOS. Further, a
novel approach for reducing the circuit complexity of analog true-time delay (TTD)
N-beam beamforming networks using N-element arrays has been proposed for
wideband squint-free operation. A sparse factorization of the N-beam delay
Vandermonde beamforming matrix is used to reduce the total number of TTD elements
needed to obtain N beams in a wideband array. The wideband squint-free
multibeam algorithm is also used to propose a new low-complexity
hybrid beamforming architecture targeting future 5G mmW systems. In addition,
the dissertation explores multibeam beamforming architectures for
uniform circular arrays (UCAs). An algorithm having N log N circuit complexity for
simultaneous generation of N beams in an N-element UCA is explored and verified.
TABLE OF CONTENTS
CHAPTER PAGE
1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Contributions of this Dissertation . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Scientific Collaborators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2. REVIEW OF RF BEAMFORMING THEORY AND TECHNIQUES . . . 16
2.1 Free Space Propagation of EM Waves . . . . . . . . . . . . . . . . . . . . 16
2.2 Spectral Properties of 2D/3D Spatio-Temporal PWs . . . . . . . . . . . 19
2.3 3D/2D ST PW Signals Received by Arrays of Antennas . . . . . . . . . . 22
2.4 Spatial Filtering of ST PWs viz. Beamforming . . . . . . . . . . . . . . . 26
2.5 Different Spatial Filtering Approaches for ST Array Processing . . . . . . 28
2.6 Formation of Multiple Simultaneous Beams . . . . . . . . . . . . . . . . 35
3. LOW-COMPLEXITY DIGITAL RF MULTIBEAMS USING MULTIPLIERLESS
APPROXIMATE DISCRETE FOURIER TRANSFORMS . . . . . . . . . . . . . 38
3.1 Multibeams and the Role of Fully Digital Beamforming . . . . . . . . . . 38
3.2 Spatial DFT based Multibeams . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Discrete Transform Implementation Methods . . . . . . . . . . . . . . . . 44
3.4 DFT Approximations through Parameterization . . . . . . . . . . . . . . 46
3.5 8- and 16-Point Approximate Transforms and Beam Response Analysis . 49
3.5.1 8-Point ADFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5.2 16-Point ADFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.3 Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.4 Digital Implementation of 16-point ADFT . . . . . . . . . . . . . . . . 58
3.6 Experimental Verification of the Low-Complexity Beams using ADFTs . 59
3.6.1 2.4 GHz Front-End Antenna Array . . . . . . . . . . . . . . . . . . . . 59
3.6.2 Microwave Front-End . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.6.3 Baseband Digital Processing Hardware and Circuits . . . . . . . . . . . 61
3.6.4 Experimental Setup and Beam Measurements . . . . . . . . . . . . . . 66
3.6.5 Measured Beams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4. LOW-COMPLEXITY 32-BEAM MULTIBEAM SYSTEM: BUILDING
BLOCK FOR A MULTIPLIERLESS 1024-BEAM DIGITAL ARRAY . . 78
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 2D DFT based Multibeam Transceivers . . . . . . . . . . . . . . . . . . . 79
4.3 A 32-point DFT Approximation and Fast Algorithm for RF Beamforming 82
4.3.1 32-point Approximate DFT . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.2 Fast Algorithm for Computing the 32-point ADFT . . . . . . . . . . . 85
4.3.3 Hardware Metrics of the Proposed ADFT Realization . . . . . . . . . . 89
4.3.4 N -Beam Beamforming Architectures for ULAs and URAs . . . . . . . 91
4.4 A 32-Beam ULA-based Multibeam Beamformer . . . . . . . . . . . . . . 92
4.4.1 5.8 GHz Front-End Design . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.4.2 Digital Back-End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.5.1 Antenna Array Characterization . . . . . . . . . . . . . . . . . . . . . 95
4.5.2 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.5.3 Beam Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5. DIGITAL BEAMFORMING AT 28 GHZ USING XILINX RFSOC PLATFORM . . . . . 105
5.1 28 GHz Receiver Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1.1 Antenna Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.1.2 Receivers and Front-End . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.2 Digital Back-End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2.1 RF ADCs on the Xilinx RFSoC . . . . . . . . . . . . . . . . . . . . . . 109
5.2.2 Digital Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3 Real-Time Beam Measurements Setup . . . . . . . . . . . . . . . . . . . 112
5.3.1 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.3.2 Real-Time Beam Measurements . . . . . . . . . . . . . . . . . . . . . . 113
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6. ANALOG 16-BEAM BEAMFORMING ARCHITECTURE AND THE
CMOS CIRCUITS USING APPROXIMATE DFT . . . . . . . . . . . . . 117
6.1 Introduction and Review of RF System Considerations . . . . . . . . . . 117
6.2 Spatial FFT-based Multi-Beam Architectures . . . . . . . . . . . . . . . 119
6.3 Circuit Topologies, Beamforming Architectures . . . . . . . . . . . . . . 121
6.3.1 Analog Current-Mode ADFT Designs . . . . . . . . . . . . . . . . . . . 123
6.3.2 16-point ADFT Implementation in 65 nm CMOS . . . . . . . . . . . . 124
6.3.3 Simulated Beams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.3.4 Design of V-I/I-V Converter Circuits . . . . . . . . . . . . . . . . . . . 130
6.4 Comparison with a Baseline Digital Implementation . . . . . . . . . . . . 132
6.5 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 134
7. ANALOG LOW-COMPLEXITY SQUINT-FREE WIDEBAND MULTIBEAM
NETWORKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.1 The Problem of Beam Squint . . . . . . . . . . . . . . . . . . . . . . . . 137
7.2 Analog Multibeam Beamformers . . . . . . . . . . . . . . . . . . . . . . 138
7.3 Analog RF Squint-Free N Beam System Model . . . . . . . . . . . . . . 140
7.4 Factorized DVM Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.5 4-Element Array Example using the Factorization . . . . . . . . . . . . . 144
7.6 Simulated Beams using Measured APF Data . . . . . . . . . . . . . . . . 148
7.6.1 Second-Order Current-Mode CMOS All-pass Filter . . . . . . . . . . . 149
7.6.2 Simulated 5-Beams using Measured APF Data . . . . . . . . . . . . . 151
7.7 Using Factorized DVM for Achieving Wideband Multibeams at IF . . . . 152
7.7.1 Proposed TTD IF Multi-Beamformer Model . . . . . . . . . . . . . . . 153
7.7.2 All-Pass-Filter Based 4-Beam Analog IF Beamformer . . . . . . . . . . 155
7.7.3 Extending the 4-Beam Algorithm to Generate Simultaneous 9-Beams . 157
7.8 Simulated Beams Using Ideal APF Responses . . . . . . . . . . . . . . . 158
7.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8. HYBRID BEAMFORMING ARCHITECTURES FOR SQUINT-FREE OPERATION . . . . 162
8.1 Hybrid Beamforming Systems . . . . . . . . . . . . . . . . . . . . . . . . 162
8.2 Circuit Design for Multi-Beam Realization . . . . . . . . . . . . . . . . . 165
8.3 Level-1 Analog Multi-Beam Beamformer . . . . . . . . . . . . . . . . . . 166
8.3.1 28-GHz Current-Mode CMOS All-Pass Filter . . . . . . . . . . . . . . 167
8.3.2 Simulated 4- and 8-Beam Networks using 28 GHz APF Measured
Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.3.3 Analysis of Variations in Beams due to Circuit Imperfections . . . . . . 172
8.4 Level-2 Digital Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . 172
8.4.1 Thiran APFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.4.2 Digital Implementation of the Thiran Filter based Beamforming . . . . 175
8.4.3 Analog Digital Hybrid Simulations . . . . . . . . . . . . . . . . . . . . 179
8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
9. GENERATION OF N SIMULTANEOUS BEAMS AT O(N log N) COMPLEXITY
IN UNIFORM CIRCULAR ARRAYS . . . . . . . . . . . . . . . . . . . . . . . 181
9.1 Review of Array Factors of Circular Arrays . . . . . . . . . . . . . . . . . 182
9.2 Generation of N Beams using a UCA . . . . . . . . . . . . . . . . . . . . 184
9.2.1 Proposed N -Beam Algorithm . . . . . . . . . . . . . . . . . . . . . . . 185
9.2.2 Complexity Analysis of the Proposed N -Beam Algorithm . . . . . . . . 186
9.3 Proposed Hardware Realization Architectures . . . . . . . . . . . . . . . 187
9.3.1 RF Analog Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3.2 Digital Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.4 Simulated N -Beam Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.5 Real-Time Experimental Verification of the Proposed Algorithm . . . . . 192
9.5.1 16-Element UCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
9.5.2 RF Receivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
9.5.3 Digital Back-End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
9.5.4 Measurement Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.6 Mitigating Mutual Coupling in UCAs for Multibeam Generation . . . . . 199
9.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
10. CONCLUSIONS AND FUTURE WORK . . . . . . . . . . . . . . . . . . 204
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
VITA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
LIST OF FIGURES
FIGURE PAGE
1.1 Applications of multibeam beamforming; (a) fifth generation (5G) wire-
less communication environment; (b) electronic warfare; (c) satellite
based Internet access; (d) radio astronomy. . . . . . . . . . . . . . . 2
1.2 Examples of highly directional beam-like propagation in mmW wireless
channels, which can suffer from obstructions in dynamically chang-
ing mobile environments and how the use of multi-beam arrays can
overcome such scenarios. . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Signals emanated from a source can be treated as plane waves in the
far-field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 2D temporal snapshot of the 3D plane wave function created using 1D
to 3D mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 (a) Plane wave received on the x-axis; (b) the two-dimensional (2D)
spatio-temporal direction of arrival (DoA) in the {x, ct} domain. . . 19
2.4 Different spatial frequencies observed by x-axis for a sinusoidal wave
front impinging from different directions. . . . . . . . . . . . . . . . 20
2.5 (a) The RoS of a 3D ST PW received by a planar surface in 3D space
(z = 0) and its corresponding RoS in the 3D frequency domain; (b)
the RoS of a 2D ST PW received by a line in 3D space (z = 0, y = 0). 21
2.6 Illustration of 2D sinusoidal wave form and its corresponding 2D fre-
quency spectrum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.7 (a) 2D antenna array on a fighter jet nose; (b) circular antenna array in-
side a Wi-Fi access-point; (c) 64-element 28 GHz array developed by
IBM and Ericsson [1]; (d) The National Radio Astronomy Observa-
tory’s Very Large Array in New Mexico [2] (e) Precision Acquisition
Vehicle Entry Phased Array Warning System consisting of a crossed
dipole element antenna array located at US Clear Air Force Base,
Alaska. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.8 The top view of the RoS of 3D ST PW signals. . . . . . . . . . . . . . . . 25
2.9 Cross section through ωy = 0 plane in the space-time frequency domain. 26
2.10 (a) Frequency spectra of the 2D ST BP signal received by a ULA;
(b) the downconverted spectrum; (c) spatial filtering of the received
multi-dimensional signal. . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.11 Beamforming topologies based on the hardware implementation; (a)
analog beamforming, (b) digital beamforming, and (c) hybrid beam-
forming architectures. . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.12 Receive-mode model of an N-element phased array. . . . . . . . . . . . . 30
2.13 True-time delay-and-sum beamforming in a receive-mode N-element
phased array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.14 (a) Digital filter-and-sum beamforming architecture; (b) FFT-based fre-
quency domain wideband beamforming in digital. . . . . . . . . . . 33
2.15 Digital architecture of a Frost beamformer. . . . . . . . . . . . . . . . . 34
2.16 Overview of analog, digital and hybrid multibeam architectures. . . . . . 36
3.1 Overview architecture of DFT based multibeam RF aperture employing
a uniform linear array (ULA) . . . . . . . . . . . . . . . . . . . . . . 42
3.2 (a) Responses of an 8-point DFT filterbank; (b) illustration of the filtering
of baseband 2D PW using a DFT filter response. . . . . . . . . . . . 43
3.3 Signal flow graph of the 8-point ADFT. . . . . . . . . . . . . . . . . . . 51
3.4 (a) Exact DFT beams; (b) beams obtained using the 8-point ADFT; (c) error
between the two transforms. . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5 Frequency response comparison of the filterbanks of the proposed ap-
proximation and the DFT. The x-axis is the normalized frequency
and y-axis corresponds to the magnitude in dB. . . . . . . . . . . . . 53
3.6 The SFG for the fast algorithm of the proposed DFT approximation. . . 56
3.7 Overall system architecture of the 2.4 GHz array-receiver setup. . . . . . 60
3.8 (a) Measured |S11| of a single patch antenna; (b) fabricated 2.4 GHz
patch antenna element with the integrated LNA; (c) measured power
patterns of each antenna element in the array. (d) full 16-element
antenna array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.9 (a) RF receiver chains containing a bandpass filter, low-noise amplifier,
splitter, mixers, low-pass filter, and an IF amplifier. (b) ROACH-2
platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.10 Architecture of the digital back-end that generates N -beams (N =
8, 16). The figure also shows the energy calculation circuits used
for measuring the beam patterns in digital. . . . . . . . . . . . . . . 62
3.11 Experimental setup: (a) transmitter and receiver in the anechoic chamber,
(b) receiver instrumentation setup including the RF receivers,
(c) front view of the antenna array, (d) rotation platform. . . . . . . 65
3.12 Lock-in amplifier design made for generating the transmitted signal with
on-off keying. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.13 Measured and simulated beam patterns for each bin of 8-point approx-
imate and exact transforms. . . . . . . . . . . . . . . . . . . . . . . . 69
3.14 (a) All beam patterns using the approximate transform from the raw
values measured at each bin output, (b) the normalized beam patterns
in the log domain for the approximate transform, (c) all beam
patterns obtained using the exact FFT core, (d) the normalized
patterns of (c) in the log domain. . . . . . . . . . . . . . . . . . . . . 70
3.15 Measured beam patterns for bins 0-7 of 16-point approximate and exact
transforms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.16 Measured beam patterns for bins 8-15 of 16-point approximate and exact
transforms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.17 (a) All beam patterns drawn in one plot using the 16-point approximate
transform from the raw measured values at each bin’s output,
(b) the normalized beam patterns (log domain) for the approximate
transform, (c) all beam patterns obtained using the exact FFT
core, (d) normalized patterns of (c) in the log domain. . . . . . . . . 75
4.1 (a) Digital beamforming architecture for obtaining N² beams using an
N×N URA. (b) Block diagram of an N-element sub-system that acts
as a building block for the N² rectangular aperture array. The block
named HT in the figure denotes the Hilbert transform operation. . . 80
4.2 The simulated frequency responses of the 32 output bins of the (a) pro-
posed 32-point ADFT, (b) exact DFT; (c) the magnitude error of
the two responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3 Bins that have the highest magnitude error in Fig. 4.2(c). . . . . . . . . 86
4.4 (i) Simulated polar patterns of the 32-beams for a ULA with λ/2 ele-
ment spacing. (i-a) Beams corresponding to the ADFT, (i-b) beams
obtained with the ideal FFT, and (i-c) the magnitude error between
the ADFT and the exact FFT. (ii) Example simulated beam pat-
terns from a Nyquist-spaced URA; (a) ψ = 8.0◦, φ = −153.4◦, (b)
ψ = 45.4◦, φ = −142.1◦,(c) ψ = 26.2◦, φ = 45.0◦ (the plots are color-
coded on a dB scale). . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.5 (a) Overall architecture of the test setup; (b) 5.8 GHz 32-beam array
receiver setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.6 The beam outputs at bins 0-7. . . . . . . . . . . . . . . . . . . . . . . . 97
4.7 The beam outputs at bins 8-15. . . . . . . . . . . . . . . . . . . . . . . 98
4.8 The beam outputs at bins 16-23. . . . . . . . . . . . . . . . . . . . . . 99
4.9 The beam outputs at bins 24-31. . . . . . . . . . . . . . . . . . . . . . 100
4.10 The impact of the measurement setup geometry on the measured beam
response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.11 2D beam patterns computed from 1-D array beam measurements using
the ADFT algorithm. The beams correspond to the bin outputs
(same angles) as the beams shown in Fig. 4.4(a-c). . . . . . . . . . . 103
5.1 The overall architecture of the 4-element 28 GHz digital beamforming
receiver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.2 The 4-element ULA consisting of series fed sub-array patches at 28 GHz. 107
5.3 The 4-element receiver array front-end having the antenna array and
the receivers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 (a) Xilinx ZCU 1275 platform incorporating Zynq UltraScale+
XCZU29DR chip; (b) custom-made balun boards for interfacing the
ADCs; (c) HW-CLK-102 clock generation board. . . . . . . . . . . . 109
5.5 RF ADC tile overview (figure is taken from [3]). . . . . . . . . . . . . . 110
5.6 The overview architecture of the digital back-end. . . . . . . . . . . . . 111
5.7 The entire measurement setup comprising the 28 GHz receiver array and
the 28 GHz transmitter. . . . . . . . . . . . . . . . . . . . . . . . . 112
5.8 (a) The IQ outputs of the 4 downconverted channels at an IF frequency
of 100 MHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.9 (a-d) Simulated and measured beam corresponding to each output of
the FFT at fIF = 100 MHz. (e) All simulated beams in a single
plot. (f) All the measured beams in a single plot. . . . . . . . . . . 115
5.10 (a-d) Measured beams corresponding to the outputs of the FFT bins at
fRF ∈ {27.5, 27.7, 28, 28.2, 28.3} GHz. . . . . . . . . . . . . . . . 116
6.1 A representative 28-GHz 5G half-duplex transceiver architecture based
on delay-and-sum beamforming [4]. . . . . . . . . . . . . . . . . . . . 118
6.2 (a) Receive-mode multi-beam system with down-converted analog base-
band beamforming; (b) transmit-mode multi-beam system using
baseband analog beamforming. . . . . . . . . . . . . . . . . . . . . . 120
6.3 Current-mode implementation of (a) addition and (b) subtraction oper-
ations, which are the primary functions for implementing the ADFT
using analog CMOS circuits; (c) NMOS and (d) PMOS current mir-
rors designed using a low-voltage cascode topology. . . . . . . . . . 121
6.4 (a) System architecture of the 16-point analog ADFT; (b) realization of
the B5 factorization stage using current mirrors. . . . . . . . . . . . 124
6.5 Beam outputs generated from Cadence Spectre simulations for output
bins 0-7 of the 16-point analog ADFT design. Each sub-figure shows
beam patterns for different IF bandwidths from Cadence and those
simulated in MATLAB. . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.6 Beam outputs generated from Cadence Spectre simulations for output
bins 8-15 of the 16-point analog ADFT design. . . . . . . . . . . . . 128
6.7 (a) The proposed V-I converter circuit. (b) Simulated input reflection
coefficient |S11|. (c) Simulated total harmonic distortion (THD) versus
input amplitude. (f) Simulated input-referred noise power spectral
density (PSD) over the frequency range from 1 MHz to 10 GHz. . . 129
7.1 (a) 2D frequency response of a typical complex-weighted phased array
beam; (b) its array factor at different temporal frequencies showing
squint; (c) beam realized using true-time-delays, and (d) its squint-
free array factors at different frequencies. . . . . . . . . . . . . . . . 137
7.2 An algorithm that enables low-complexity realization of analog
true-time-delay (TTD) N-beam networks is desired for future wideband
systems that demand a higher number of simultaneous
beams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.3 SFG of the proposed 4-point DVM factorization algorithms. . . . . . . 148
7.4 (a) Complementary metal–oxide–semiconductor (CMOS) all-pass circuit
and (b) die micrograph of the chip from which the measured responses
are taken; (c-d) measured magnitude and phase of the APF chip (this
chip was designed and developed by Dr. L. Belostotski at the University
of Calgary, Canada). . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.5 2-D frequency response of (a-e) beams corresponding to
Y[0], Y[1], Y[2], Y[3] and Y[4]; (f-h) corresponding array patterns for
temporal frequency values 2.4, 2.0, and 1.6 GHz. . . . . . . . . . . 150
7.6 Comparison of the array factors corresponding to beam no. 1 (a) using
ideal TTD and (b) measured APF data. . . . . . . . . . . . . . . . . 151
7.7 Overview architecture of a wideband N -beam array at intermediate-
frequency (IF). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.8 (a) Overview of the system architecture of the 9 beam multi-beamformer;
(b) signal flow graph for realizing 4 beams having look
directions at sin⁻¹(l/4), l ∈ {1, 2, 3, 4}; (c) analog-IC realization of
the α(s) block using APFs. . . . . . . . . . . . . . . . . . . . . . . . 155
7.9 Overview of the system architecture of the 9-beam multi-beamformer. . 157
7.10 Simulated 2-D frequency response of (a-e) beams in the IF stage cor-
responding to beam 0 to beam 4 respectively using ideal APF re-
sponses; (f-h) corresponding array patterns for temporal frequencies
56, 61 and 65 GHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.1 System architecture of the proposed hybrid squint-free multi-beam net-
work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.2 (a) High-level block diagram of the overall RF front-end architecture,
(b) a more detailed block diagram of a single RF chain, (c) simplified
schematic of the proposed LNA. . . . . . . . . . . . . . . . . . . . . 165
8.3 (a) APF schematic; its (b) gain and (c) phase profile. . . . . . . . . . . 167
8.4 Simulated beam responses resulting from the SFG in Fig. 7.3 using
the 28 GHz APF simulated frequency responses at frequencies
{26.4, 27.6, 29.7} GHz. Beams (a-d) have look-directions
11.1◦, 22.6◦, 35.2◦, and 50.2◦, respectively. . . . . . . . . . . . . . . 169
8.5 Signal flow graph (SFG) for the proposed low-complexity 8-beam wideband
beamformer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
8.6 Monte-Carlo simulations for each beam of the N = 4 network (50 runs).
The red curve represents the beam pattern for the nominal value of
APF gain (unity). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.7 Wideband digital beamforming using FIR fractional delay approxima-
tion filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.8 Proposed wideband delay-sum beamforming architecture using Thiran
all-pass fractional delay filters. . . . . . . . . . . . . . . . . . . . . . 176
8.9 (a) 2D frequency response of the Thiran filter based 32-element beam-
former where beam angle is set to 40◦; (b) magnitude response in
(a) of the beamformers along the line shaped passband against the
temporal frequency; (c) the array factors of the Thiran filter based
simulated beamformer tuned at 40◦ at different temporal frequency
values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.10 (i-a) 2-D magnitude frequency domain plots of H_l(e^{jω_x}, Ω_t) for l =
2, 4, 6, and 8 generated by the proposed 8-beam wideband beamformer
assuming ideal time delays, and (i-b) the corresponding AFs.
(i-c) and (i-d): Same as (i-a) and (i-b) but using simulated APF
data. (ii-a) 2-D magnitude frequency domain plot
of the digital filter transfer function tuned to θ_5 = 28.7◦; (ii-b) AF
of the DBF in (ii-a) for different IF frequencies; (ii-c) AF of the 5th
beam of the ABF; (ii-d) composite AF obtained by combining the ABF
and DBF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.1 Circular array signal model. . . . . . . . . . . . . . . . . . . . . . . . . 182
9.2 (a) 3D Beam pattern generated to point the beam at φ = 30◦ and
θ = 30◦ by setting the array weights as given in (9.3). (b) Elevation
plane pattern at φ = 30◦; (c) azimuthal plane pattern at θ = 30◦. . . 183
9.3 (a) Analog-RF architecture for realizing the proposed multibeam beam-
forming system. (b) Digital baseband architecture for realizing the
proposed algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.4 Digital circuits for realizing the N -beams using a UCA. . . . . . . . . . 189
9.5 Simulated circular beams resulting from the proposed N -beam algo-
rithm. Simulation assumes N = 16 and the beams are equally spaced
at 22.5◦ in the azimuthal plane (only even indexed 8 beams are shown
here out of 16). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.6 Comparison of array-factors corresponding to fixed-point digital core
computed beams and Matlab floating-point simulated beams. The
x-axis represents φ [◦] and y-axis correspond to the beamforming
gain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.7 The overview of architecture of the 2.4 GHz digital circular array re-
ceiver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.8 (a) The receiver array consisting of a 16-element 2.4 GHz dipole array
and the receiver chains. (b) The array front-end and the ROACH-2
FPGA based digital back-end. . . . . . . . . . . . . . . . . . . . . . 193
9.9 The outdoor measurement setup for experimental verification of the pro-
posed circular multibeam algorithm. . . . . . . . . . . . . . . . . . . 195
9.10 All 16 measured beams in single polar plot. . . . . . . . . . . . . . . . 196
9.11 Comparison of measured and simulated beam patterns corresponding to
beam 0 through beam 7. . . . . . . . . . . . . . . . . . . . . . . . . 197
9.12 Comparison of measured and simulated beam patterns corresponding to
beam 8 through beam 15. . . . . . . . . . . . . . . . . . . . . . . . 198
CHAPTER 1
INTRODUCTION
Wireless technologies have become heavily integrated into modern human lifestyles.
The rapid development of wireless communication technologies has revolutionized
the way people connect, share, and access information; therefore, high-speed
wireless connectivity is an essential need today. Moreover, the use of wireless
technologies in the present-day world extends beyond wireless communication systems
to electronic warfare [5], tomography imaging [6], radio astronomy [7], and many more
fields. All of these applications use different parts of the electromagnetic spectrum
depending on the particular requirements of each application. Directional reception
and/or transmission of electromagnetic waves, also known as beamforming [8],
is one of the critical requirements for most of these applications. Beamforming
is considered a vital part of many wireless applications: phased-array
radar [5], which electronically scans, senses, detects, and tracks targets; radio
astronomy applications such as wide-field synthesis imaging [7] and time-domain
pulsar astrometry [9]; wireless communication, most importantly massive
multiple-input multiple-output (MIMO) [10,11] and emerging 5G technologies [12,13];
and various other scientific activities such as the study of the ionosphere, weather,
and deep space communications. With the advent of 5G and beyond-5G wireless
communication networks, beamforming has drawn a great deal of attention. High gain
antenna arrays that can form multiple beams are required in order to utilize the
complex urban wireless channels that suffer from occlusions, path loss, and multi-
path effects [12, 14] at these frequencies. In fact, beamforming in large arrays and
the ability to form multiple simultaneous beams are key enabling technologies for
millimeter wave (mmW)/sub-THz/THz communication systems [15]. Further, with
the emerging mmW wireless technologies, applications relevant to defense systems
Figure 1.1: Applications of multibeam beamforming; (a) 5G wireless communication
environment; (b) electronic warfare; (c) satellite based Internet access; (d) radio
astronomy.
are flourishing, such as space-based mesh networks between low earth orbit satel-
lites, cross-platform high-capacity data connectivity (air, space, land, and sea) and
electronic warfare. As illustrated in Fig. 1.1, all these applications demand multi-
beam beamforming networks that can deliver a massive number of high bandwidth
beams. Recently, the Defense Advanced Research Projects Agency (DARPA) has also
launched a separate program that aims to create multi-beam networks in the 18-50 GHz
band to enhance secure communications between military platforms [16].
1.1 Motivation
Wireless communication has been one of the most vibrant and rapidly growing areas
in the field of communication. Research activity in wireless communication has
surged over the past few decades, driven by advances in very-large-scale
integration (VLSI) technologies that have enabled ultra-high-speed data
communication rates [17]. However, the escalating growth in the use of smartphones
and the demand for data services such as high-quality, low-latency video
and multimedia applications over the last decade have created enormous challenges
for wireless systems, because bandwidth is too scarce to cater to the ever-increasing
demand for data rates. All mobile generations up to the fourth generation (4G)
have been limited to carrier frequencies in the sub-6 GHz range; spectral
crowding in the sub-6 GHz spectrum has therefore left these networks facing a global
shortage of the bandwidth needed to meet the increasing demand for data rates.
This situation has motivated the exploration of the mmW frequency spectrum,
which is fairly underutilized, for future 5G wireless communication networks.
As correctly pointed out in [12] back in 2013, mmW-enabled mobile communication
is now on the verge of becoming reality: the initial commercial deployments are
underway at the time of the writing of this dissertation.
Communication at mmW promises an unprecedented change in the wireless industry,
which will move from today’s scarcity of spectrum to a
glut of spectrum. The spectrum above 6 GHz all the way up to sub-THz and THz
frequencies brings in a massive amount of spectral resources that can be leveraged in
mobile communication towards an exponential increase in capacity and data rates
compared to today’s systems. The channel capacity C per unit bandwidth, as given by
Shannon’s law, is expressed in (1.1).
C = log2(1 + SNR) bits/s/Hz. (1.1)
Dynamic spectral access (DSA), cognitive radio (CR) technologies, and full-duplex
systems can yield improved capacity, but they do not provide an exponential increase
in capacity. Because (1.1) gives the capacity per unit bandwidth and grows only
logarithmically with the SNR, the only method to achieve an exponential increase
in the capacity is to have an exponential increase in the system bandwidth. The
exploration of the huge amount of spectrum available above 6 GHz can accommodate
such increments [18–23].
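As a rough numerical illustration of the argument around (1.1), the following sketch multiplies the spectral efficiency of (1.1) by a bandwidth to obtain a throughput in bits/s. The function name, the 20 MHz and 200 MHz bandwidths, and the 20 dB and 30 dB SNR values are illustrative assumptions, not figures from this dissertation.

```python
import math

def capacity_bps(bandwidth_hz, snr_linear):
    """Throughput in bits/s: bandwidth times the spectral efficiency of (1.1)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Assumed baseline: 20 MHz channel at 20 dB SNR (illustrative values only).
base = capacity_bps(20e6, 10 ** (20 / 10))
# Ten times the bandwidth at the same SNR.
wider = capacity_bps(200e6, 10 ** (20 / 10))
# Ten times the SNR (30 dB) at the same bandwidth.
louder = capacity_bps(20e6, 10 ** (30 / 10))

print(f"baseline      : {base / 1e6:.1f} Mbit/s")
print(f"10x bandwidth : {wider / 1e6:.1f} Mbit/s")
print(f"10x SNR       : {louder / 1e6:.1f} Mbit/s")
```

Under these assumptions, the tenfold bandwidth increase scales the throughput tenfold, whereas the tenfold SNR increase raises it by well under a factor of two.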
The mmW communication channel presents a completely new radio propagation
environment compared to today’s sub-6 GHz radios and thus requires
completely redesigned transceiver front-ends for higher-data-rate nodes. At
mmW, most objects encountered are much larger than the wavelength, and thus the
mmW bands are dominated by occlusion effects from unpredictable obstructions
(vehicles, people, buildings, etc.) as opposed to the diffraction effects that dominate
sub-6 GHz channels, as shown in Fig. 1.2(a). The other main challenge is the higher
free-space path loss (FSPL) experienced by signals at mmW frequencies.
As indicated by the Friis FSPL equation in (1.2), the received power Pr at a
propagation distance d is given by
\[ P_r = \frac{P_t G_t}{4\pi d^2} A_{\mathrm{eff}} = \frac{P_t G_t G_r}{L} \left( \frac{\lambda}{4\pi d} \right)^2, \tag{1.2} \]
where Pt is the transmitted power; Aeff is the effective aperture of the receiving
antenna; Gt and Gr are the gains of the transmitting antenna and the receiving
antenna, respectively; λ is the wavelength; and L is a loss factor, greater than
one, accounting for the efficiencies of the antennas. As (1.2) implies, for the same
distance d and a fixed antenna gain, the path loss increases with the
frequency since Aeff becomes smaller as the frequency increases.
On the other hand, the maximum gain of any antenna is related to its effective
area Aeff and the operating frequency as shown in [24]:
\[ G_{\max} = A_{\mathrm{eff}} \left( \frac{4\pi}{\lambda^2} \right). \tag{1.3} \]
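The trade-off expressed by (1.2) and (1.3) can be checked numerically. The sketch below is illustrative only: the helper names, the 100 m link distance, the 2.8 GHz and 28 GHz carriers, and the 0.01 m^2 aperture are assumptions chosen for the example, and the antennas are taken as lossless (L = 1).

```python
import math

C0 = 299_792_458.0  # speed of light in m/s

def fspl_db(freq_hz, dist_m):
    """Path loss (4*pi*d/lambda)^2 in dB, i.e. (1.2) with Gt = Gr = L = 1."""
    wavelength = C0 / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * dist_m / wavelength)

def max_gain_db(aeff_m2, freq_hz):
    """Maximum gain of (1.3), Gmax = Aeff * 4*pi / lambda^2, in dB."""
    wavelength = C0 / freq_hz
    return 10.0 * math.log10(aeff_m2 * 4.0 * math.pi / wavelength ** 2)

# Assumed example values (not taken from the dissertation).
dist_m, aeff_m2 = 100.0, 0.01
for freq_hz in (2.8e9, 28e9):
    print(f"{freq_hz / 1e9:5.1f} GHz: FSPL = {fspl_db(freq_hz, dist_m):6.1f} dB, "
          f"Gmax = {max_gain_db(aeff_m2, freq_hz):5.1f} dB")
```

A tenfold increase in frequency adds 20 dB of path loss between unit-gain antennas, while a fixed physical aperture gains the same 20 dB, which is why keeping the aperture size constant can offset the extra loss.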
Because frequencies in the mmW region and beyond lead to smaller wavelengths,
the use of these frequencies permits more antennas to be fitted into
a smaller form factor. If the maximum length dimension of the antenna at a frequency
corresponding to λ is D, then Gmax ∝ D^2. Therefore, as implied by (1.3),
Figure 1.2: Examples of highly directional beam-like propagation in mmW wireless
channels, which can suffer from obstructions in dynamically changing mobile
environments and how the use of multi-beam arrays can overcome such scenarios.
using N-element antenna arrays with larger N, the antenna gain can be increased
on the order of N^2. The key here is to add more antenna elements to the array to
maintain a constant aperture size as the frequency increases. This can be seen
as the only way to compensate for the high FSPL at these frequencies. The situation
can be improved by using higher-gain antenna arrays at both the transmitter and
the receiver, which can in fact overcome the distance-dependent propagation path
loss. Therefore, exploiting high-gain steerable antennas enabled by beamforming
technologies is a key aspect of future communication systems that use
mmW frequencies and beyond [17].
In addition, with the use of high-gain antenna arrays that can produce sharp
beams, mmW links can exploit the environment by bouncing energy off
objects, using different beams to form communication links. Such a scenario is
illustrated in Fig. 1.2. Communication at these frequencies can also benefit from
simultaneous beams in different directions, which use the channel to send multiple
streams in parallel to increase capacity [14, 17, 25]. Fig. 1.2 shows such a scenario,
where multiple simultaneous beams use spatial multiplexing to serve different users.
Thus, high-gain multiple simultaneous beams will be key to overcoming the
challenges of 5G mmW channels. Moreover, large multibeam systems help 5G base
stations reduce the time spent in the best beam pair search [26] between the base station
and the user. These aspects make a beamformer which can produce orthogonal
multibeams a great candidate for 5G systems [27]. For these reasons, a large portion
of work in this dissertation is directed towards reducing the associated hardware
complexity in N -element arrays that can produce N orthogonal simultaneous beams.
Multibeam beamforming has many other applications apart from 5G and mmW
systems, including legacy sub-6 GHz arrays for MIMO communication. In [28] it has been
shown that the pre-beamforming matrix for large uniform linear arrays (ULAs) can be set by blocks of
columns of a unitary discrete Fourier transform (DFT) matrix for a joint spatial
division and multiplexing (JSDM) scenario, which is a method to achieve multiuser
MIMO downlinks by exploiting the structure of the channel correlations. The paper
shows that for ULAs, a DFT–based pre-beamforming matrix is near-optimal, requir-
ing only coarse information about the users’ angles of arrival and angular spread [28].
It is further shown that the DFT pre-beamforming achieves very good performance
in the absence of accurate estimation of the actual channel covariance matrix. The
above fact also justifies research efforts (described in Chapters 3 and 4) to enable
fully digital beamforming having orthogonal beams at much lower hardware com-
plexity. Multibeam radars are also used for scanning through different directions at
the same time and tracking multiple targets simultaneously [29, 30]. The radio as-
tronomy community also uses multibeam-based scanning in specific radio telescopes
with arrays of sensors [31, 32].
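To make the idea of DFT-based orthogonal multibeams concrete, the following sketch applies a unitary DFT matrix across the element samples of a ULA. It is a minimal illustration under stated assumptions: N = 8 elements, half-wavelength spacing, a narrowband signal, and a particular sign convention for the steering vector; none of these choices is taken from [28] or from the later chapters.

```python
import numpy as np

N = 8  # assumed number of ULA elements (illustrative)
n = np.arange(N)

# Unitary N-point DFT matrix; row k holds the weights of beam k.
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

def steering_vector(u):
    """Narrowband ULA response for half-wavelength spacing, with u = sin(psi)."""
    return np.exp(1j * np.pi * n * u)

# A plane wave arriving from the look direction of beam 2, where u = 2 * 2 / N.
x = steering_vector(2 * 2 / N)
beams = F @ x

print(np.round(np.abs(beams), 6))
```

In this setup all of the signal energy appears at the output of beam 2 and the other beam outputs are zero, which is the orthogonality property that makes the DFT attractive as a multibeam network.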
Although having a high number of simultaneous beams is advantageous,
linear arrays that employ N sensor elements demand O(N^2) circuit
complexity to enable beamforming that preserves the full degrees of freedom of the
array. For rectangular apertures, the circuit complexity of beamforming networks
that do not eliminate spatial degrees of freedom grows as O(N^3). Therefore,
the associated circuit complexity and power consumption are a limiting factor for
realizing such massive MIMO networks. The work summarized in this dissertation
addresses the complexity problem of such N-beam networks for larger
array sizes through algorithmic and circuit-level novelties. The research efforts
are aimed towards exploring low-complexity implementations of multibeam arrays,
targeting both analog and digital beamforming networks. The proposed research
also addresses the problem of achieving low-complexity wideband N-beam systems,
eliminating beam squint and its negative impact on
wideband analog beamforming systems.
1.2 Contributions of this Dissertation
Future 5G and beyond-5G wireless systems, which will operate at carrier frequencies
ranging from mmW to THz, will depend heavily on large beamforming
antenna arrays that can generate multiple simultaneous beams that are narrow and
support higher bandwidths. The aim of this dissertation is to explore algorithms
and circuits that will enable low-complexity realizations of future mmW and THz
sensor arrays. The specific algorithmic and implementation research contributions
towards reducing the complexity of multibeam array synthesis are described below.
1. Fully digital multibeam beamformers based on approximate DFTs have been
proposed for low size, weight, power, and cost (SWaPC) digital implementations.
In theory, narrowband orthogonal multibeams can be achieved by applying
a spatial DFT across the antenna samples of a ULA in receive
mode. The proposed approximate DFT algorithms of 8- and 16-point sizes
for digital multibeam apertures are multiplierless and can be implemented
using adder-only digital circuits. The algorithms have been further
factorized to reduce the number of adders through the use of butterfly stages.
The theoretical performance of the beams generated from the approximate
discrete Fourier transforms (ADFTs) has been quantified: all beams coincide
with the beams of the exact fast Fourier transform (FFT), except for
a small (less than 2 dB) degradation in sidelobe performance. The proposed
multibeam algorithms have been implemented in digital hardware. The hardware resource
utilization has been compared against the corresponding fixed-point FFT im-
plementations. A real-time 16-element antenna array setup that works at 2.4
GHz along with field-programmable gate array (FPGA) back-end for digital
signal processing (DSP) has been built to experimentally verify the beams.
The beams generated by the proposed low-complexity algorithms have been
measured in an anechoic chamber and have been compared with the exact
FFT-generated beams to verify that the ADFT-based multibeam beamformers
can be used in place of the FFT-based multibeam beamformers in multibeam
arrays.
2. A 32-beam low-complexity algorithm based on a 32-point ADFT has also
been proposed and experimentally verified. The 32-beam system has been
implemented using a 5.8 GHz 32-element antenna front-end with the aid of a
Reconfigurable Open Architecture Computing Hardware Rev. 2 (ROACH-2)
FPGA processing platform. The use of ADFT algorithms towards drastically
reducing the complexity of 2D multibeam arrays has also been investigated.
The proposed ADFT algorithms can be implemented without using multipliers
and therefore, 2D DFT based multibeam apertures can be replaced with the
ADFT digital cores to compute multibeams in digital implementations. An
analysis was conducted to obtain beam patterns arising from a 32× 32 array,
which will produce ≈ 1024 simultaneous beams using the 32-point ADFT as
a building block. The actual 2D RF beams that would be generated in such a
32×32 system have been synthesized in simulation using the measured beams
from the 1D 32-beam system.
3. A fully digital 4-beam beamformer using a 4-element array at 28 GHz has
been implemented. The digital multibeam beamforming is accomplished using a
Xilinx radio-frequency system-on-chip (RFSoC) platform that can support 2
GSps sampling of 16 analog input channels. The digital beamforming supports
845 MHz of bandwidth and is performed in a polyphase manner. This is the
first reported 28 GHz fully digital beamforming system
capable of supporting a bandwidth over 800 MHz. Novel polyphase
circuits have been proposed for calibration of the front-end in a polyphase
sampling architecture.
4. The use of the same ADFT algorithms has been proposed for analog multi-
beam realizations. Since the proposed ADFT matrices and their sparse fac-
torization have small integer coefficients ({±2,±1, 0}), these matrices can be
easily mapped to current-mode circuits in analog CMOS to realize circuits
that approximate the DFT operation in continuous time. The matrix coefficients
can be realized using simple current mirror circuits; thus, the overall
circuit will have a much larger bandwidth compared to corresponding digital
realizations. Analog DFT-based CMOS circuits are proposed for receive-mode
and transmit-mode beamformers targeting high bandwidth operation. A novel
approach is proposed to utilize the ADFT factorization in implementing the
analog ADFT such that the total number of current mirrors required for the entire
circuit is reduced from the order of N^2 to the order of N, where N is the number of
elements in the antenna array, saving circuit area and static power. A CMOS
schematic-level design has been created in Cadence using 65-nm CMOS models,
and the circuits have been simulated and analyzed to verify the proposed
methodology. A comprehensive analysis of the total power consumption of
the proposed circuit was conducted and compared against the digital coun-
terpart implementation to verify that the proposed analog multibeam circuits
are power-efficient.
5. A novel methodology for obtaining wideband squint-free analog beams is pro-
posed based on the factorization of the delay Vandermonde matrix (DVM).
The proposed method reduces the number of true-time-delay (TTD) elements needed
in analog wideband multibeam networks. The proposed method has been verified using
measured all-pass filter (APF) responses for an S-band array. It has been
shown that for a 4-element array, the required number of APF blocks will
be reduced by 60%; for an 8-element array, the percentage saving would be
≈ 78%. The proposed method is extended to adapt for the wideband analog
IF/baseband beamforming case where the analog circuit complexity is reduced
by the same percentage as in the RF case. A new hybrid beamforming archi-
tecture is also proposed using the DVM as the first level analog beamforming.
A novel low-complexity wideband digital beamforming scheme based on Thiran
APFs is also proposed for level-2 digital beamforming. The Thiran APF
is an infinite impulse response (IIR) filter that can generate the required TTD
with much lower complexity (fewer multipliers) than finite impulse response (FIR)
implementations.
6. A novel low-complexity method for generating steerable circular multibeams
has also been proposed. A conventional approach that produces
N narrowband beams in an N-element uniform circular array (UCA) would
need O(N^2) multiplier complexity in the digital beamforming network.
The proposed method achieves equi-spaced circular beams at a complexity
of order N log N by exploiting the circulant structure of the beamforming
matrix. A new method for eliminating the mutual coupling in UCAs is also
proposed. The proposed method also exploits the circulant structure of the
coupling matrix of a UCA. It has also been shown that the elimination of
mutual coupling can be achieved simultaneously with the circular multibeam
generation without any increased complexity.
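The complexity reduction claimed for circulant structure in the last contribution can be illustrated with the standard FFT diagonalization of circulant matrices. The sketch below is not the algorithm of Chapter 9, which is not given in this chapter; it only shows, for an arbitrary (randomly generated) circulant matrix, that the O(N^2) matrix-vector product can be replaced by three FFTs. The function names and N = 16 are assumptions.

```python
import numpy as np

def circulant_apply(c, x):
    """Multiply the circulant matrix whose first column is c by x using FFTs."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))

def circulant_dense(c):
    """Build the same circulant matrix explicitly (O(N^2) reference)."""
    size = len(c)
    return np.array([[c[(i - j) % size] for j in range(size)] for i in range(size)])

rng = np.random.default_rng(0)
N = 16  # assumed array size (illustrative)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # arbitrary first column
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # arbitrary input samples

fast = circulant_apply(c, x)      # order N log N operations
slow = circulant_dense(c) @ x     # order N^2 operations

print(np.max(np.abs(fast - slow)))
```

The two results agree to numerical precision, because multiplying by a circulant matrix is a circular convolution and the DFT turns circular convolution into element-wise multiplication.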
1.3 Publications
The research outcomes from this dissertation have been reported through five journal
publications and twelve conference papers, which are included in the vita at the end
of the dissertation.
1.4 Dissertation Outline
The remaining portion of the dissertation is organized as follows:
Chapter 2 presents a comprehensive review of multi-dimensional (MD) signal
processing in view of space-time filters, including the spectral properties of the
propagating electromagnetic (EM) plane-waves sampled in space. Section 2.1 re-
views the mathematical representations of the propagating EM plane-waves in the
3D space and Sections 2.2 and 2.3 review the spectral properties of spatio-temporal
(ST) plane waves (PWs) received by planar and linear arrays. The discussion in
Section 2.4 introduces how the spectral properties of the spatially sampled waves
are used to derive directional enhancement of signals, thereby leading to the discus-
sion of beamforming topologies and techniques in Section 2.5. Section 2.6 reviews
the basics of multibeam realization and provides an overview of the implementation
topologies.
Chapter 3 presents the work on producing low-complexity digital multibeams. Fully
digital beamforming and its benefits are reviewed in Section 3.1. The
theory behind spatial DFT based multibeam generation is described in Section 3.2.
In Section 3.3, different implementation methods for discrete transforms that are
found in the literature are discussed. Section 3.4 describes a theoretical overview
of the proposed DFT approximations for digital multibeam beamforming. An 8-
point and a 16-point DFT approximation are introduced, targeting an 8-element
and a 16-element radio-frequency (RF) aperture to achieve multiple simultaneous
beams. The proposed ADFTs are compared to the exact DFT (ideal case) and are
explored to evaluate the performance for RF beamforming applications. Section 3.6
describes the digital hardware implementation and the microwave setup for realizing
the RF front-ends and antenna array for experimentally verifying the beams. In
Section 3.6.4, anechoic chamber beam measurements are presented and compared
to the beams from an ideal DFT.
Chapter 4 details the work on the multiplierless simultaneous 32-beam algorithm
and its digital implementation using a 32-element array receiver at 5.8 GHz to
realize a 1024-beam digital array. Section 4.3 presents the 32-beam low-complexity
digital algorithm and its sparse factorization towards low-complexity multiplierless
implementation of a 32-beam digital core. In Section 4.4, the details of the 5.8
GHz 32-element digital array receiver are given. Section 4.5 presents the 32-beam
real-time beam measurements for the proposed algorithm and the fixed-point FFT
counterpart. The section also presents the 2D beams synthesized from the 32-beam
linear array beam measurements at 5.8 GHz, which would correspond to an equivalent
2D array made of 32 similar 32-element subarrays.
Chapter 5 describes work on a 4-element digital beamforming array at 28 GHz
that supports 800 MHz of baseband bandwidth. Section 5.1 describes the details of
the 28 GHz receiver array design starting from the antenna to the individual receiver
chain. Section 5.2 describes the Xilinx RFSoC based high speed high bandwidth
digital processing back-end. The section also provides the details of the polyphase
circuit architectures for beamforming the entire 800 MHz bandwidth. Section 5.3
details the 28 GHz real-time beam measuring setup and presents the experimental
procedure along with the measured beam responses.
Chapter 6 discusses a novel method for realizing analog multibeam networks
using the ADFT algorithms proposed in Chapters 3 and 4. Sections 6.1 and 6.2 dis-
cuss the RF system considerations and the need for analog multibeam beamforming
front-ends in future mmW transceivers, emphasizing the need for low-power,
energy-efficient multibeam solutions for mobile devices. Section 6.3 proposes mapping the
sparse factorization stages of the ADFT algorithms into analog current
mirror–based CMOS circuits to obtain low-complexity, energy-efficient multibeam circuit
implementations. The section also presents the schematic-based CMOS (using the
Taiwan Semiconductor Manufacturing Company (TSMC) 65 nm process design kit
(PDK)) designs proposed for realizing the 16-beam circuit that uses the 16-point
ADFT algorithm. The simulated beam responses from the Cadence simulation are
given in the same Section. Section 6.4 conducts a comparison of the metrics of the
analog implementation and the corresponding digital implementation to prove that
the proposed analog circuits are energy-efficient and feasible for future implementa-
tions.
In Chapter 7, a novel squint-free wideband beamforming architecture is proposed.
Sections 7.1 and 7.2 discuss the problem of beam squint in wideband
beamforming and briefly describe how conventional Butler matrix type analog
multibeam networks suffer from beam squint. In Section 7.3, an analog multibeam
beamforming network model that does not suffer from beam squint is introduced,
leading to the novel N -beam wideband beamforming algorithm described in Section
7.4. As a proof of concept of the proposed algorithm, Sections 7.5 and 7.6 present
the circuit architectures and simulation verification of the proposed algorithm for a
4-beam network. Section 7.7 extends the algorithms in Section 7.4 for implementa-
tions in IF and Section 7.8 provides simulated verifications for proposed concepts.
A novel hybrid beamforming architecture for squint-free wideband
beamforming is proposed in Chapter 8, using the analog algorithm proposed in
Chapter 7 and targeting mmW systems. The overall architecture of the proposed hybrid beamformer
is discussed in Section 8.2. Section 8.3 discusses the level-1 beamforming simu-
lated responses of the beamformer using a 28 GHz APF circuit response. Section
8.4 presents the level-2 wideband digital beamforming architecture proposed using
low-complexity Thiran APFs along with the corresponding simulated hybrid array
factors for the proposed architecture.
A new algorithm for generating circular symmetric multibeams using a circu-
lar geometrical array is proposed in Chapter 9. A brief review of beam synthesis
in circular arrays is provided in Section 9.1. The proposed N -beam circular array
processing system is then discussed in Section 9.2. The digital hardware realization
architectures are also presented for realizing the proposed algorithm in Section 9.3,
and the simulated verifications are given in Section 9.4. Moreover, a novel method
is proposed in Section 9.6 to eliminate the mutual coupling effect in receive mode,
and it is further shown how the elimination of mutual coupling can be
achieved simultaneously with the multibeam realization without any added complexity.
Finally, Chapter 10 summarizes all the research work carried out in this disser-
tation with the insights that can be gleaned from the conducted research.
1.5 Scientific Collaborators
The research in this dissertation was carried out with
multiple collaborators. The work described in Chapters 3, 4 and 6 was mainly
conducted in collaboration with Dr. Renato J. Cintra at the Federal University of
Pernambuco (UFPE) in Brazil and his student, Diego Coelho at the University
of Calgary, for the approximate matrix search part. Dr. Sirani Perera at Embry-Riddle
Aeronautical University collaborated on the work described in Chapters
7 and 8 by deriving the mathematical proofs for the proposed algorithms. Dr.
Soumyajit Mandal from Case Western Reserve University, Cleveland, collaborated
on several publications related to the work presented in Chapters 4,
6 and 7. Further, Dr. Leonid Belostotski from the University of Calgary, Canada,
collaborated on the work related to analog beamformers in Chapters 6 and 7.
Dr. Ted S. Rappaport at NYU WIRELESS, New York University, also collaborated
on several publications that resulted from the work associated with Chapters 4,
5, and 6.
CHAPTER 2
REVIEW OF RF BEAMFORMING THEORY AND TECHNIQUES
This chapter presents a review of the multi-dimensional perspective of propagating
planar waves and their spectra as seen by planar or linear antenna arrays
in beamforming receivers. The concepts presented here will lay the platform for
the discussion leading to beamforming and the generation of multiple simultaneous
beams, which is the focus of this dissertation. Although the discussion is presented
in the perspective of the receiver, the concepts and theories will be reciprocal to the
transmit side as well.
2.1 Free Space Propagation of EM Waves
As illustrated in Fig. 2.1, propagating electromagnetic waves can be considered
to be approximately planar in the far-field. The far-field is generally considered to
begin at a distance d from the transmitter, where d > 2D^2/λ, λ is the wavelength of
the wave in the propagating medium, and D is the largest dimension of the aperture
of the radiating antenna [33, p. 42]. Such a propagating wave is a four-dimensional
(4D) spatio-temporal signal, that is, a multi-dimensional signal that has a temporal
dependence together with one or more spatial dimensions over a finite region.
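For a sense of scale, the far-field criterion d > 2D^2/λ can be evaluated directly. The sketch below is only an illustration: the function name is hypothetical, and the example (a 2.4 GHz array whose largest dimension is taken as 8 wavelengths) is an assumed configuration rather than a measurement setup described here.

```python
C0 = 299_792_458.0  # speed of light in m/s

def far_field_distance_m(aperture_m, freq_hz):
    """Far-field boundary d = 2*D^2/lambda for an aperture of largest dimension D."""
    wavelength = C0 / freq_hz
    return 2.0 * aperture_m ** 2 / wavelength

# Assumed example: an array at 2.4 GHz whose largest dimension is 8 wavelengths.
freq_hz = 2.4e9
wavelength = C0 / freq_hz
print(f"far field begins near {far_field_distance_m(8 * wavelength, freq_hz):.1f} m")
```

Because D is expressed in wavelengths here, the boundary works out to 128 wavelengths; doubling the aperture at the same frequency quadruples the required distance.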
The transverse electric field or magnetic field of such far-field EM PWs in
three-dimensional (3D) space {x, y, z} ∈ R^3 can be represented in the continuous domain as
a 4D function of the form
pw4D(x, y, z, t) = spw(ct + dx x + dy y + dz z), (2.1)
where d̂ = [dx dy dz]⊤ is the unit vector that denotes the direction of arrival (DoA)
of the signal in 3D space, c is the wave propagation speed, t ∈ R is the time, and
spw(λ), ∀ λ = ct + dx x + dy y + dz z ∈ R, is the function that defines the one-dimensional (1D)
Figure 2.1: Signals emanated from a source can be treated as plane waves in the
far-field.
Figure 2.2: 2D temporal snapshot of the 3D plane wave function created using 1D
to 3D mapping.
temporal signal (here, λ is merely a parameter and is not related to the wavelength
of the signal). For each value of λ, s(λ) corresponds to a 4D iso-surface in {x, y, z, ct}.
Since d̂ is a unit vector, dx^2 + dy^2 + dz^2 = 1. The unit vector d̂ can be
expressed in terms of the elevation angle θ ∈ [0, π] and the azimuthal angle φ ∈ [0, 2π] as
d̂ ≡ [dx dy dz]⊤ = [sin(θ) cos(φ) sin(θ) sin(φ) cos(θ)]⊤. (2.2)
For convenience, this discussion is limited to waves that represent the electric field or
the magnetic field of the propagating waves by (2.1) on planar surfaces and straight
lines.
17
For example, a waveform s(λ) can be mapped as a 3D function using the pa-
rameterization λ = g(x, y, ct), where s(λ) is a 1D function, and g(·) defines the
parametric relationship of {x, y, ct}. Fig. 2.2 shows a snapshot of a simulated plane
wave in x–y 2D plane, propagating at an angle of φ to the x-axis, obtained using
the parametric mapping λ = x cos φ + y sin φ + ct in the 1D function J0(λ). The
function J0(·) here is the Bessel function of the first kind, which has been arbitrarily
chosen for the illustration. The above mapping is the 2D version of (2.1) that can
be obtained by setting θ = π/2 in (2.2).
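The 1D-to-3D mapping described above can be sketched numerically. The following is a minimal illustration, not the code used to produce Fig. 2.2: the function names `plane_wave_snapshot` and `pulse` are hypothetical, and a Gaussian-windowed cosine is swapped in for J0 so that the example depends on NumPy only. Any 1D function of λ could be passed instead.

```python
import numpy as np

def plane_wave_snapshot(s, phi, ct, x, y):
    """Evaluate s(x*cos(phi) + y*sin(phi) + ct) on an x-y grid (theta = pi/2)."""
    X, Y = np.meshgrid(x, y, indexing="xy")
    lam = X * np.cos(phi) + Y * np.sin(phi) + ct  # parametric mapping lambda = g(x, y, ct)
    return s(lam)

def pulse(lam):
    """Arbitrary 1D waveform (assumption: stands in for J0 in the text)."""
    return np.exp(-(lam / 4.0) ** 2) * np.cos(lam)

x = np.linspace(-10.0, 10.0, 201)
y = np.linspace(-10.0, 10.0, 201)
phi = np.deg2rad(30.0)
snap = plane_wave_snapshot(pulse, phi, 0.0, x, y)  # snapshot at ct = 0
```

Every point on a line x cos φ + y sin φ = const. receives the same value of λ and therefore the same field value, which is what makes the wavefronts planar.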
The ST PW signals received in a planar region can be expressed as in (2.3) by
setting z = 0 in (2.1) without loss of generality.
pw3D(x, y, t) = spw(dxx+ dyy + ct). (2.3)
Therefore, the 4D hyper planar iso-surfaces of constant λ in expression (2.1) simplify
to 3D iso-planes for the case in (2.3) where the orientation of such iso-planes with re-
spect to the ct axis will be given by the relationship x sin θ cosφ + y sin θ sin φ + ct =
λ. In the 2D case, for which the signals are received in the x-axis (without loss of
generality), constant λ surfaces become 2D iso-lines in {x, ct} ∈ R2 as given in
(2.4).
pw2D(x, ct) = spw(x sin θ cosφ + ct). (2.4)
If the 2D ST PW is viewed in the x–y plane (making θ = π/2), then the above
relationship can be expressed as pw2D(x, ct) = spw(x cos φ + ct). For convenience,
if the spatial DoA of the ST PW is defined with respect to the y-axis as ψ, as shown in
Fig. 2.3(a), then the relationship in (2.4) can be rewritten w.r.t. ψ as pw2D(x, ct) =
spw(−x sin ψ + ct). The apparent DoA in space-time then becomes ϑ, where
ϑ is given by (2.5).
ϑ = tan^{−1}(sin ψ). (2.5)
Figure 2.3: (a) Plane wave received on the x-axis; (b) the 2D spatio-temporal DoA
in the {x, ct} domain.
As illustrated in Fig. 2.3(b), the angle ϑ is constrained to [−π/4, π/4] in the x− ct
domain where |ϑ| is maximum when ψ = ±π/2 (i.e., φ = 0, π and θ = π/2).
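The bound on ϑ can be checked numerically from (2.5). This is a small sketch; the function name `spacetime_doa` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def spacetime_doa(psi):
    """Apparent space-time DoA (radians) for a spatial DoA psi, following (2.5)."""
    return np.arctan(np.sin(psi))

psi = np.linspace(-np.pi / 2, np.pi / 2, 181)  # spatial DoA swept over the half-plane
vartheta = spacetime_doa(psi)                  # stays within [-pi/4, pi/4]
```

Because |sin ψ| ≤ 1, the arctangent never exceeds π/4 in magnitude, which corresponds to the 45° edge of the light cone in Fig. 2.3(b).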
2.2 Spectral Properties of 2D/3D Spatio-Temporal PWs
Since the focus of this dissertation is limited to planar and linear arrays, this section
will describe the analysis of the spectral properties of 2D/3D PW signals. The
analysis will first be conducted on the continuous-domain (CD) case. In practical
array processing and multi-dimensional signal processing applications, the signals
encountered are spatially discretized signals where the temporal domain can be
either continuous or discrete. This is because the spatial sampling of the signals is
achieved using antennas that have a finite aperture size.
The general spectral properties of a 4D wave function can be analyzed by taking
the 4D continuous-domain Fourier transform (FT) (CDFT) of (2.1). Equation (2.6)
defines the 4D CDFT for the generalized 4D plane wave function
\[ PW_{4D}(\Omega_x, \Omega_y, \Omega_z, \Omega_{ct}) \triangleq \iiiint_{x,y,z,ct=-\infty}^{+\infty} pw_{4D}(x, y, z, ct)\, e^{-j(\Omega_x x + \Omega_y y + \Omega_z z + \Omega_{ct} ct)}\, dx\, dy\, dz\, d(ct), \tag{2.6} \]
where $(\Omega_x, \Omega_y, \Omega_z, \Omega_{ct}) \in \mathbb{R}^4$ and $\Omega_k = 2\pi f_k$, $k \in \{x, y, z, ct\}$, are the
angular frequencies with respect to each variable, and $f_{ct} = f_t/c$ (here, $f_t$ denotes the temporal
frequency of the signal). Fig. 2.4 illustrates the concept of spatial frequency and its relevance
to beamforming: it depicts a 2D sinusoidal ST PW received on the x-axis with different DoAs.
Different DoAs result in different spatial frequencies along the x-axis, giving rise to different
$\Omega_x$ values. Therefore, the spectrum of an impinging ST PW is localized to a certain region
in the MD frequency domain, depending on the directionality of the impinging ST PW. This is
the basic phenomenon that is exploited in beamforming, which can be thought of as spatial filtering.

Figure 2.4: Different spatial frequencies observed by x-axis for a sinusoidal wave
front impinging from different directions (panels: ψ = 0°, 0° < ψ < 90°, and ψ = 90°).
A detailed evaluation of (2.6) can be found in [34]. As shown there, the 4D CDFT of
$pw_{4D}(x, y, z, ct)$ can be simplified to the expression given in (2.7),
\[ PW_{4D}(\Omega_x, \Omega_y, \Omega_z, \Omega_{ct}) = S_{pw}(c\,\Omega_{ct}) \cdot \delta(d_x \Omega_{ct} - \Omega_x) \cdot \delta(d_y \Omega_{ct} - \Omega_y) \cdot \delta(d_z \Omega_{ct} - \Omega_z), \tag{2.7} \]
where $S_{pw}(\Omega_t)$ is the 1D continuous-time (CT) FT of the temporal signal $s_{pw}(t)$, i.e.,
$s_{pw}(t) \overset{F}{\leftrightarrow} S_{pw}(\Omega_t)$. The function $\delta(\cdot)$ denotes the CD impulse function.
According to (2.7), the region of support (RoS) of the spectrum of the PW
is confined to a 4D hyper-line generated by the intersection of the three 4D hyper-planes
described by $d_x \Omega_{ct} - \Omega_x = 0$, $d_y \Omega_{ct} - \Omega_y = 0$, and $d_z \Omega_{ct} - \Omega_z = 0$.

Figure 2.5: (a) The RoS of a 3D ST PW received by a planar surface in 3D space
(z = 0) and its corresponding RoS in the 3D frequency domain; (b) the RoS of a
2D ST PW received by a line in 3D space (z = 0, y = 0).
For the specific cases that are of interest in this chapter, the 3D and 2D Fourier
domain functions $PW_{3D}$ and $PW_{2D}$ for signals received on a plane and on a line in
3D space are given by (2.8) and (2.9), respectively. Equation (2.7) describes the
spectrum of the ST PW of interest and thus allows filters to be designed that selectively
filter directional signals.
\[ PW_{3D}(\Omega_x, \Omega_y, \Omega_{ct}) = S_{pw}(c\,\Omega_{ct}) \cdot \delta(d_x \Omega_{ct} - \Omega_x) \cdot \delta(d_y \Omega_{ct} - \Omega_y). \tag{2.8} \]
\[ PW_{2D}(\Omega_x, \Omega_{ct}) = S_{pw}(c\,\Omega_{ct}) \cdot \delta(d_x \Omega_{ct} - \Omega_x). \tag{2.9} \]
As described earlier, the RoSs of the spectra in (2.8) and (2.9) are also confined to
line-shaped regions in the 3D and 2D frequency domains, respectively. Fig. 2.5(a)
and Fig. 2.5(b) depict the RoSs of the frequency domain spectra for the 3D and
2D cases, respectively.

Figure 2.6: Illustration of a 2D sinusoidal waveform and its corresponding 2D
frequency spectrum (obtained with the 2D CDFT).
Fig. 2.6 shows the 2D frequency domain plot of a 2D ST PW having a sinusoidal
wave front arriving at the x-axis at a DoA of ψ, measured counterclockwise off broadside
(i.e., from the y-axis). For such a sinusoidal wave front,
the magnitude of the 2D frequency response should ideally contain
two impulses on the line − sin ψ Ωct − Ωx = 0 (for the linear-array case in (2.9),
dx = − sin ψ, where ψ is defined as above). The frequency domain plot
in Fig. 2.6 exhibits spectral leakage because a rectangular window is implicitly applied
to the finite-extent 2D signal.
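The line-shaped RoS can be reproduced with a discrete 2D FFT. The sketch below is only an illustration under stated assumptions: unit sample spacing in both x and ct, a sinusoid chosen to fall exactly on an FFT bin so that leakage is avoided, and a hypothetical helper name `pw_spectrum_peak`. It is not the procedure used to generate Fig. 2.6.

```python
import numpy as np

def pw_spectrum_peak(psi, n=64, k_bin=8):
    """Return (omega_x, omega_ct) of the strongest 2D FFT bin of
    cos(omega0 * (-x*sin(psi) + ct)) sampled on an n-by-n grid."""
    n_x = np.arange(n)                 # spatial sample index (unit spacing assumed)
    n_ct = np.arange(n)                # temporal sample index (unit spacing assumed)
    omega0 = 2 * np.pi * k_bin / n     # temporal frequency placed on FFT bin k_bin
    X, CT = np.meshgrid(n_x, n_ct, indexing="ij")
    pw = np.cos(omega0 * (-X * np.sin(psi) + CT))
    spec = np.abs(np.fft.fft2(pw))
    omega = 2 * np.pi * np.fft.fftfreq(n)  # bin centres in rad/sample
    ix, ict = np.unravel_index(np.argmax(spec), spec.shape)
    return omega[ix], omega[ict]

psi = np.pi / 6
omega_x, omega_ct = pw_spectrum_peak(psi)
slope = omega_x / omega_ct  # expected to equal d_x = -sin(psi)
```

For ψ = 30° the two spectral peaks lie on the line Ωx = −sin ψ · Ωct, so the ratio of the peak coordinates recovers dx regardless of which of the two conjugate peaks is picked.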
2.3 3D/2D ST PW Signals Received by Arrays of Antennas
Beamforming or spatial filtering of signals necessitates sensing the spatial variations of
signals. EM waves are captured by antennas of finite dimension
that are designed to resonate over a particular RF bandwidth. Therefore, sensing ST
PW signals continuously in space is not possible. Instead, antenna arrays
are used to spatially sample the signals for beamforming or spatial filtering applications.
Fig. 2.7 shows examples of antenna arrays used in different applications. Different
geometries of antenna arrays can be used for spatially sampling signals. Uniform rectangular
and linear arrays are the most widely used geometries [35, 36], and several other sampling
grids, such as circular and hexagonal grids, are also used [37]. Non-uniform geometries are
also used in spatial sampling [38, 39]. The discussion in this section is limited to sampling
of 3D and 2D signals with uniform planar and linear arrays, respectively. Uniform
circular array sampling of 2D ST PWs is discussed in Chapter 9.

Figure 2.7: (a) 2D antenna array on a fighter jet nose; (b) circular antenna array
inside a Wi-Fi access-point; (c) 64-element 28 GHz array developed by IBM and
Ericsson [1]; (d) The National Radio Astronomy Observatory’s Very Large Array
in New Mexico [2]; (e) Precision Acquisition Vehicle Entry Phased Array Warning
System consisting of a crossed dipole element antenna array located at US Clear Air
Force Base, Alaska.
A frequency domain analysis similar to the one carried out with the CD FT can be
conducted for the spatially sampled 3D PWs received by uniform rectangular arrays (URAs).
Let ∆x and ∆y be the inter-element spacings of the planar rectangular antenna array.
If we consider a 3D ST PW given by pw3D(x, y, ct), then the ideal spatially sampled
signal is given by pw3D(nx∆x, ny∆y, ct), where nx, ny ∈ Z. Typically,
mutual coupling is present in an actual antenna array, which causes
the spatially sampled signals to deviate from the ideal sampled signal.
There are different approaches for mitigating mutual coupling. Typically, dummy
elements are added at the edges of rectangular and linear arrays to equalize the
mutual coupling effect across the elements. Since mutual coupling and its mitigation
are much broader topics, a mutual-coupling-free antenna array is assumed for
the subsequent discussion. Another practical concern is that
real antennas have a directional, rather than omni-directional, pattern. Therefore,
the antenna response depends on the DoA. The antenna pattern also has
a gain and phase response that is a function of the frequency of operation. Thus, the
antenna pattern can be expressed as a function Υ(Ωx, Ωy, Ωct). For simplicity of the
subsequent analysis, antennas are considered to be omni-directional and to have a flat
temporal frequency response across the bandwidth of operation.
To analyze the spectral content of a sampled MD signal, an infinite spatial
sampling grid is first assumed. The 3D mixed-domain output of such a sampling
grid, where the spatial variables are discrete and the temporal variable is continuous, can
be denoted as $a_{3D,M}(n_x, n_y, ct)$. The 3D mixed-domain FT can then be defined for
the Fourier transform pair $a_{3D,M}(n_x, n_y, ct) \overset{F}{\leftrightarrow} A_{3D,M}(\omega_x, \omega_y, \Omega_{ct})$ as given in (2.10),
\[ A_{3D,M}(\omega_x, \omega_y, \Omega_{ct}) = \int_{ct=-\infty}^{+\infty} \sum_{n_x=-\infty}^{+\infty} \sum_{n_y=-\infty}^{+\infty} a_{3D,M}(n_x, n_y, ct)\, e^{-j\omega_x n_x} e^{-j\omega_y n_y} e^{-j\Omega_{ct} ct}\, d(ct), \tag{2.10} \]
where $\omega_x = \Omega_x \Delta x$ and $\omega_y = \Omega_y \Delta y$. Following [35, chap. 1], (2.10) can be expressed as
in (2.11),
\[ A_{3D,M}(\omega_x, \omega_y, \Omega_{ct}) = \frac{1}{\Delta x \Delta y} \sum_{m_x=-\infty}^{+\infty} \sum_{m_y=-\infty}^{+\infty} PW_{3D}\!\left( \frac{\omega_x - 2\pi m_x}{\Delta x}, \frac{\omega_y - 2\pi m_y}{\Delta y}, \Omega_{ct} \right), \tag{2.11} \]
where $m_x, m_y \in \mathbb{Z}$. The function $PW_{3D}(\cdot)$ is the 3D CDFT of the 3D CD received
PW function. The spatially sampled signal therefore has an infinitely
repeating spectrum with a periodicity of 2π along both $\omega_x$ and $\omega_y$.

Figure 2.8: Top view of the RoS of 3D ST PW signals.
Fig. 2.8 shows the repetition pattern of the spectrum in
the $(\omega_x, \omega_y)$ plane, viewed along the $\Omega_{ct}$ axis. For this illustration, $\Delta x = \Delta y = \Delta$ is
assumed. As shown in the figure, if $\omega'_{max} \geq \pi$, then spatial aliasing of signals occurs
in the spatial-frequency space. The condition for avoiding spatial aliasing is given
by
\[ \omega' = \Omega_k \Delta \leq \pi, \tag{2.12} \]
where $k \in \{x, y\}$. For band-limited signals where $\Omega_{ct} \leq c^{-1}\Omega_{t,max}$, $\Omega_{k,max} =
c^{-1}\Omega_{t,max} \tan\vartheta_{max}$. Since $\tan\vartheta = \sin\theta$, $\Omega_{k,max} \cdot \Delta \leq \pi$ implies
$c^{-1}\Omega_{t,max} \cdot \sin\theta_{max} \cdot \Delta \leq \pi$. With $\Omega_{t,max} = 2\pi f_{t,max}$, the sampling criterion,
i.e., the inter-element spacing for avoiding spatial aliasing, is given by (2.13),
\[ \Delta \leq \frac{c}{2 \cdot f_{t,max} \cdot \sin(\theta_{max})}, \tag{2.13} \]
where $\theta_{max}$ is the maximum DoA of the signal in the elevation plane. The same
concept is valid when the signals are discrete in time. In such cases, the sampled
signal spectrum also repeats along the $\omega_t$ (or $\omega_{ct}$) axis, where $\omega_t = \Omega_t \Delta T$. Here, $\Delta T$
is the sampling period, and the temporal sampling frequency is $f_s = 1/\Delta T$.

Figure 2.9: Cross section through ωy = 0 plane in the space-time frequency domain.
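As a numerical illustration of the spacing criterion in (2.13), the following sketch evaluates the largest alias-free element spacing. The function name and the 28 GHz example value are assumptions chosen for illustration; free-space propagation speed is assumed for c.

```python
import math

def max_element_spacing(f_max_hz, theta_max_rad=math.pi / 2, c=299_792_458.0):
    """Largest inter-element spacing (metres) that avoids spatial aliasing, per (2.13)."""
    return c / (2.0 * f_max_hz * math.sin(theta_max_rad))

# Worst case theta_max = 90 degrees reduces to half the minimum wavelength.
spacing_28ghz = max_element_spacing(28e9)
# Restricting the field of view to 30 degrees relaxes the requirement.
spacing_28ghz_30deg = max_element_spacing(28e9, math.radians(30.0))
```

With θmax = 90° the bound becomes the familiar half-wavelength (Nyquist) spacing, while a narrower field of view permits a larger spacing.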
2.4 Spatial Filtering of ST PWs viz. Beamforming
Selective enhancement of received plane waves along a particular DoA requires a
multi-dimensional space-time filter having a passband aligned with the line-shaped
spectrum oriented at the corresponding angle in the MD frequency domain. The
diagrams in Fig. 2.9 illustrate a cross-sectional view of the RoS of a 3D ST
PW along ωy = 0 and depict different passbands that can be employed
to filter temporal signals having different bandwidths. Fig. 2.9(c) shows the ideal
passband of a wideband MD filter that should be employed for beamforming a
wideband planar signal. Such a filter can be realized either in the analog or in the digital domain.
A digital implementation at RF requires direct RF-to-bits conversion; therefore,
the RF frequencies have to be low relative to the ADC sampling
rates of the digital back-end. Fig. 2.9(b) shows the passband of an MD filter that
would filter a narrowband PW signal. This kind of passband can be generated with either
analog or digital processing (again, digital processing requires RF-to-bits conversion). Most
phase-shifter-based narrowband analog phased arrays produce such passbands for the
directional enhancement of waves. Beamforming with a passband such as
that shown in Fig. 2.9(b) is known to suffer from beam squinting
(this phenomenon is discussed in Chapter 7.1). The effect of beam squinting is small
for highly narrowband systems.
Fig. 2.9 specifically illustrates beamforming passbands at RF in the MD frequency
domain. Beamforming can be achieved at IF or baseband as well. In most communication
systems, the RF signal is first downconverted to a low IF or to baseband before
digital processing. Fig. 2.10 illustrates the 2D spectral transformation
of a 2D ST bandpass PW from RF to baseband. Fig. 2.10(a) illustrates
the spectrum of an RF signal impinging on a ULA at an angle of 60°.
Fig. 2.10(b) shows the 2D spectrum after temporal in-phase (I) and quadrature (Q)
downconversion, which corresponds to the mathematical operation in (2.14) and follows from
the modulation property of the MD Fourier transform.
\[ s(x, ct) \overset{F}{\longrightarrow} S(\omega_x, \omega_{ct}), \qquad s(x, ct)\, e^{-j\omega_0 ct} \overset{F}{\longrightarrow} S(\omega_x, \omega_{ct} + \omega_0). \tag{2.14} \]
Fig. 2.10(c) shows the ideal passband required for filtering such directional narrowband
signals. This kind of passband is analogous to the one shown in Fig. 2.9(b),
except that the signal is now at baseband rather than at RF.

Figure 2.10: (a) Frequency spectra of the 2D ST BP signal received by a
ULA; (b) the downconverted spectrum; (c) spatial filtering applied to the received
multi-dimensional signal.
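The modulation property in (2.14) can be checked on sampled data. The sketch below is an illustration under assumptions that are not from the text: a single complex exponential stands in for the bandpass PW, unit sample spacing is used on both axes, and `iq_downconvert` is a hypothetical helper name.

```python
import numpy as np

def iq_downconvert(s, omega0):
    """Multiply s[n_x, n_t] by exp(-j*omega0*n_t) along the temporal axis only."""
    n_t = np.arange(s.shape[1])
    return s * np.exp(-1j * omega0 * n_t)[np.newaxis, :]

n = 64
nx, nt = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
# Complex 2D tone at spatial bin 5 and temporal bin 16 (the "RF" signal).
s_rf = np.exp(1j * 2 * np.pi * (5 * nx + 16 * nt) / n)
s_bb = iq_downconvert(s_rf, 2 * np.pi * 16 / n)

peak_rf = np.unravel_index(np.argmax(np.abs(np.fft.fft2(s_rf))), (n, n))
peak_bb = np.unravel_index(np.argmax(np.abs(np.fft.fft2(s_bb))), (n, n))
```

The spectral peak moves along the temporal-frequency axis only; the spatial-frequency bin is unchanged, which is why downconversion leaves the spatial content of the signal intact.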
2.5 Different Spatial Filtering Approaches for ST Array
Processing
This section provides a brief overview of different beamforming implementations
and their mathematical formulations. Although 1D ULA configurations are used
in the following discussion for convenience, the mathematical formulations and
the beamforming approaches can be extended to higher-dimensional arrays without
loss of generality. Beamforming algorithms, circuits, and architectures can be categorized
in different ways: by hardware implementation approach, by bandwidth of operation,
by pattern synthesis technique, and by whether the realized beams are fixed or adaptive.
An abstract-level categorization of receive-mode beamforming topologies based on the
hardware implementation approach is shown in Fig. 2.11. An example of a traditional
phased-array topology using analog electronics at RF (or IF) for phase manipulation is
shown in Fig. 2.11(a).
Digital multi-beam beamforming provides maximum flexibility, reliability, reconfigurability,
and maximal degrees of freedom for beam combination compared with traditional
passive/active analog phased-array realizations. A generic hardware configuration
of a digital beamforming (DBF) setup is shown in Fig. 2.11(b). However, DBF requires
one RF chain and two analog-to-digital converters (ADCs) per antenna element
(assuming I/Q downconversion), i.e., P RF chains and 2P ADCs for P antenna
elements. This results in high power consumption, particularly for
large arrays that employ a high number of ADCs supporting large bandwidths
(e.g., at mmW), since ADCs are usually the most power-hungry blocks [40]. In
comparison, analog beamformers have the lowest power consumption among all topologies.
Hybrid beamforming addresses this challenge by combining low-dimensional
digital beamformers (at baseband) with analog beamformers (at RF) [41]. Such
architectures can achieve performance similar to fully-digital schemes at lower cost
and power. They typically use RF phase shifters, TTDs, or lenses for level-1 analog
beamforming [41–43] and baseband digital processing for level-2 beamforming.
Having introduced the hardware realization methods of beamformers, the following
discussion reviews the basics of beamformer implementations with a fixed weight set.

Figure 2.11: Beamforming topologies based on the hardware implementation; (a)
analog beamforming, (b) digital beamforming, and (c) hybrid beamforming
architectures.

Figure 2.12: Receive-mode model of an N-element phased array (progressive
inter-element delay τ = ∆x sin ψ / c).
Consider an N-element uniform linear array with Nyquist element spacing
∆x, as shown in Fig. 2.12, where ∆x = λmin/2 and λmin corresponds to the highest
frequency of interest. Let $\mathbf{x} \in \mathbb{C}^{N \times 1}$ be the continuous-time output signal vector
from the array as given in (2.15), where $x_{n_x}(t) = a_{pw}(n_x, t)$ is the signal at the $n_x$th
spatial location and $a_{pw}(n_x, t)$ is the 2D signal from the array.
\[ \mathbf{x} = [x_0(t), x_1(t), \ldots, x_{N-1}(t)]^\top \tag{2.15} \]
Also, let $A_{pw,m}(n_x, j\Omega_t)$ be the mixed-domain representation of $a_{pw}(n_x, t)$, obtained by
taking the Fourier transform along time. The mixed-domain
output from the array can then be denoted as a vector $\mathbf{a}_{pw,m}$ as shown in (2.16).
\[ \mathbf{a}_{pw,m} = [A_{pw,m}(0, j\Omega_t), A_{pw,m}(1, j\Omega_t), \ldots, A_{pw,m}(N-1, j\Omega_t)]^\top \tag{2.16} \]
If (2.17) denotes the weighting vector applied to the array signal $\mathbf{x}$, where $\alpha_i \in
\mathbb{C}$, $0 \leq i \leq N-1$, then the z-transform of such a spatially discrete ST PW filter is
given by (2.18).
\[ \mathbf{w} = [\alpha_0, \alpha_1, \ldots, \alpha_{N-1}]^\top, \tag{2.17} \]
\[ H(z_x) = \sum_{k=0}^{N-1} \alpha_k z_x^{-k} \tag{2.18} \]
The frequency response of the spatial filter can therefore be expressed as in (2.19)
by substituting $z_x = e^{j\omega_x}$, where $\omega_x = \Delta x\, \Omega_{ct} \sin\psi$.
\[ H(e^{j\omega_x}) = \sum_{k=0}^{N-1} \alpha_k e^{-j\omega_x k} \tag{2.19} \]
The output response $Y_m(j\Omega_t)$ to a 2D input signal $w_{pw}(n_x, t)$ can be expressed in
the Fourier domain as given in (2.20), where $Y_m(j\Omega_t)$ is the Fourier transform of
the beam output $y(t)$.
\[ Y_m(j\Omega_t) = \mathbf{w}^\top \cdot \mathbf{Z} \cdot \mathbf{a}_{pw,m}, \tag{2.20} \]
Here, $\mathbf{Z} = (\zeta_{i,j})$, $i, j \in \{1, 2, \ldots, N\}$, is an $N \times N$ diagonal matrix with $\zeta_{i,i} = e^{j\omega_x \cdot i}$.
When the $\alpha_i$ in $\mathbf{w}$ are narrowband coefficients, they essentially
become complex constants, and the beamformed time-domain output
$y(t)$ simplifies to the operation in (2.21),
\[ y = \mathbf{w}^\top \mathbf{x}. \tag{2.21} \]
The beam pattern produced in the far field is a function of the weights $\alpha_i$, and
the complex beam pattern corresponding to $\mathbf{w}$ is related to the discrete Fourier
transform of the spatial weighting vector $\mathbf{w}$ [44].
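The spatial frequency response in (2.19) and its relation to the DFT of the weights can be sketched as follows. This is a minimal illustration; `array_factor` is a hypothetical helper name, and uniform weights are assumed only as an example.

```python
import numpy as np

def array_factor(w, omega_x):
    """Evaluate H(e^{j*omega_x}) = sum_k w[k] * exp(-j*omega_x*k), as in (2.19)."""
    w = np.asarray(w, dtype=complex)
    k = np.arange(len(w))
    return np.exp(-1j * np.outer(np.atleast_1d(omega_x), k)) @ w

n = 8
w_uniform = np.ones(n) / n                        # example weight set (assumption)
omega_grid = 2 * np.pi * np.arange(n) / n         # the N DFT frequencies
h_on_grid = array_factor(w_uniform, omega_grid)   # equals np.fft.fft(w_uniform)
```

Sampling (2.19) at ωx = 2πm/N gives exactly the N-point DFT of the weight vector, which is the sense in which the beam pattern is related to the DFT of w. For uniform weights the response is unity at ωx = 0 and has nulls at the remaining DFT frequencies.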
Setting $\alpha_i = e^{-j\Omega_t (i\tau)}$, where $\tau = \frac{\Delta x \sin\psi}{c}$, i.e., choosing $\mathbf{w}$ as shown in (2.22),
produces a broadband beam in the direction ψ.
\[ \mathbf{w} = \left[ 1, e^{-j\Omega\tau}, \ldots, e^{-j\Omega(N-1)\tau} \right]^\top, \tag{2.22} \]
Here, $c$ is the wave propagation speed and $\Omega \equiv \Omega_t = 2\pi f_t$, where $f_t$ is the temporal frequency
variable ($f_t \in [f_c - B, f_c + B]$ and $B$ is the baseband bandwidth). The term $e^{-j\Omega\tau}$
is a complex frequency-dependent weighting that realizes a TTD across the signal
bandwidth, and such a weight set ideally produces a wideband squint-free beam.
Fig. 2.13 shows an overview of the architecture of such a TTD-based beamformer. In analog
RF, this type of architecture can be realized by employing progressive transmission
line delays [45], sensor delay lines [46], tunable spiral inductors [47], or analog
electronics that realize APF responses to accurately synthesize the delays.

Figure 2.13: True-time delay-and-sum beamforming in a receive-mode N-element
phased array.
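The difference between frequency-dependent TTD weights and fixed phase-shifter weights can be illustrated numerically. The sketch below rests on several assumptions that are not from the text: the sign convention for the received plane wave (element k carries the phase +Ω k τ, which the weights of (2.22) then cancel), a 16-element array at 28 GHz with half-wavelength spacing at fc, and the hypothetical helper names `steering_weights` and `normalized_gain`.

```python
import numpy as np

C = 299_792_458.0  # free-space propagation speed in m/s (assumption)

def steering_weights(n, f_hz, dx, psi):
    """Weights exp(-j*2*pi*f*k*tau), tau = dx*sin(psi)/c, following the form of (2.22)."""
    tau = dx * np.sin(psi) / C
    return np.exp(-1j * 2 * np.pi * f_hz * np.arange(n) * tau)

def normalized_gain(w, f_hz, dx, psi_in):
    """|w^T x| / N for a unit plane wave at frequency f_hz from direction psi_in."""
    n = len(w)
    tau_in = dx * np.sin(psi_in) / C
    x = np.exp(1j * 2 * np.pi * f_hz * np.arange(n) * tau_in)  # assumed sign convention
    return abs(w @ x) / n

n, fc, psi = 16, 28e9, np.deg2rad(45.0)
dx = C / (2 * fc)
f = 0.9 * fc  # evaluate 10% below the carrier

gain_ttd = normalized_gain(steering_weights(n, f, dx, psi), f, dx, psi)   # weights track f
gain_ps = normalized_gain(steering_weights(n, fc, dx, psi), f, dx, psi)   # weights fixed at fc
```

Under these assumptions the TTD weights keep full gain in the look direction at every frequency, whereas weights frozen at fc lose a substantial part of the gain at 0.9 fc because the beam peak has moved away from ψ.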
Wideband beamforming can be achieved digitally by employing digital filters
that synthesize the required wideband passband. Different digital filter architectures
for wideband beamforming can be found in the literature. The most
straightforward approach is to implement the architecture shown in
Fig. 2.13 digitally, using higher-order FIR interpolation or fractional-delay filters,
as illustrated in Fig. 2.14(a), to approximate the required true-time delay in each
antenna’s signal path. Note that the input xi[n], i ∈ [0, N − 1], to the array
processor from the ith antenna is a complex signal; the signals present at the
antennas are sampled in the spatial domain, amplified and filtered, downconverted
to baseband, and finally digitized by an ADC (or downconverted to an IF and then
converted to baseband digitally). The downconversion is achieved through mixing, can
be modeled as multiplication by e^{j2πfct}, and leaves the spatial frequency components
intact. Since the inputs to the filters are complex, two copies of the same FIR
fractional-delay/interpolation filter realizing the required delay are needed in the ith path.

Figure 2.14: (a) Digital filter-and-sum beamforming architecture; (b) FFT-based
frequency domain wideband beamforming in digital.

Another method is to realize the same idea in the frequency domain, as
illustrated in Fig. 2.14(b). This approach realizes the required TTD by utilizing
frequency-dependent weights. An FFT is employed in each receive path
to decompose the input signal into a set of narrowband outputs, and a set of
complex coefficients is applied at each frequency bin to realize
the required delay at each element in the frequency domain. The phase-aligned,
relatively narrowband outputs are then summed across the elements for beamforming. The
summed output at each frequency bin can then be converted to the time domain using an
IFFT operation.
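The frequency-domain approach can be sketched for a single block of samples. This is an illustration under stated assumptions rather than the architecture of Fig. 2.14(b) in detail: delays are expressed in samples, block processing makes the delay circular (a practical design would use overlap-save or similar), and `fft_beamform` is a hypothetical function name.

```python
import numpy as np

def fft_beamform(x, delays_samples):
    """Frequency-domain delay-and-sum over one block.

    x: complex array of shape (N, L), one row per antenna element.
    delays_samples: length-N delays (in samples, may be fractional).
    Each bin k of element n is weighted by exp(-j*omega_k*d_n), the bins are
    summed over elements, and an IFFT returns the time-domain beam output.
    """
    length = x.shape[1]
    omega = 2 * np.pi * np.fft.fftfreq(length)      # rad/sample for each FFT bin
    X = np.fft.fft(x, axis=1)                       # FFT in each receive path
    beta = np.exp(-1j * np.outer(delays_samples, omega))
    return np.fft.ifft((beta * X).sum(axis=0))

# Example: element k receives the waveform advanced by 2*k samples (circularly).
m = np.arange(32)
s = np.exp(2j * np.pi * 3 * m / 32) + 0.5 * np.cos(2 * np.pi * 5 * m / 32)
x = np.stack([np.roll(s, -2 * k) for k in range(4)])
y = fft_beamform(x, 2 * np.arange(4))  # re-aligns all four paths before summing
```

After the per-bin weights compensate the progressive delay, the four paths add coherently and the output equals four times the original waveform.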
Figure 2.15: Digital architecture of a Frost beamformer.
Several other digital beamforming techniques based on 2D FIR filters also exist;
implementation-wise, however, all of them can be broadly categorized as Frost beamformers
[48], which are similar to the filter-and-sum architecture shown in Fig. 2.14(a). This
structure is widely used to realize different adaptive/non-adaptive 2D/3D passbands
digitally [49–51]. Such filter responses at baseband are complex valued and
thus require complex-multiplier-based FIR structures in implementation, as
shown in Fig. 2.15. 2D/3D network-resonance-based beamformers are also known
for their low-complexity implementations [52]. These filters are realized as 2D/3D
IIR filter structures, whose recursive nature achieves beamforming at lower
hardware complexity [53].
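The filter-and-sum structure of Fig. 2.15 can be sketched as one complex FIR filter per element followed by a sum. The sketch below is a simplified illustration, not the design of [48]: the coefficients are simply given as an N × M array, the conjugation of the taps is an assumption read from the h* labels in the figure, and `frost_filter_and_sum` is a hypothetical name.

```python
import numpy as np

def frost_filter_and_sum(x, h):
    """y[n] = sum_i sum_m conj(h[i, m]) * x[i, n - m] for an N-element array.

    x: array of shape (N, L) with one complex sample stream per element.
    h: array of shape (N, M) with M complex FIR taps per element.
    """
    x = np.asarray(x, dtype=complex)
    h = np.asarray(h, dtype=complex)
    length = x.shape[1]
    y = np.zeros(length, dtype=complex)
    for x_i, h_i in zip(x, h):
        # Causal FIR filtering of one element, truncated to the input length.
        y += np.convolve(x_i, np.conj(h_i))[:length]
    return y

x_demo = np.array([[1, 2, 3], [10, 20, 30]])
h_demo = np.array([[1, 0], [0, 1]])  # element 0 passes through, element 1 is delayed by one sample
y_demo = frost_filter_and_sum(x_demo, h_demo)
```

Each complex tap costs one complex multiplication per output sample, which is why the structure needs complex-multiplier-based FIR hardware.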
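For narrowband systems, where the temporal frequency spread around $f_c$ is small,
$\Omega \approx 2\pi f_c$ and the term $e^{-j\Omega k\tau}$ for the $k$th element becomes a complex
constant $\beta_k = e^{-j2\pi f_c k\tau}$. Therefore, a
narrowband beam can be realized by replacing $\Omega$ with $2\pi f_c$ in (2.22), which gives rise to $\mathbf{w}$ in
(2.23),
\[ \mathbf{w} = \left[ 1, e^{-j2\pi f_c \frac{\Delta x \sin\psi}{c}}, \ldots, e^{-j2\pi f_c (N-1) \frac{\Delta x \sin\psi}{c}} \right]^\top. \tag{2.23} \]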
2.6 Formation of Multiple Simultaneous Beams
Equation (2.20) describes the multi-input single-output system for producing
a single beam. Producing multibeams involves realizing more than one complex
weighting vector. The underlying operation for realizing $p$ simultaneous beams is
the linear system given in (2.24),
\[ \mathbf{y}_m = \mathbf{W}_p \cdot \mathbf{Z} \cdot \mathbf{a}_{pw,m}, \tag{2.24} \]
where $\mathbf{y}_m = [y_m(0, j\Omega_t), y_m(1, j\Omega_t), \ldots, y_m(p-1, j\Omega_t)]^\top$ is the Fourier domain
vector of $p$ RF beams and $\mathbf{W}_p \in \mathbb{C}^{p \times N}$ is the $p \times N$
matrix containing $p$ beamforming vectors as given in (2.25).
\[ \mathbf{W}_p = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p]^\top, \quad \text{where } 2 \leq p \leq N - 1. \tag{2.25} \]
$\mathbf{W}_p$ takes the form of a Vandermonde matrix [54]. Here, each
$\mathbf{w}_k = \left[ 1, e^{-j2\pi f \frac{\Delta x \sin\psi_k}{c}}, \ldots, e^{-j2\pi f (N-1) \frac{\Delta x \sin\psi_k}{c}} \right]^\top$, $1 \leq k \leq p$,
corresponds to a steering weighting vector realizing a beam at an angle of $\psi_k$ off broadside.
The vectors $\mathbf{w}_i$ in (2.25) should be linearly independent to maximize the number of
degrees of freedom (DoFs). Picking $p = N$ captures all the DoFs of the array.
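A multibeam weight matrix of the form (2.25) can be assembled directly from the steering vectors. The following sketch is illustrative only: the function name `steering_matrix`, the 28 GHz carrier, the half-wavelength spacing, and the three look directions are assumptions, not values from the text.

```python
import numpy as np

def steering_matrix(n, f_hz, dx, psis, c=299_792_458.0):
    """Stack p narrowband steering vectors as rows of a p-by-N matrix, as in (2.25)."""
    k = np.arange(n)
    tau = dx * np.sin(np.asarray(psis, dtype=float)) / c  # per-beam inter-element delay
    return np.exp(-1j * 2 * np.pi * f_hz * np.outer(tau, k))

n, f = 8, 28e9
dx = 299_792_458.0 / (2 * f)                      # half-wavelength spacing (assumption)
psis = np.deg2rad([-30.0, 0.0, 30.0])             # three example look directions
W_p = steering_matrix(n, f, dx, psis)             # shape (3, 8)
rank = np.linalg.matrix_rank(W_p)                 # 3 for distinct directions

x = np.ones(n, dtype=complex)                     # snapshot of a broadside plane wave
beams = W_p @ x                                   # p simultaneous beam outputs
```

Each row has the geometric-progression structure of a Vandermonde matrix, and distinct look directions give linearly independent rows, so every beam contributes one degree of freedom.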
Fig. 2.16 gives an overview of how multibeam beamforming can be
achieved with different implementation approaches. Various other multibeam
beamforming architectures can be found in the literature. Multiple beams can
be realized with the Frost processor shown in Fig. 2.15 by choosing filter
coefficients that realize two or more passbands [34, 55]. A multibeam beamforming
network realization method based on the 2D network resonance concept is described
in [56].
Figure 2.16: Overview of analog, digital and hybrid multibeam architectures:
(a) analog beamforming, (b) digital beamforming, and (c) hybrid beamforming.

Picking $\mathbf{W}_{p=N}$ to be the N-point DFT matrix defines a Fourier basis for
representing the array signal vector and produces N orthogonal RF beams for both
transmit and receive multibeam applications. The DFT matrix is full rank and
thus captures all degrees of freedom. The orthogonality property of the DFT
ensures that there is no inter-beam interference. Since the DFT can be computed
with relatively low overhead using FFT algorithms, the use of FFTs
across Nyquist-sampled antenna arrays is quite popular. The analog FFT-based
‘Butler matrix’ implementations are well known in the microwave and antenna
communities for producing multiple simultaneous orthogonal beams [57, 58]. A deeper discussion
of spatial FFT-based beamforming is carried out in the next chapter.
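The orthogonality of DFT beams can be verified in a few lines. This is a minimal sketch with a hypothetical helper `dft_matrix`; the test snapshot is an assumed plane wave whose spatial frequency falls exactly on DFT bin 2.

```python
import numpy as np

def dft_matrix(n):
    """N-point DFT matrix with entries exp(-j*2*pi*m*k/N)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

n = 8
W = dft_matrix(n)
gram = W @ W.conj().T                          # equals N * I for orthogonal rows

x = np.exp(2j * np.pi * 2 * np.arange(n) / n)  # snapshot aligned with beam 2 (assumption)
beams = W @ x                                  # same result as np.fft.fft(x)
```

Because the rows are mutually orthogonal, a plane wave aligned with one DFT beam produces no output in the other N − 1 beams, and the whole matrix-vector product can be replaced by an FFT.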
A common problem in multibeam realization is the associated high
computational/circuit complexity. For larger arrays, producing p beams by realizing Wp
in (2.25) requires a hardware complexity on the order of O(pN), which can be
prohibitively large either computationally or in terms of hardware electronics.
Here, the notation O(·) indicates an upper bound on complexity [59]. The motivation
for the research described in this dissertation is this large complexity for
the larger arrays that are demanded by emerging mmW and sub-THz
systems. Thus, low-hardware-complexity implementations of (2.25) are always desired. The
research work is focused on such low-complexity algorithms and circuits for
implementing (2.25) with analog, digital and hybrid realizations.
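To give a feel for the O(pN) scaling, the sketch below compares the number of complex multiplications of a direct matrix-vector product with the textbook radix-2 FFT count of (N/2) log2 N. The radix-2 figure is a standard result brought in for comparison and counts trivial twiddle factors as multiplications; it is not a result of this chapter, and the function names are hypothetical.

```python
def direct_multibeam_mults(p, n):
    """Complex multiplications for p beams from N elements via a p-by-N matrix."""
    return p * n

def radix2_fft_mults(n):
    """Textbook radix-2 FFT count, (N/2)*log2(N), for N a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    return (n // 2) * (n.bit_length() - 1)

n = 64
direct_count = direct_multibeam_mults(n, n)  # all N beams computed directly
fft_count = radix2_fft_mults(n)              # all N beams computed with an FFT
```

For a 64-element array, computing all 64 beams directly takes 4096 complex multiplications per snapshot, against 192 for the radix-2 FFT, and the gap widens as N grows.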
CHAPTER 3
LOW-COMPLEXITY DIGITAL RF MULTIBEAMS USING
MULTIPLIERLESS APPROXIMATE DISCRETE FOURIER
TRANSFORMS
In this chapter, the use of DFT approximations is proposed for low-SWaPC
implementations of digital multibeam RF apertures using ULAs. An 8-point DFT
approximation and a 16-point DFT approximation are proposed for realizing real-
time digital RF beams with low-complexity VLSI architectures, and the proposed
algorithms have been experimentally verified using a real-time phased-array setup.
By using such DFT approximations, the digital multiplicative complexity can be
reduced to zero with negligible degradation of the RF beam patterns compared
to the corresponding fixed-point FFT-based RF beams.
3.1 Multibeams and the Role of Fully Digital Beamforming
The efficient formation of far-field antenna patterns simultaneously across a
multitude of directions is crucially important in areas such as wireless
communications, radio astronomy, imaging, radar, and electronic warfare. Multibeam
beamforming has usually been achieved in the analog microwave domain using analog
techniques (e.g., Rotman lenses [60] and Butler/Nolan matrices [57, 61]). Emerg-
ing mmW systems are considering hybrid multibeam beamforming due to its power
efficiency and excellent performance for a reasonably small number of antenna ele-
ments and user streams [62,63]. Although digital beamforming requires the control
of each individual antenna element in an antenna array, it is promising for the
future due to its many inherent advantages, which include [64]: i) maximum flexi-
bility/reconfigurability; ii) easy system updates and support for new beamforming
algorithms as they emerge; iii) precise control of both the gain and phase of
individual antenna elements thus giving better control of the beams; iv) maximum
degrees of freedom from a given array; and v) reduced maintenance and calibration
requirements.
Element-wise digital beamforming requires a dedicated receiver (or a transmitter,
in transmit mode) for each antenna element, which is usually a uniformly spaced
linear or rectangular array of antennas. Multibeams can be generated by expanding
the concept of a phased-array to multiple simultaneous directions by using the fact
that each direction of propagation of a carrier wave is associated with two spatial
frequencies (ωx, ωy) ∈ R2 across the two orthogonal coordinate axes of a rectangular
array aperture. Multiple beam digital beamforming is desired at the lowest possible
energy consumption for a given bandwidth, supply voltage, and technology node,
which leads to domain-specific architectures that are optimized for low complexity
and power consumption.
In this chapter, approximate computing-based algorithms and computing
architectures that achieve quasi-orthogonal RF beams without using any digital
multiplier circuits are proposed. The multiplierless nature of the digital computing
architectures allows low chip area/size, weight, and power consumption (SWaP) and
avoids the need for digital multipliers, which have high circuit complexity (transistor
count) and power consumption. This is likely to become more critical as wireless
systems move to sub-terahertz frequencies and much wider channel bandwidths
than those used currently [65]. Multiplierless algorithms thus lead to
substantially reduced SWaPC in real-time digital silicon implementations [65, 66].
Multibeam beamforming on linear/rectangular apertures is important for ex-
ploiting multi-directional channels in massive-MIMO systems, for example in 5G
mm-wave wireless networks. Such systems rely on the combination of beamforming
with MIMO theory [25,62,63,65,67,68] and as frequencies move to THz ranges, the
need for providing thousands of simultaneous beams will emerge due to the small size
of the wavelength and physical antenna aperture. Recent work has described differ-
ent phased array architectures targeting 5G applications [69–75]. However, most of
the literature has been focused either on hybrid beamforming systems or fully-analog
architectures due to the prohibitive processing complexity of fully-digital beamform-
ing. Other important applications for microwave and mm-wave digital multibeam
beamformers include emerging defense applications such as space-based low earth
orbit communications, mesh networks between micro satellites, space-based Internet
distribution to densely populated areas, and multi-domain mosaic warfare where re-
liable high-speed wireless connectivity is needed across multiple platforms [65]. The
demands of high-capacity wireless networks for such applications can be significantly
more difficult to meet than commercial 5G standards [65]. Such demanding wireless channel conditions necessitate sharp beams (i.e., narrow angles of propagation) across wide bandwidths, both to thwart detection and to benefit from beamforming gain. Furthermore, 5G networks will eventually require digital beamforming to reduce the overhead associated with the current 3GPP beam search time in the 5G frame structure. A large reduction in beam-pointing time can be obtained by simultaneously searching the environment for the best pointing angle, but this is not yet supported in the 5G 3GPP standard [26].
Some recent work has focused on achieving element-wise fully-digital beam-
forming. The paper in [76] presents a low-power 8-element digital beamforming
prototype based on bit-stream processing. The design uses a low-resolution ∆Σ
architecture that replaces multipliers with multiplexers. This multiplexer-based
architecture achieves lower power and smaller area than conventional digital beamformers, but the design is limited to a 20 MHz bandwidth with only two simultaneous output beams. Another recent paper [77] reports a 16-element 4-beam digital beamformer targeting large-scale MIMO for 5G communication systems. It uses a multiplexer-based approach similar to that in [76], with an interleaved architecture to support a 100 MHz bandwidth. The works in [78,79] also report experimental verification of fully digital multibeam beamforming schemes targeting MIMO-based 5G implementations. The paper [80] presents a spatial DFT-based digital multibeam beamforming implementation scheme for satellite communications.
3.2 Spatial DFT based Multibeams
As described in Chapter 2, picking Wp=N to be the N-point DFT matrix in the multibeam model of (2.25) defines a Fourier basis for representing the array signal vector and produces N orthogonal RF beams. This is equivalent to employing an N-point spatial DFT across the spatial samples of an N-element ULA. The outputs of such an architecture are N directionally orthogonal, simultaneous RF beams with unique look directions. Fig. 3.1 shows the overview architecture of a so-called spatial DFT-based multibeam RF aperture that employs a ULA. In a narrowband system, applying the DFT across the array receiver samples in this way provides approximately non-squinting beams in the digital domain. The array factors of the beams are related to the Fourier transform of the row vectors of the DFT matrix.
The N-point DFT of a finite-duration sequence $\{x_n\} \in \mathbb{C}^{N \times 1}$ of length N is defined as

$$Y_k = \sum_{n=0}^{N-1} x_n e^{-j 2\pi k n / N}, \quad k \in [0, N-1], \tag{3.1}$$

where $\{Y_k\} \in \mathbb{C}^{N \times 1}$. An N-point DFT is an Nth-order FIR perfect-reconstruction filterbank, with the kth bin's filter response having a peak at $\omega = 2\pi k / N$ as shown in Fig. 3.2.
Figure 3.1: Overview architecture of DFT based multibeam RF aperture employing a ULA.
Here, ω is the corresponding frequency variable in the frequency domain. When the N-point x samples are derived from the antenna outputs of a ULA, each output $Y_k$ corresponds to a beam with peak directivity at $\omega_x = 2\pi k / N$. In Chapter 2.2 it was shown that $\frac{\omega_x}{\Delta x} = -\frac{\Omega_t}{\sin\psi}$; thus, for a narrowband system at frequency $f_c$, the beam direction of each DFT output is given by

$$\psi = \sin^{-1}\left( \frac{k c}{\Delta x f_c N} \right). \tag{3.2}$$
For a Nyquist-spaced array, where $\Delta x = \lambda_c / 2$ and $\lambda_c = c / f_c$, the above relation simplifies to

$$\psi = \sin^{-1}\left( \frac{2k}{N} \right). \tag{3.3}$$
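As a quick numerical illustration of (3.3), the following minimal sketch lists the look direction of each bin of an 8-point spatial DFT. The function name and the wrapping of bins above N/2 to negative spatial frequencies are assumptions made here for illustration; they are not taken from the text.

```python
import math

def beam_angles_deg(num_elements):
    """Look direction (degrees from broadside) of each DFT bin, following (3.3)."""
    angles = []
    for k in range(num_elements):
        # Assumption: bins above N/2 are wrapped to negative spatial frequencies.
        k_wrapped = k - num_elements if k > num_elements // 2 else k
        sin_psi = 2.0 * k_wrapped / num_elements
        angles.append(math.degrees(math.asin(sin_psi)))
    return angles

for k, psi in enumerate(beam_angles_deg(8)):
    print(f"bin {k}: psi = {psi:+.1f} deg")
```

Under this wrapping convention, bin N/2 maps to endfire (90 degrees), and bins k and N − k point symmetrically about broadside.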
An intuitive illustration of the spatial filtering that takes place in a digital array is
depicted in Fig. 3.2. The figure shows the 2D spectrum of a downconverted baseband
plane-wave as seen by a ULA. The ideal filter passband that is required to filter this
signal spatially can be approximated by an appropriate DFT filterbank response, as
shown in Fig. 3.2.
Figure 3.2: (a) Responses of an 8-point DFT filterbank; (b) illustration of the filtering of a baseband 2D plane wave (PW) using a DFT filter response.

The DFT computation at each time step requires an N-point vector of complex-valued transceiver signals (i.e., I and Q samples) to be multiplied by the N × N DFT matrix; the brute-force computation has a complexity of $O(N^2)$ for real-time N-beam digital beamforming. The orthogonality property of the DFT ensures that the main lobe of a given beam falls into the nulls of the other (N − 1) beams, which in turn ensures no inter-beam interference (under the assumption of no mutual coupling between elements). In practice, the N-point DFT is computed at a lower complexity of $O(N \log N)$ multiplications and additions per input frame, using sparse factorizations of the DFT matrix [81]. Such factorizations constitute a suite of algorithms collectively known as fast Fourier transforms (FFTs). Thus, N narrowband digital beams are formed by applying a digital N-point FFT to the complex I and Q (IQ) signal vector from the array at every time sample. A spatial FFT-based digital multibeam beamformer is discussed in [80], and [79] describes a multibeam digital array for MIMO 5G wireless communications applications. More recent digital multibeam integrated circuit (IC) realizations of the same spatial FFT-based multibeam beamforming can be found in [74, 82].
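The orthogonality argument can be checked numerically with a small sketch. It applies the brute-force $O(N^2)$ DFT of (3.1) to one snapshot of an ideal plane wave whose spatial frequency coincides with the look direction of bin 3; the helper names and the noise-free, coupling-free signal model are assumptions made for illustration only.

```python
import cmath
import math

def spatial_dft(samples):
    """Brute-force N-point DFT of one array snapshot, O(N^2) as in (3.1)."""
    n_el = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_el) for n, x in enumerate(samples))
        for k in range(n_el)
    ]

def plane_wave_snapshot(n_el, spatial_freq):
    """Unit-amplitude narrowband plane wave sampled by an n_el-element ULA."""
    return [cmath.exp(1j * spatial_freq * n) for n in range(n_el)]

N = 8
# Hypothetical test signal: a wave arriving exactly on the look direction of bin 3.
snapshot = plane_wave_snapshot(N, 2 * math.pi * 3 / N)
beam_power = [abs(y) ** 2 for y in spatial_dft(snapshot)]

for k, p in enumerate(beam_power):
    print(f"beam {k}: power = {p:8.3f}")
```

In this idealized case all of the received power appears in beam 3, while the remaining beams sit in nulls, which is the inter-beam isolation described above.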
3.3 Discrete Transform Implementation Methods
The conventional approach to implementing the DFT is to implement an appropriate FFT algorithm using fixed-point, truncated twiddle factors in the signal flow graph. The implementation of the FFT/DFT in fixed-point digital hardware always leads to errors in representing the twiddle factors [83], which are mostly irrational. Because twiddle factors are represented with finite fixed-point precision in VLSI processors, hardware implementations of discrete transforms are in fact approximations. Apart from the direct approach of fixed-point approximation of the twiddle factors, several other approaches to low-complexity DFT implementation through various methods of approximation have been presented in the literature.
The work in [84] presents a method for implementing a discrete transform in which the coefficients are represented by integers that are sums of powers of two. Such an approach allows the discrete transform to be computed with only additions and bit-shifting operations. This method is, in essence, a successive application of the canonical signed digit representation to the transform coefficients [85].
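The idea of replacing a coefficient multiplication by shifts and additions can be sketched as follows. This is a simple greedy signed power-of-two expansion written for illustration; it is not the specific procedure of [84], and the function names, the three-term budget, and the 8 fractional bits are assumptions.

```python
import math

def sopot_terms(value, num_terms, min_exp=-8):
    """Greedy signed power-of-two expansion: value ~ sum(sign * 2**exp)."""
    terms = []
    residual = value
    for _ in range(num_terms):
        if residual == 0:
            break
        exp = max(min_exp, round(math.log2(abs(residual))))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms

def shift_add_multiply(x, terms, frac_bits=8):
    """Multiply integer x by the expanded constant using only shifts and adds.

    The result is a fixed-point number with frac_bits fractional bits.
    """
    acc = 0
    for sign, exp in terms:
        shifted = x << (exp + frac_bits)  # exp >= -frac_bits, so the shift is non-negative
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc

coefficient = math.cos(math.pi / 8)           # real part of a 16-point twiddle factor
terms = sopot_terms(coefficient, num_terms=3)
product = shift_add_multiply(100, terms)
print(terms, product / 2 ** 8, 100 * coefficient)
```

With three terms the constant is represented as 1 − 1/16 − 1/64, so the product is obtained with two subtractions and no multiplier, at the cost of a small coefficient error.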
Discrete transforms can be implemented using the coordinate rotation digital
computer (CORDIC) [86,87]. In this approach, the discrete transform is computed
with successive applications of rotations to the input signal coordinates. At the end
of the process, one can obtain an output that is close to what would be the exact
output. Closeness of the output of such methods to the ideal output depends on
the application.
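A minimal floating-point sketch of a CORDIC rotation is given below to make the idea concrete. It is a generic textbook-style rotation mode written for illustration, not the designs of [86,87]; the iteration count and the function name are assumptions, and in hardware the scaling by 2^-i is a bit shift and the arctangent values come from a small lookup table.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians (|angle| below about 1.74) using micro-rotations."""
    gain = 1.0
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        factor = 2.0 ** -i                # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * factor, y + d * x * factor
        z -= d * math.atan(factor)        # atan(2^-i) would be read from a lookup table
        gain *= math.sqrt(1.0 + factor * factor)
    # The accumulated gain is a known constant and is compensated once at the end.
    return x / gain, y / gain

# Applying the 16-point twiddle factor exp(-j*2*pi/16) to the sample 1 + 0j.
re, im = cordic_rotate(1.0, 0.0, -math.pi / 8)
print(re, im, math.cos(math.pi / 8), -math.sin(math.pi / 8))
```

The residual angle after the last iteration bounds the error, which is why the closeness to the exact output depends on the number of iterations the application can afford.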
Another class of methods can be found that approximates the discrete transforms
based on interpreting the rows of a discrete transform as the coefficients of a linear
filter. Using this methodology, several digital filter design techniques have been
applied to the implementation of discrete transforms [88–91].
Different methods for relatively efficient computation of the exact DFT also exist [92–98]. Those methods are fundamentally different from approximate methods, since they are aimed at problems requiring high-precision or exact computation, such as cryptography. The high-precision or exact computation comes at the expense of higher area and resource consumption [96,97]. The work in [84] presents a method for high-accuracy DFT approximation based on the approximation of trigonometric quantities by sums of powers of two (SOPOT). Those approximations are essentially canonical signed digit representations, a well-known approach to reducing arithmetic complexity [99–102]. In general, because of the high accuracy sought in these methods, the resulting transforms have an arithmetic complexity that consumes much more area and power in digital VLSI implementations.
In contrast, the application at hand, namely using the DFT to generate multiple simultaneous beams, is much more robust to errors. Most beamforming applications can withstand the errors caused by precision limitations or by approximations of the DFT filter bank responses.
In the work presented in this chapter, a new class of DFT approximations is explored for achieving multibeam beamforming. The beam responses, the performance deviations incurred, and the digital VLSI resource utilization due to the use of the approximate transforms are quantified relative to the ideal DFT. The main difference between the aforementioned fixed-point VLSI implementations and the proposed architecture lies in how the discrete transforms of interest are approximated. The underlying mathematical basis of the approximation, with sparse factorization based on structured matrix theory, enables new types of digital VLSI implementations that are completely devoid of power- and area-hungry multiplier circuits, while still achieving acceptable RF beams on the array aperture.
The difficulty in proposing DFT approximations for larger sequences lies in deriving efficient fast algorithms for the resulting approximations, because the approximate transforms may not preserve the symmetries and mathematical properties of the exact DFT matrix.
3.4 DFT Approximations through Parameterization
A class of DFT approximations based on parameterization of the DFT matrix is explored for generating multiple digital beams using only additions and subtractions, while closely approximating the beam responses of an ideal DFT. For this approach, the primary focus is on DFT block sizes that are powers of two; i.e., only DFT approximations for front-ends having N elements, where $N = 2^k$ and $k \in \mathbb{Z}^+$, are explored. This is because of the symmetries associated with such structures. Moreover, in this chapter, DFT approximations for use with 8- and 16-element arrays are investigated. Chapter 4 will present a low-complexity implementation of a 32-beam system and a design methodology towards a 1024-beam digital aperture targeting emerging 5G communication systems, based on a 32-point DFT approximation.
The N-point DFT is represented by the N × N matrix $\mathbf{F}_N$ whose entries are given by $[\mathbf{F}_N]_{k,n} = \omega_N^{nk}$, where $\omega_N = \exp(-2\pi j / N)$ is the Nth root of unity, $j = \sqrt{-1}$, and $n, k = 0, 1, \ldots, N-1$ [103].
Since low-complexity implementation of the N-point DFT matrix is the target, the real and imaginary components of the coefficients $[\mathbf{F}_N]_{k,n}$ of the matrix $\mathbf{F}_N$ are approximated by small values from the set $\mathcal{P} = \{0, \pm 1, \pm 2, \pm 1/2\}$. The set $\mathcal{P}$ is chosen so that the hardware implementation reduces to trivial bit-shift and addition operations instead of multipliers. This produces the search space $\mathcal{Q} = \{z \in \mathbb{C} : \Re\{z\} \in \mathcal{P} \wedge \Im\{z\} \in \mathcal{P}\}$ for the coefficients of the DFT approximation considered. Then, employing a parametric optimization method as described in [104], optimal low-complexity approximations $\hat{\mathbf{F}}_N$ for N = 8, 16 have been derived subject to multiple constraints. The parametric optimization work was conducted by the UFPE collaborators (Dr. R. J. Cintra and his students) to arrive at the approximate matrices.
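To give a feel for the search space, the sketch below enumerates $\mathcal{Q}$ and finds, for the first few 16-point twiddle factors, the element of $\mathcal{Q}$ that is closest in Euclidean distance. This entrywise rounding is only an illustration of the candidate set; it is not the constrained parametric optimization of [104], and the helper name `nearest_in_Q` is hypothetical.

```python
import cmath
import math
from itertools import product

P = (0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5)
Q = [complex(re, im) for re, im in product(P, repeat=2)]

def nearest_in_Q(z):
    """Element of Q closest to z in Euclidean distance (plain entrywise rounding)."""
    return min(Q, key=lambda q: abs(q - z))

N = 16
print("number of candidates in Q:", len(Q))
for m in range(1, 4):
    twiddle = cmath.exp(-2j * math.pi * m / N)
    print(f"m = {m}: exact = {twiddle:.4f}, nearest in Q = {nearest_in_Q(twiddle)}")
```

Every candidate has real and imaginary parts that are zero, a unit, or a single bit shift of a unit, so scaling a sample by any element of $\mathcal{Q}$ needs at most shifts, negations, and one addition.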
During the optimization process, the target approximations have been optimized for (i) high proximity to the exact 16-point DFT matrix [105]; (ii) low complexity [106]; (iii) orthogonality or near-orthogonality [107–109]; and (iv) invertibility.
Proximity Measure
The proximity of the resulting matrix $\hat{\mathbf{F}}_N$ to the exact N-point DFT matrix $\mathbf{F}_N$ is assessed by the Frobenius norm of their difference relative to that of the exact matrix [110]. The proximity measure can be represented as follows:

$$d(\alpha) = \frac{\left\| \hat{\mathbf{F}}_N - \mathbf{F}_N \right\|_F^2}{\left\| \mathbf{F}_N \right\|_F^2}, \tag{3.4}$$

where $\| \cdot \|_F$ denotes the Frobenius norm. Good DFT approximations are linked to small values of $d(\alpha)$.
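As a worked instance of (3.4), the sketch below evaluates the proximity measure for the 8-point approximation that is given in (3.6) of Section 3.5.1. It assumes that every entry of that matrix (including its 1/2 scaling) depends only on (nk) mod 8, which is how the table `T8` is used to rebuild it; this indexing is an observation about the printed matrix, not a construction stated in the text.

```python
import cmath
import math

N = 8
# Entry values of the 8-point approximation in (3.6), indexed by (n*k) mod 8 (assumed pattern).
T8 = [1 + 0j, 0.5 - 0.5j, -1j, -0.5 - 0.5j, -1 + 0j, -0.5 + 0.5j, 1j, 0.5 + 0.5j]

exact = [[cmath.exp(-2j * math.pi * n * k / N) for n in range(N)] for k in range(N)]
approx = [[T8[(n * k) % N] for n in range(N)] for k in range(N)]

def frobenius_sq(matrix):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(entry) ** 2 for row in matrix for entry in row)

diff = [[a - e for a, e in zip(row_a, row_e)] for row_a, row_e in zip(approx, exact)]
d = frobenius_sq(diff) / frobenius_sq(exact)
print(f"d = {d:.5f}")
```

Only the 16 entries with both n and k odd differ from the exact DFT in this case, which is why the measure comes out small.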
Complexity
As mentioned, to ensure low complexity, the real and imaginary parts of the parameter values have been constrained to the set $\mathcal{P} = \{0, \pm 1, \pm 1/2, \pm 2\}$, as suggested in [108]. Since these multiplicands can be trivially implemented in hardware, this constraint leads to hardware realizations with reduced area and higher throughput and operating frequency.
Orthogonality or Near-Orthogonality
Orthogonality is detected when $\hat{\mathbf{F}}_N^H \hat{\mathbf{F}}_N$ is an identity matrix, where the superscript H denotes Hermitian transposition. Near-orthogonality is a more general concept aimed at identifying matrices that are “almost” orthogonal. It can be quantified by means of the deviation-from-orthogonality function φ(·):

$$\phi(\hat{\mathbf{F}}_N) = 1 - \frac{\left\| \operatorname{diag}\left( \hat{\mathbf{F}}_N \hat{\mathbf{F}}_N^H \right) \right\|_F}{\left\| \hat{\mathbf{F}}_N \hat{\mathbf{F}}_N^H \right\|_F}. \tag{3.5}$$

The function shown above is derived from the deviation-from-diagonality function [111]. In DCT approximation theory, it has been observed that a deviation from orthogonality smaller than 0.2 is sufficient to ascribe near-orthogonality [105]. That threshold is therefore adopted in the generation of approximations.
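The measure in (3.5) can be evaluated directly. The sketch below does so for the 8-point approximation of (3.6), rebuilt under the assumption that its entries depend only on (nk) mod 8; the helper names are hypothetical and diag(·) is taken to keep only the diagonal of its argument.

```python
import math

N = 8
# Entry values of the 8-point approximation in (3.6), indexed by (n*k) mod 8 (assumed pattern).
T8 = [1 + 0j, 0.5 - 0.5j, -1j, -0.5 - 0.5j, -1 + 0j, -0.5 + 0.5j, 1j, 0.5 + 0.5j]
approx = [[T8[(n * k) % N] for n in range(N)] for k in range(N)]

def gram(matrix):
    """Return matrix times its Hermitian transpose."""
    return [
        [sum(a * b.conjugate() for a, b in zip(row_i, row_j)) for row_j in matrix]
        for row_i in matrix
    ]

def deviation_from_orthogonality(matrix):
    """Evaluate (3.5): one minus the ratio of diagonal to total Frobenius norm."""
    g = gram(matrix)
    total = math.sqrt(sum(abs(v) ** 2 for row in g for v in row))
    diagonal = math.sqrt(sum(abs(g[i][i]) ** 2 for i in range(len(g))))
    return 1.0 - diagonal / total

phi = deviation_from_orthogonality(approx)
print(f"phi = {phi:.4f}, near-orthogonal: {phi <= 0.2}")
```

For this matrix the only non-zero off-diagonal products are between rows four bins apart among the odd-indexed rows, so the deviation stays well below the 0.2 threshold.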
Invertibility
The matrix-mapping-based search process might lead to singular matrices. To allow for signal reconstruction, a DFT approximation must (i) be a non-singular matrix with a low condition number and (ii) have a low-complexity inverse matrix. Mathematically, these constraints translate into a nonzero determinant, $\det(\hat{\mathbf{F}}_N) \neq 0$, and an inverse matrix $\hat{\mathbf{F}}_N^{-1}$ whose entries have real and imaginary parts in $\mathcal{P}$.
3.5 8- and 16-Point Approximate Transforms and Beam Response Analysis
The approximate DFT (ADFT) matrices were derived through search and optimization by the collaborators at UFPE for N = 8 and 16 in accordance with the criteria described in Section 3.4. The derived approximate transforms have also been subjected to a sparse factorization, which further reduces the adder complexity of their digital implementation. The following sections present the ADFTs derived for N = 8, 16 in matrix form, and then analyze the beam responses of each ADFT.
3.5.1 8-Point ADFT
The matrix form of the 8-point DFT approximation found is given in (3.6) [112].
$$\hat{\mathbf{F}}_8 = \frac{1}{2} \cdot \begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
2 & 1-j & -2j & -1-j & -2 & -1+j & 2j & 1+j \\
2 & -2j & -2 & 2j & 2 & -2j & -2 & 2j \\
2 & -1-j & 2j & 1-j & -2 & 1+j & -2j & -1+j \\
2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 \\
2 & -1+j & -2j & 1+j & -2 & 1-j & 2j & -1-j \\
2 & 2j & -2 & -2j & 2 & 2j & -2 & -2j \\
2 & 1+j & 2j & -1+j & -2 & -1-j & -2j & 1-j
\end{bmatrix} \tag{3.6}$$
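It can be seen that $\hat{\mathbf{F}}_8$ has elements whose real and imaginary parts consist only of 0, ±1, ±2, which can be realized using only adder and bit-shift operations, so no multipliers are needed. $\hat{\mathbf{F}}_8$ is further factorized for a reduced-adder-complexity implementation as in (3.7):

$$\hat{\mathbf{F}}_8 = \mathbf{P} \times \operatorname{diag}(\mathbf{I}_2, \mathbf{A}_1, \mathbf{A}_3) \times \mathbf{D}_2 \times \operatorname{diag}(\mathbf{B}_2, \mathbf{I}_2, \mathbf{A}_4) \times \mathbf{D}_1 \times \operatorname{diag}(\mathbf{B}_4, \mathbf{A}_2) \times \mathbf{B}_8, \tag{3.7}$$

where $\mathbf{B}_n = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \mathbf{I}_{n/2}$, $\mathbf{I}_n$ is the identity matrix of order n, and ⊗ denotes the Kronecker product. The remaining factors are

$$\mathbf{A}_1 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad
\mathbf{A}_3 = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad
\mathbf{A}_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}, \quad
\mathbf{A}_4 = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix},$$

$\mathbf{D}_1 = \operatorname{diag}(1, 1, 1, 1, 1, 1/2, 1, 1/2)$, $\mathbf{D}_2 = \operatorname{diag}(1, 1, 1, j, 1, j, j, 1)$, and $\mathbf{P} = [\mathbf{e}_1 | \mathbf{e}_5 | \mathbf{e}_3 | \mathbf{e}_6 | \mathbf{e}_2 | \mathbf{e}_8 | \mathbf{e}_4 | \mathbf{e}_7]^T$, where $\mathbf{e}_i$ is the 8-point column vector having a 1 at the ith position and 0 elsewhere.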
As emphasized earlier in the chapter, since the coefficients of $\hat{\mathbf{F}}_8$ are small integers, (3.6) can be implemented using adders only. The adders-only signal flow graph for the 8-point ADFT is shown in Fig. 3.3. The frequency response of each output bin of the 8-point ADFT was studied and compared with the beam responses of the exact transform. Numerically simulated beam patterns obtained using the 8-point ADFT algorithm and the 8-point DFT are shown in Fig. 3.4. The patterns are plotted against normalized angular frequency ω ∈ [−π, π], which is equivalent to the beam responses of a Nyquist-spaced narrowband array over ψ ∈ [−90°, +90°], where ψ is the angle measured from the array broadside. Fig. 3.4(a) shows the exact DFT beams and Fig. 3.4(b) shows the beams obtained using the 8-point ADFT.
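A beam pattern of this kind can be simulated with a few lines of code. The sketch below evaluates the response of output bin k = 1 to a unit plane wave of spatial frequency ω for both the exact DFT row and the corresponding row of (3.6); the rebuilding of the ADFT row from the table `T8`, the 512-point frequency grid, and the helper names are assumptions made for illustration.

```python
import cmath
import math

N = 8
# Entry values of the 8-point approximation in (3.6), indexed by (n*k) mod 8 (assumed pattern).
T8 = [1 + 0j, 0.5 - 0.5j, -1j, -0.5 - 0.5j, -1 + 0j, -0.5 + 0.5j, 1j, 0.5 + 0.5j]

def beam_response(row, omega):
    """Magnitude of one output beam for a unit plane wave of spatial frequency omega."""
    return abs(sum(c * cmath.exp(1j * omega * n) for n, c in enumerate(row)))

def peak(row, grid_size=512):
    """Return (omega, magnitude) at the largest response on a uniform grid over [-pi, pi)."""
    grid = [-math.pi + 2 * math.pi * i / grid_size for i in range(grid_size)]
    best = max(grid, key=lambda w: beam_response(row, w))
    return best, beam_response(row, best)

k = 1
dft_row = [cmath.exp(-2j * math.pi * n * k / N) for n in range(N)]
adft_row = [T8[(n * k) % N] for n in range(N)]

for name, row in (("DFT ", dft_row), ("ADFT", adft_row)):
    omega_peak, gain = peak(row)
    print(f"{name}: peak at omega = {omega_peak:.4f} rad, gain = {20 * math.log10(gain):.2f} dB")
```

In this sketch both beams peak at the same spatial frequency, 2πk/N, while the approximate beam has a slightly lower main-lobe gain because half of its coefficients have magnitude 1/√2 instead of 1.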
Figure 3.3: Signal flow graph of the 8-point ADFT.
Figure 3.4: (a) Exact DFT beams; (b) beams obtained using the 8-point ADFT; (c) error between the two transforms.
3.5.2 16-Point ADFT
Considering the above discussion, the problem of finding a good DFT approximation can be formalized as the following constrained nonlinear optimization problem:

$$\alpha^* = \arg\min_{\alpha} d(\alpha), \tag{3.8}$$

subject to
1. $\Re\{\alpha\} \in \mathcal{P}$ and $\Im\{\alpha\} \in \mathcal{P}$;
2. $\det(\hat{\mathbf{F}}_{16}) \neq 0$;
3. $\phi(\hat{\mathbf{F}}_{16}) \leq 0.2$;
4. the inverse matrix must be of low complexity.

The optimum solution of (3.8) was found to be

$$\alpha^* = \frac{1}{2} \begin{bmatrix} 2-j & 1-j & 1-2j \end{bmatrix}^{\top},$$

with $d(\alpha^*) = 8.58 \cdot 10^{-2}$, by an exhaustive search conducted by our collaborators at UFPE. The derived 16-point ADFT matrix is given in (3.9), and is represented as four sub-matrices for convenience of presentation.
$$\hat{\mathbf{F}}_{16} = \frac{1}{2} \begin{bmatrix} \mathbf{A}_0 & \mathbf{A}_1 \\ \mathbf{A}_2 & \mathbf{A}_3 \end{bmatrix}, \tag{3.9}$$
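where $\mathbf{A}_i$, i = 0, 1, 2, 3, are 8 × 8 sub-matrices as follows:

$$\mathbf{A}_0 = \begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
2 & 2-j & 1-j & 1-2j & -2j & -1-2j & -1-j & -2-j \\
2 & 1-j & -2j & -1-j & -2 & -1+j & 2j & 1+j \\
2 & 1-2j & -1-j & -2+j & 2j & 2+j & 1-j & -1-2j \\
2 & -2j & -2 & 2j & 2 & -2j & -2 & 2j \\
2 & -1-2j & -1+j & 2+j & -2j & -2+j & 1+j & 1-2j \\
2 & -1-j & 2j & 1-j & -2 & 1+j & -2j & -1+j \\
2 & -2-j & 1+j & -1-2j & 2j & 1-2j & -1+j & 2-j
\end{bmatrix}, \tag{3.10}$$

$$\mathbf{A}_1 = \begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
-2 & -2+j & -1+j & -1+2j & 2j & 1+2j & 1+j & 2+j \\
2 & 1-j & -2j & -1-j & -2 & -1+j & 2j & 1+j \\
-2 & -1+2j & 1+j & 2-j & -2j & -2-j & -1+j & 1+2j \\
2 & -2j & -2 & 2j & 2 & -2j & -2 & 2j \\
-2 & 1+2j & 1-j & -2-j & 2j & 2-j & -1-j & -1+2j \\
2 & -1-j & 2j & 1-j & -2 & 1+j & -2j & -1+j \\
-2 & 2+j & -1-j & 1+2j & -2j & -1+2j & 1-j & -2+j
\end{bmatrix}, \tag{3.11}$$

$$\mathbf{A}_2 = \begin{bmatrix}
2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 \\
2 & -2+j & 1-j & -1+2j & -2j & 1+2j & -1-j & 2+j \\
2 & -1+j & -2j & 1+j & -2 & 1-j & 2j & -1-j \\
2 & -1+2j & -1-j & 2-j & 2j & -2-j & 1-j & 1+2j \\
2 & 2j & -2 & -2j & 2 & 2j & -2 & -2j \\
2 & 1+2j & -1+j & -2-j & -2j & 2-j & 1+j & -1+2j \\
2 & 1+j & 2j & -1+j & -2 & -1-j & -2j & 1-j \\
2 & 2+j & 1+j & 1+2j & 2j & -1+2j & -1+j & -2+j
\end{bmatrix}, \tag{3.12}$$

$$\mathbf{A}_3 = \begin{bmatrix}
2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 \\
-2 & 2-j & -1+j & 1-2j & 2j & -1-2j & 1+j & -2-j \\
2 & -1+j & -2j & 1+j & -2 & 1-j & 2j & -1-j \\
-2 & 1-2j & 1+j & -2+j & -2j & 2+j & -1+j & -1-2j \\
2 & 2j & -2 & -2j & 2 & 2j & -2 & -2j \\
-2 & -1-2j & 1-j & 2+j & 2j & -2+j & -1-j & 1-2j \\
2 & 1+j & 2j & -1+j & -2 & -1-j & -2j & 1-j \\
-2 & -2-j & -1-j & -1-2j & -2j & 1-2j & 1-j & 2-j
\end{bmatrix}. \tag{3.13}$$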
The approximation in (3.9) contains only elements of the form 1, 1 − j/2, 1/2 − j/2, 1/2 − j, and j (up to sign and conjugation), and thus can be implemented digitally using only adders and bit shifts. The frequency responses of each bin of the 16-point ADFT and the corresponding 16-point DFT bin responses are shown in Fig. 3.5.

Figure 3.5: Frequency response comparison of the filterbanks of the proposed approximation and the DFT. The x-axis is the normalized frequency and the y-axis corresponds to the magnitude in dB.
The adder complexity of $\hat{\mathbf{F}}_{16}$ in (3.9) is reduced by factorizing the approximate transform and implementing the factorization. The fast algorithm for the transform $\hat{\mathbf{F}}_{16}$ can be derived using factorization approaches related to the decimation-in-frequency method [81]. The resulting factorization of $\hat{\mathbf{F}}_{16}$ is given in (3.14):

$$\hat{\mathbf{F}}_{16} = \mathbf{P}_2 \cdot \mathbf{B}_5 \cdot \mathbf{P}_1 \cdot \mathbf{D} \cdot \mathbf{B}_4 \cdot \mathbf{B}_3 \cdot \mathbf{B}_2 \cdot \mathbf{B}_1, \tag{3.14}$$
where,
B1 =










1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
−1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 −1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 −1 0 0 0 1 0 0 0 0 0
0 0 0 0 0 −1 0 0 0 0 0 1 0 0 0 0
0 0 0 0 −1 0 0 0 0 0 0 0 1 0 0 0
0 0 0 −1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 −1 0 0 0 0 0 0 0 0 0 0 0 1 0
0 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 1










, (3.15)
B2 =










1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
−1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 −1 0 1 0 0 0 0 0 0 0 0 0 0
0 0 −1 0 0 0 1 0 0 0 0 0 0 0 0 0
0 −1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 −1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 −1 0 0 0 1 0
0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 1










, (3.16)
B3 =










1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
−1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 −1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0
0 0 0 0 0 2 0 −1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 −2 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0
0 0 0 0 0 0 0 0 0 2 0 −1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 −2 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 −1










, (3.17)
B4 =










1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 −2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 −1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 −1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 −1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 −1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 −2 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1










, (3.18)
$$\mathbf{B}_5 = \operatorname{blkdiag}\left( 2,\; \operatorname{diag}(-1, -1, 1, -2, -1, 1, 1) \otimes \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \right), \tag{3.19}$$
P1 =










1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0










, (3.20)
P2 =











1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0











, (3.21)
and $\mathbf{D} = \frac{1}{2} \operatorname{diag}(1, 1, 1, 1, 1, 1, 1, 1, 1, j, j, j, j, j, j, j)$. The operator blkdiag(·) maps
its arguments into a block diagonal matrix and the symbol ⊗ denotes the Kronecker
product [113]. Matrices Bk, k = 1, 2, 3, 4, 5, represent butterfly sections. Matrices
P1 and P2 are permutation matrices and the matrix D is a diagonal matrix with
trivial elements. Note that the nonzero entries of the above sparse matrices are in the
set {±1,±2}. This means that the computation of the proposed DFT approximation
requires only additions/subtractions and bit-shifting operations, which are efficiently
performed in hardware. The signal-flow graph (SFG) of the above fast algorithm
for the proposed 16-point DFT approximation is depicted in Fig. 3.6. The proposed
fast algorithm is multiplierless and requires a total of 116 real additions and 54 bit-
shifting operations. For comparison, the arithmetic complexities for performing
16-point DFT using different fast algorithms are listed in Table 3.1.
Figure 3.6: The SFG for the fast algorithm of the proposed DFT approximation. (Legend: negation; addition; multiplication by constant c. Inputs x0 to x15; outputs X0 to X15.)
Table 3.1: Comparison of arithmetic complexities for performing the 16-point DFT using different FFT algorithms

16-point FFT algorithm     No. of real additions     No. of real multipliers
Radix-2 [81, p. 76]        152                       24
Winograd [81, p. 102]      74                        10
Radix-4 [81, p. 80]        148                       20
Proposed                   116                       0
3.5.3 Error Analysis
An approximate matrix can be understood as a filter bank, with each matrix row being a finite impulse response filter. Let $\mathbf{T} = [t_{k,n}]$, $n, k = 0, 1, \ldots, N-1$, be an N × N linear transform. The transfer function of each of its rows is given by the discrete-time Fourier transform:

$$H_k(\omega; \mathbf{T}) = \sum_{n=0}^{N-1} t_{k,n} \cdot e^{-jn\omega}, \quad k = 0, 1, \ldots, N-1,$$

where ω ∈ [0, π]. The error energy [109] associated with each filter is

$$\epsilon_k = \int_0^{\pi} \left| H_k(\omega; \mathbf{F}_{16}) - H_k(\omega; \hat{\mathbf{F}}_{16}) \right|^2 d\omega, \quad k = 0, 1, \ldots, 15, \tag{3.22}$$

resulting in the following values:

$$\begin{aligned}
\epsilon_0 = \epsilon_4 = \epsilon_8 = \epsilon_{12} &= 0, \\
\epsilon_1 = \epsilon_3 = \epsilon_5 = \epsilon_7 = \epsilon_9 = \epsilon_{11} = \epsilon_{13} = \epsilon_{15} &= 1.57, \\
\epsilon_2 = \epsilon_6 = \epsilon_{10} = \epsilon_{14} &= 2.16.
\end{aligned} \tag{3.23}$$

Therefore, the total error energy is $\sum_{i=0}^{15} \epsilon_i \approx 21.18$. This error is considered low when compared with other 16-point approximations. For instance, the total error energies of the approximations described in [114] and [115] are 30.32 and 41.00, respectively.
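The values in (3.23) can be cross-checked from the matrix coefficients alone. By Parseval's relation, the integral of the squared error response over a full period [−π, π] equals 2π times the sum of squared coefficient errors of the row; the sketch below takes half of that, π times the coefficient error energy, as a stand-in for the integral over [0, π] in (3.22). This halving is an assumption made here (it is exact only when the squared error response is symmetric over the two half-periods), as is the rebuilding of the matrix in (3.9) from its second row via the index (nk) mod 16.

```python
import cmath
import math

N = 16
# Second row of (3.9) with the 1/2 scaling applied; assumed to generate every entry
# of the approximation through the index (n*k) mod 16.
T16 = [c / 2 for c in (2, 2 - 1j, 1 - 1j, 1 - 2j, -2j, -1 - 2j, -1 - 1j, -2 - 1j,
                       -2, -2 + 1j, -1 + 1j, -1 + 2j, 2j, 1 + 2j, 1 + 1j, 2 + 1j)]

def coefficient_error_energy(k):
    """Sum of squared differences between row k of the exact DFT and of the approximation."""
    return sum(
        abs(cmath.exp(-2j * math.pi * n * k / N) - T16[(n * k) % N]) ** 2
        for n in range(N)
    )

# Half of the full-period Parseval integral, used as an estimate of (3.22).
eps = [math.pi * coefficient_error_energy(k) for k in range(N)]

for k, e in enumerate(eps):
    print(f"eps_{k} = {e:.2f}")
print(f"total = {sum(eps):.2f}")
```

Under these assumptions the estimate is zero for rows 0, 4, 8, and 12, and rounds to 1.57 for the odd rows and 2.16 for rows 2, 6, 10, and 14, in line with the values quoted in (3.23).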
Figure 3.5 shows the frequency responses Hk(ω;F16) and Hk(ω; F̂16) linked to
the filters formed by the rows of the exact DFT and the proposed approximation,
respectively. The deviations in the filter bank responses with respect to the DFT
filter bank responses arise as a result of filter coefficients not being ideal valued (be-
cause they are approximated by small integers restricted to the set P = {0,±1,±2}).
Fig. 3.5 shows that the deviations in the filterbank responses occur in the deep stop-
band of the ADFT responses. These deviations do not exceed the highest side-lobe
level. Since the beamformer performance is set by the largest side-lobe level, these
deviations in the stopband will not degrade the performance of the phased array.
Table 3.2: Hardware resource consumption for the fixed-point FFT and approximate-DFT cores for a 16-bit input word length on the Xilinx Virtex-6 sx475t FPGA. The fixed-point FFT core uses 8-bit coefficient precision.

Figure of merit     Exact FFT (8-bit precision)     Approx. DFT (multiplierless)     Percent savings
TCPD (ns)           1.966                           1.886                            4.07%
Slice Registers     3,247                           2,528                            22.14%
LUTs                4,030                           2,488                            38.26%
Occupied Slices     1,338                           809                              39.54%
Flip-flops          4,543                           2,765                            39.13%
3.5.4 Digital Implementation of 16-point ADFT
The proposed fast algorithm for the 16-point approximate DFT and the 16-point Cooley-Tukey FFT algorithm were implemented as digital cores to compare hardware resource utilization. Xilinx tools were used to implement the designs (targeting FPGA realization) in a fully parallel input-output architecture. The designed cores were synthesized and mapped to the Xilinx Virtex-6 sx475t chip. The hardware resource comparison for a 16-bit input word length for the two digital cores is given in Table 3.2. Here, the twiddle factor word length for the fixed-point FFT design was fixed to 8 bits. As shown in Table 3.2, there is a considerable saving in hardware utilization in the ADFT version, which would translate into a reduction in area in an ASIC implementation. The critical path delay (TCPD) is lower for the approximate case, as anticipated, but the percent saving is relatively small because of the low precision levels used in both designs.
3.6 Experimental Verification of the Low-Complexity Beams using ADFTs
A 2.4-GHz RF system with a digital processing back-end was designed and implemented to physically measure the beams from the proposed 8-point and 16-point approximate DFT multibeam algorithms and to compare them with the beam responses of the respective exact DFTs. The 2.4 GHz system was built with a 16-element antenna front-end so that both the 8-point and 16-point algorithms could be tested. Fig. 3.7 shows the overall system architecture of the digital 2.4 GHz array-receiver experimental setup.
As illustrated in Fig. 3.7, direct-conversion in-phase (I) and quadrature (Q) receiver chains were used with each antenna element in the setup. The downconverted baseband signal is then digitized through analog-to-digital conversion (ADC) and processed in digital hardware, where the DFT-based multibeam beamforming is performed. Each beam output is then further processed to integrate and estimate the received beam energy for a specific antenna orientation to obtain the beam patterns. The next subsections detail the major subsystems of the architecture shown in Fig. 3.7.
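The energy-integration step can be sketched in software as follows. This is only an illustrative model of accumulating I² + Q² over a block of beam-output samples and normalizing across orientations; the function names and the sample values are hypothetical and do not represent the measured data or the FPGA circuit.

```python
import math

def beam_energy(iq_samples):
    """Accumulate I^2 + Q^2 over a block of complex beam-output samples."""
    return sum(s.real ** 2 + s.imag ** 2 for s in iq_samples)

def normalized_pattern_db(energies):
    """Express per-orientation energies in dB relative to the strongest orientation."""
    peak = max(energies)
    return [10.0 * math.log10(e / peak) for e in energies]

# Hypothetical beam-output blocks recorded at three antenna orientations (degrees).
blocks = {
    -30: [0.1 + 0.1j, 0.1 - 0.1j],
    0: [1.0 + 0.0j, 0.0 + 1.0j],
    30: [0.5 + 0.0j, 0.0 + 0.5j],
}
orientations = sorted(blocks)
energies = [beam_energy(blocks[angle]) for angle in orientations]
for angle, level in zip(orientations, normalized_pattern_db(energies)):
    print(f"{angle:+d} deg: {level:6.2f} dB")
```

Repeating this for every orientation of the array yields one point of the beam pattern per orientation for each output beam.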
3.6.1 2.4 GHz Front-End Antenna Array
A 16-element array of patch antennas operating at 2.4 GHz was used. The design of the patch antenna has been discussed in [116]. A single patch antenna is shown in Fig. 3.8(a). The patch antenna is integrated with a low-noise amplifier. The measured |S11| of the fabricated single patch antenna is shown in Fig. 3.8(b). Fig. 3.8(c) shows the measured power pattern of each patch antenna. The 16-element array was constructed with an element spacing of λ/2 ≈ 6 cm. The patch antenna array used in the system is depicted in Fig. 3.8(d).

Figure 3.7: Overall system architecture of the 2.4 GHz array-receiver setup.

Figure 3.8: (a) Measured |S11| of a single patch antenna; (b) fabricated 2.4 GHz patch antenna element with the integrated LNA; (c) measured power patterns of each antenna element in the array; (d) full 16-element antenna array.

Figure 3.9: (a) RF receiver chains containing a bandpass filter, low-noise amplifier, splitter, mixers, low-pass filter, and an IF amplifier; (b) ROACH-2 platform.
3.6.2 Microwave Front-End
The outputs of the antennas were fed into 16 direct-conversion IQ receivers for sampling and digital processing. The IQ receiver chains were built using commercial off-the-shelf (COTS) components and are shown in Fig. 3.9. Each chain was designed to have a low-noise amplification stage (10 dB, for a total gain of 20 dB with the integrated LNA on the patch antenna) followed by band-pass filtering and a mixing stage. A centralized LO distribution network was used. The mixer outputs were low-pass filtered and amplified (10 dB gain) to obtain the downconverted IF signal.
3.6.3 Baseband Digital Processing Hardware and Circuits
Digital processing of the sampled signals was performed using the ROACH-2 platform [117], which is shown in Fig. 3.9(b). The ROACH-2 platform provides a flexible, versatile hardware platform comprising a Virtex-6 sx475t FPGA along with two ZDOK interfaces that connect to the FPGA and can be used to interface analog-to-digital conversion (ADC) cards supporting up to 32 simultaneous channels. The sx475t device has 476,160 logic cells, 74,400 configurable logic blocks (CLBs) and 2016 DSP48 slices.

Figure 3.10: Architecture of the digital back-end that generates N beams (N = 8, 16). The figure also shows the energy calculation circuits used for measuring the beam patterns in digital.
The “ADC16x250-8” daughter cards, which were built as part of the CASPER
hardware for compatibility with ROACH-2 boards, were used in our setup for
sampling. Each card can accommodate up to 16 analog inputs. The cards can be
configured to achieve different sampling rates, i.e., 32 inputs up to 240 MHz, 16
inputs up to 480 MHz, and 8 inputs up to 960 MHz. A picture of the ADC cards
installed on the ROACH-2 platform is shown in Fig. 3.9(b). The two cards together
provide 32 analog inputs, enabling fully simultaneous sampling of the 32 IQ channels
from the 16-element array.
The generic digital architecture used for real-time measurements of the beams
from the 8-point and 16-point transforms (both approximate and exact) is illustrated
in Fig. 3.10. The digital cores for performing the 8-/16-point ADFT and fixed-point
FFT were designed targeting the ROACH-2’s Virtex-6 FPGA. The input word length
of the transform cores was set to 8 bits because the ADC16x250-8 cards used in the
setup have an 8-bit output. The cores were pipelined so that they could run at
clock speeds up to 240 MHz. The fixed-point FFT designs were configured with
8-bit-precision twiddle factors as a trade-off between precision and area. The beam
energy per received direction was calculated digitally for all beam outputs. A
separate energy calculation circuit was employed for this at each of the N complex
outputs of the N-point transform.
The front portion of the digital architecture consists of the digital normalizing
circuit, which is used to calibrate the RF chains. The calibration procedure is
described in the next section. This stage consists of a set of multipliers (an
N-element design needs 2N multipliers in this stage), where one input of each
multiplier is connected to a 32-bit software-controlled register (SCR) and the other
input is the ADC channel. All SCRs are first set to 1 to determine the calibration
gain of each RF chain for a reference input. Once the calibration gains are
determined, each SCR is overwritten with the corresponding gain value.
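The normalizing step above can be illustrated with a minimal sketch. It assumes
that each channel's response to the reference input is summarized by its RMS
amplitude and that each SCR holds a real scale factor; the function names are
hypothetical, and phase correction is not modeled here.

```python
import math

def calibration_gains(reference_captures):
    """Return one real gain per channel that equalizes RMS amplitude.

    reference_captures: list of sample lists, one per ADC channel (I or Q),
    recorded with every SCR set to 1 and a common reference input applied.
    """
    rms = [math.sqrt(sum(s * s for s in ch) / len(ch)) for ch in reference_captures]
    target = max(rms)  # assumption: scale every channel up to the strongest one
    return [target / r for r in rms]

def apply_gains(samples, gains):
    """Model the normalizer stage: one multiplier per channel."""
    return [g * x for g, x in zip(gains, samples)]

# Two illustrative channels, the second with half the amplitude of the first
captures = [[1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5]]
gains = calibration_gains(captures)
equalized = apply_gains([1.0, 0.5], gains)
```

Scaling toward the strongest channel is only one possible choice; any common
target level would equalize the chains equally well.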
After the digital normalizing stage, the signals are driven to the ADFT/FFT
digital cores. The in-phase signals are fed to the real inputs of the core and the
quadrature signals to the imaginary inputs. Next, the real and imaginary outputs of
each output bin of the core are used to calculate the instantaneous power of the
sample by computing $(\mathrm{Re}\{Y_k\})^2 + (\mathrm{Im}\{Y_k\})^2$, where
$0 \le k \le N-1$. This is implemented with two multipliers and one adder per
channel. The word length of the input to this block depends on the bit growth in
the ADFT/FFT core. The output of this block is sent to an accumulator that
integrates over a pre-specified time period. The integration time is modifiable
through software control.
The beam measurement emulated the functionality of a lock-in amplifier to filter
out ambient 2.4-GHz radiation present in the environment. To achieve this, the
transmitted signal is switched on and off at a particular rate, and the energy
levels received while the transmitter is on and off are calculated separately. The
transmitter design approach is discussed at the end of Section 3.6.4. The digital
circuit architecture shown in Fig. 3.10 has been designed to work with the lock-in
amplifier setup. For this purpose, two integrators are used for each channel, and
these integrators are activated depending on whether the transmitted signal is on
or off. An energy detector is employed at the front of the circuit to achieve this
functionality. The Boolean output of this block is used as the select signal of a
demultiplexer (demux) that selects one of the integrators when the RF is on and the
other when the RF is off. Finally, the difference of the two integrator values is
computed as the received energy of a particular bin. This eliminates the effect of
ambient RF interference. The computed values are updated in FPGA memory and are
then read by the host server using software routines.
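The two-integrator scheme can be sketched in software as follows. This is a
behavioral model under stated assumptions, not the FPGA circuit itself: the
function name is hypothetical, the on/off flags stand in for the energy detector
output, and equal on and off durations are assumed so that the ambient
contribution cancels in the difference.

```python
def lock_in_energy(bin_outputs, tx_on_flags):
    """Accumulate |Y|^2 separately for transmitter-on and transmitter-off samples.

    bin_outputs: complex samples of one transform bin.
    tx_on_flags: booleans from the energy detector, one per sample.
    Returns the on-integrator value minus the off-integrator value.
    """
    acc_on = 0.0
    acc_off = 0.0
    for y, tx_on in zip(bin_outputs, tx_on_flags):
        power = y.real * y.real + y.imag * y.imag  # two multipliers, one adder
        if tx_on:
            acc_on += power   # demux routes the sample to the "on" integrator
        else:
            acc_off += power  # demux routes the sample to the "off" integrator
    return acc_on - acc_off

# Illustrative data: strong samples while keyed on, weak ambient samples while off
samples = [3 + 4j, 0.5j, 3 + 4j, 0.5j]
flags = [True, False, True, False]
energy = lock_in_energy(samples, flags)
```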
Calibration
The circuits required calibration to achieve proper functionality prior to obtaining
measurements. Calibration was essential at two main points. First, each RF receiver
needed to be calibrated due to the mismatches present in amplification, mixing, and
filtering. In addition, calibration of the ADC chips integrated into the ROACH-2
platform was required. Calibration of the ADC chips was performed with calibration
scripts provided by CASPER [118]. These scripts facilitated calibration for a
reference input signal of the same dynamic range as the actual input. A separate
microwave circuit was included in the RF front end to achieve this, and calibration
of the RF front ends was achieved as well.
Due to mismatches among the 32 RF chains of the current setup, the chain outputs
for a reference input signal were not uniform. Thus, the gain and phase variations
of each channel with respect to a reference input were recorded and used to
calibrate each RF chain digitally, as shown in Fig. 3.10, to equalize the outputs
of all chains.
Figure 3.11: Experimental setup: (a) transmitter and receiver in the anechoic
chamber, (b) receiver instrumentation setup including the RF receivers, (c) front
view of the antenna array, (d) rotation platform.
3.6.4 Experimental Setup and Beam Measurements
Fig. 3.11 shows the complete setup built for experimental verification of the
low-complexity beams from the ADFTs. As shown in Fig. 3.11(a), the 16-element
2.4 GHz receiver array was placed inside an anechoic chamber. A 2.4-GHz transmitter
antenna was employed at one end of the chamber to generate a plane-wave tone. The
transmitter and the receiver array were separated by approximately 3 meters to
place the receiver array approximately in the far field of the transmitter. The
transmitter remained fixed, and the receiver array was rotated around its center
using a software-controlled precision rotation platform to measure the received
energy level at different angles. Fig. 3.11(c) and (d) show a close-up of the
receiver array inside the anechoic chamber and the precision rotation platform used
to rotate the array, respectively. The RF chains, digital back-end, and other
equipment were placed outside the anechoic chamber. The antenna array fed the
receiver chains via coaxial cables. The LO was fixed at 2.39 GHz to generate a
10 MHz IF. A third oscillator was used to clock the ROACH-2 (FPGA) at 240 MHz and
to perform the parallel sampling of all 16 IQ IF channels. The local oscillator
signal was generated using a “NOISE XT SL” low-jitter clock synthesizer to minimize
the phase noise in the sampled channels. All oscillators were referenced to a
10 MHz reference clock to further minimize the effect of phase noise generated by
the oscillators. The ROACH-2 FPGA platform was connected to a host Linux server for
software control of the measurement setup. The calculated energy measures (for each
output bin of the approximate/exact DFT) for each angle of reception were recorded
in FPGA block RAMs and were read through software routines to generate the beam
pattern. The measurements were conducted in 0.9° steps.
Figure 3.12: Lock-in amplifier design made for generating the transmitted signal
with on-off keying.
A fully automated, Python-based software control system was developed to perform
the beam measurement task on top of the software-to-hardware interface layer
provided by the ROACH-2 platform. A Python subroutine was developed to control the
motor for precise rotation of the array for beam angle measurements. The
“8SMC4-USB” motor controlling platform [119] was used to issue commands from
software routines to the platform via a virtual COM port. The ROACH-2 platform
provides a middle layer for communicating with the FPGA hardware (block RAM memory)
via the on-board Power PC computer. The ROACH-2 is connected to the main host Linux
server through an Ethernet connection. The main Python routine is programmed to
access the ROACH-2 platform to perform control functions and read data from the
FPGA memory while iteratively scanning through the angles. Altogether, this
constitutes a fully automated beam measurement setup, allowing all beam
measurements to be performed in a single run.
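The structure of such a scan can be sketched as below. This is not the routine
used in this work: `rotate_to` and `read_bin_energies` are hypothetical callables
standing in for the motor-controller and ROACH-2 access routines, and the
stand-in hardware exists only so that the sketch runs without any equipment.

```python
def scan_beam_patterns(angles_deg, rotate_to, read_bin_energies):
    """Rotate the array through each angle and collect per-bin energies.

    rotate_to(angle) and read_bin_energies() are hypothetical callables;
    the real motor and FPGA interfaces are not modeled here.
    Returns {bin_index: [(angle, energy), ...]}.
    """
    patterns = {}
    for angle in angles_deg:
        rotate_to(angle)                       # move the rotation platform
        for k, energy in enumerate(read_bin_energies()):
            patterns.setdefault(k, []).append((angle, energy))
    return patterns

# Stand-in hardware for a dry run (purely illustrative values)
state = {"angle": None}

def fake_rotate(angle):
    state["angle"] = angle

def fake_read():
    return [abs(state["angle"]), 1.0]

result = scan_beam_patterns([-1.0, 0.0, 1.0], fake_rotate, fake_read)
```

Passing the hardware access functions as arguments keeps the scan logic
testable without the motor controller or the FPGA attached.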
Lock-in Amplifier Setup for Obtaining Measurements
The transmitter component of the test setup was modified to realize lock-in
amplifier behavior in order to protect the measurement from potential reflections
and ambient 2.4-GHz radiation present in the room environment. For this purpose,
the transmitted signal was converted to a continuous on-off pulse train of a
2.4-GHz signal. Fig. 3.12(a) shows the block diagram of the hardware configuration
used to generate this transmitted signal. Instead of directly using a 2.4-GHz
signal input, a 1.2-GHz signal was used. This signal was modulated with on-off
keying using an IF signal generated by another FPGA board (Xilinx Xtreme DSP kit 4
[120]) via its digital-to-analog converter (DAC). The keyed signal was then split,
mixed, and bandpass filtered to obtain the 2.4-GHz continuous pulse signal.
Fig. 3.12(b) shows the COTS realization of the transmitted-signal generation
circuit using commercially available mixers and amplifiers. The energy detector
block shown in the digital circuit architecture in Fig. 3.10 was used to detect the
presence of the carrier. Fig. 3.12(c) shows a capture of samples from the FPGA
corresponding to such a transmission.
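The frequency doubling implied by the split-and-mix step can be written out as a
short worked equation. It assumes, since the text does not state which signals
drive the mixer, that the two splitter outputs are multiplied together, and it
models the keying waveform as $m(t) \in \{0, 1\}$ with $f_0 = 1.2$ GHz (both
symbols are introduced here for illustration):

```latex
\big(m(t)\cos(2\pi f_0 t)\big)^2
  = m^2(t)\cos^2(2\pi f_0 t)
  = \frac{m(t)}{2} + \frac{m(t)}{2}\cos\!\big(2\pi (2 f_0) t\big),
\qquad \text{since } m^2(t) = m(t).
```

Under this assumption, the bandpass filter centered at $2 f_0 = 2.4$ GHz removes
the baseband term $m(t)/2$ and leaves the on-off keyed 2.4-GHz carrier.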
3.6.5 Measured Beams
8-Point Beam Measurements
The center 8 elements of the 16-element array were used to test and measure the
beams obtained using the 8-point approximate transform. The digital design
architecture shown in Fig. 3.10 was used with the 8-point ADFT/FFT digital cores.
The precision rotor stage was used to obtain the received energy at a resolution of
1° over the range from −65° to +65° about array broadside. Once the array was moved
to a new position, all the integrators were reset and integration was started on
all the beam output signals simultaneously for a fixed duration (5 s). The computed
values were then stored in the BRAMs of the FPGA. The recorded values were
communicated to the host PC through Python routines.
The process was repeated for each angle according to the rotation resolution
used, and the stored values were plotted in Matlab to generate the beam patterns.
The same procedure was repeated using both the ADFT and DFT cores for comparison.
Figure 3.13: Measured and simulated beam patterns (magnitude in dB versus angle in
degrees) for each bin, Bin 0 to Bin 7, of the 8-point approximate and exact
transforms.
Figure 3.14: (a) All beam patterns using the approximate transform from the raw
values measured at each bin output, (b) the normalized beam patterns in the log
domain for the approximate transform, (c) all beam patterns obtained using the
exact FFT core, (d) the normalized patterns of (c) in the log domain.
Fig. 3.13 shows the plots generated from the measured values. The digital circuits
were clocked at 200 MHz. The LO signal was maintained at 2.410 GHz, generating a
10-MHz IF signal at the FPGA ADC inputs. As a reference, Matlab-simulated beam
patterns for each transform (approximate and exact) are also plotted, taking the
element pattern into consideration. That is, the plotted pattern combines the ideal
beam pattern of the transform with the element pattern. To make the simulated
patterns more realistic, a time-domain simulation was conducted that takes the
measured element pattern of each antenna into account by scaling the signal at each
antenna element by the gain corresponding to the direction of reception. It should
be noted that the plot for bin 4 is shown only for completeness. The beam direction
for this bin is at end-fire (90° due to the λ/2 geometry chosen), which falls into
the null direction of each antenna pattern. Fig. 3.14 shows all the beam patterns
in single plots. Fig. 3.14(a) shows the beam patterns observed using the
approximate transform, plotted from the raw values measured at each bin output. It
can be noticed that bins 1, 2 and 6, 7 do not follow the element pattern due to the
non-uniform gains inherent in the approximate transform. Fig. 3.14(b) shows the
normalized beam patterns for the same beams in the log domain, where each beam
output has been normalized to 1 by dividing by its maximum value. Fig. 3.14(c)
shows the beam patterns observed from the exact FFT implementation. Fig. 3.14(d)
depicts the normalized beam patterns in the log domain. It should be noted that the
end-fire beam corresponding to the output of bin 4 has been omitted from these
plots.
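The per-beam normalization described above can be sketched as follows. The sketch
assumes the recorded values are energies (so the decibel conversion uses
10 log10), and the function name and the display floor for non-positive readings
are assumptions introduced for illustration.

```python
import math

def normalize_db(energies, floor_db=-60.0):
    """Normalize one beam's energy samples to a 0 dB peak.

    floor_db is an assumed display floor applied to zero or negative
    readings, which can arise after the on/off energy subtraction.
    """
    peak = max(energies)  # assumes at least one positive reading
    out = []
    for e in energies:
        ratio = e / peak
        out.append(10.0 * math.log10(ratio) if ratio > 0 else floor_db)
    return out

# Illustrative energy readings across five rotation angles
pattern_db = normalize_db([0.1, 1.0, 10.0, 1.0, 0.0])
```

Because each beam is divided by its own maximum, the normalized plots hide the
bin-to-bin gain differences that are visible in the raw patterns.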
16-Point Beam Measurements
The same measurement procedure was repeated using all 16 elements of the array
along with the implemented 16-point digital cores to obtain the beam measurements.
For reference, the beams arising from the exact-FFT digital core were also
measured. The separation between the transmitter and the receiver must be large
enough for the assumption that a plane wave is received by the array to hold; this
is important for obtaining a good measurement of the beam patterns. During
broadside calibration, it was observed that a significant phase deviation existed
between the signals captured at the two end elements of the array. This is because
the physical aperture size of the full 16-element array (96 cm) is comparable to
the distance between the transmitter and the receiver. Figs. 3.15 and 3.16 show the
individual beam plots arising from the measured values for both the approximate and
exact transforms. The transmitter-receiver separation is constrained by the
dimensions of the anechoic chamber, and this issue manifests as an increased
side-lobe level in the measured results for both transforms. The simulated beam
patterns for each transform have been plotted taking the element pattern into
consideration, as in the 8-point case. Bin 8 in Fig. 3.16 corresponds to the beam
looking at end-fire, which falls into the null direction of each antenna pattern,
and is shown only for completeness.
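The end-fire direction of the middle bin follows from the mapping between DFT
bins and pointing angles. A minimal sketch is given below, assuming λ/2 element
spacing so that the spatial frequency is π sin θ, and wrapping the bin frequency
2πk/N into (−π, π]. The function name is hypothetical, and the sign of each angle
depends on the element indexing convention, which is an assumption here.

```python
import math

def bin_angles_deg(n_points):
    """Pointing angle of each DFT bin for a lambda/2-spaced linear array."""
    angles = []
    for k in range(n_points):
        psi = 2.0 * math.pi * k / n_points   # spatial frequency of bin k
        if psi > math.pi:
            psi -= 2.0 * math.pi             # wrap into (-pi, pi]
        sin_theta = psi / math.pi            # psi = pi * sin(theta) at lambda/2
        angles.append(math.degrees(math.asin(sin_theta)))
    return angles

angles_8 = bin_angles_deg(8)
angles_16 = bin_angles_deg(16)
```

With this mapping, bin N/2 lands at sin θ = 1, which is consistent with bin 4 of
the 8-point transform and bin 8 of the 16-point transform pointing at end-fire.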
Fig. 3.17 shows all the beam patterns in single plots. Fig. 3.17(a) shows the beam
patterns observed using the approximate transform, plotted from the raw values
measured at each bin output. As in the case of the beams measured for the 8-point
approximate transform, it can be noticed that the bins do not follow the element
pattern due to the non-uniform gains inherent in the approximate transform.
Fig. 3.17(b) shows the normalized beam patterns for the same beams in the log
domain, where each beam output has been normalized to 1 by dividing by its maximum
value. Fig. 3.17(c) shows the beam patterns observed from the exact FFT
implementation. Fig. 3.17(d) depicts the normalized beam patterns in the log
domain. It should be noted that the end-fire beam corresponding to the output of
bin 8 has been omitted from these plots.
Figure 3.15: Measured beam patterns (gain in dB versus angle in degrees, panels (a)
to (h)) for bins 0-7 of 16-point approximate and exact transforms.
Figure 3.16: Measured beam patterns (gain in dB versus angle in degrees, panels (i)
to (p)) for bins 8-15 of 16-point approximate and exact transforms.
Figure 3.17: (a) All beam patterns drawn in one plot using the 16-point approximate
transform from the raw measured values at each bin’s output, (b) the normalized
beam patterns (log domain) for the approximate transform, (c) all beam patterns
obtained using the exact FFT core, (d) normalized patterns of (c) in the log
domain.
It can be observed that the beam patterns measured using the 8-point transforms are
much closer to the simulated beams than those of the 16-point measurements. The
performance of the beams is mainly determined by the performance of the analog
front-end of the beamformer. Cable length mismatches and phase incoherence among
the receiver chains deteriorate the performance of the beams. In addition, an ideal
beam measurement would require an ideal plane wave impinging on the receiver array.
In a lab anechoic measurement environment, the limited separation between the
transmitter and the receiver affects the beam measurement more strongly as the
receiver aperture size increases. This is because the path length from the
transmitter to each element of the receiver array is different for finite-size
receiver apertures (the receive aperture here is ≈ 1 m and the separation of the
transmitter and the receiver is ≈ 3 m). Therefore, owing to the experimental
measurement conditions, the beam performance simulated for an ideal plane wave
cannot be achieved.
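The size of this effect can be estimated with a short calculation. The sketch
below assumes a point source on array broadside, uses the conventional far-field
boundary 2D²/λ, and takes the 8-element aperture as 48 cm (half of the stated
96 cm), which is an assumption; the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def fraunhofer_distance(aperture_m, freq_hz):
    """Conventional far-field boundary 2*D^2/lambda."""
    wavelength = C / freq_hz
    return 2.0 * aperture_m ** 2 / wavelength

def edge_phase_error_deg(aperture_m, range_m, freq_hz):
    """Extra phase at an edge element relative to the array center for a
    point source on broadside at the given range."""
    wavelength = C / freq_hz
    extra_path = math.hypot(range_m, aperture_m / 2.0) - range_m
    return 360.0 * extra_path / wavelength

for name, aperture in (("8-element", 0.48), ("16-element", 0.96)):
    print(name,
          round(fraunhofer_distance(aperture, 2.4e9), 2), "m far-field boundary,",
          round(edge_phase_error_deg(aperture, 3.0, 2.4e9), 1), "deg edge phase error")
```

Under these assumptions, the far-field boundary of the 16-element aperture lies
well beyond the 3 m separation, while that of the 8-element aperture is much
closer to it, which is consistent with the better agreement observed for the
8-point beams.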
3.7 Conclusion
Low-complexity 8- and 16-beam multibeam beamformers have been proposed and
implemented based on multiplierless approximate DFT algorithms. The ADFTs have been
obtained through the solution of a constrained discrete optimization problem
resulting from the DFT matrix parameterization and are much lower in complexity
than their fixed-point FFT counterparts while maintaining the mathematical
properties of the exact DFT. The proposed DFT approximations for multibeam
applications, together with their fast algorithms, enable digital realizations
using only additions and bit shifts and therefore bring the multiplication
complexity down from O(N log N) to zero.
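How a product with such a coefficient reduces to additions and shifts can be
illustrated with a minimal sketch. It assumes integer samples and coefficients
whose real and imaginary parts lie in {0, ±1, ±2, ±1/2}; this set and the
function names are assumptions made for illustration, and halving by an
arithmetic right shift truncates toward negative infinity.

```python
def times_coeff(x, c):
    """Multiply integer x by c in {0, 1, -1, 2, -2, 0.5, -0.5} without a multiplier."""
    if c == 0:
        return 0
    mag = abs(c)
    if mag == 1:
        y = x
    elif mag == 2:
        y = x << 1          # doubling is a left shift
    elif mag == 0.5:
        y = x >> 1          # halving is an arithmetic right shift
    else:
        raise ValueError("coefficient outside the assumed set")
    return -y if c < 0 else y

def complex_times(xr, xi, cr, ci):
    """(xr + j*xi) * (cr + j*ci) using only shifts, negations, and additions."""
    real = times_coeff(xr, cr) - times_coeff(xi, ci)
    imag = times_coeff(xr, ci) + times_coeff(xi, cr)
    return real, imag

# (3 + 4j) * (1 - 1j) = 7 + 1j
product = complex_times(3, 4, 1, -1)
```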
An FPGA-based digital realization of the proposed low-complexity 16-point ADFT
was obtained, achieving 16 simultaneous RF beams. These RF beams were measured and
verified using a 16-element 2.4 GHz array setup that generated the complex-valued
(IQ) signals used as the input to the FPGA-implemented digital cores. The FPGA
realization of the digital beamformer supports 120 MHz of bandwidth per beam.
Sampling and the digital realizations were based on the ROACH-2 platform, which
integrates a Xilinx Virtex-6 FPGA. The beams generated from the outputs of the
approximate DFT transforms were measured by rotating the receiver antenna array and
were compared against the beams generated from the fixed-point (8-bit twiddle
factors) FFT counterpart digital cores. The measured beams for both cases (ADFT and
fixed-point FFT) have been reported in comparison with the floating-point
theoretical beams. It can be seen that the beams corresponding to the approximate
DFT are in good agreement with the beams corresponding to the fixed-point FFT. The
performance of the beamformer is dominated by the post-calibration phase errors in
the microwave front-ends.
Future work can be directed toward the investigation of larger ADFTs and
realization at emerging mm-wave bands.
CHAPTER 4
LOW-COMPLEXITY 32-BEAM MULTIBEAM SYSTEM: BUILDING
BLOCK FOR A MULTIPLIERLESS 1024-BEAM DIGITAL ARRAY
This chapter extends the work in Chapter 3 to investigate a low-complexity
multiplication-free 32-point digital computing architecture that can form 32
simultaneous RF beams, leading toward the realization of 1024 2D RF beams with low
size, weight, and power (SWaP). Arithmetic complexity due to multiplication is
reduced from the FFT complexity of O(N log N) for DFT realizations down to zero,
yielding a 46% and 55% reduction in chip area and dynamic power consumption,
respectively, for the N = 32 case. The chapter describes the proposed 32-point DFT
approximation targeting 1024 beams using a 2D uniform rectangular array, and shows
the multiplierless approximation and its mapping to a 32-beam sub-system consisting
of 5.8 GHz antennas that can be used for generating 1024 digital beams without
multiplications. Real-time beam computation is achieved using the ROACH-2 FPGA
platform at 120 MHz bandwidth per beam. Theoretical beam performance is compared
with measured RF patterns from both a fixed-point FFT and the proposed
multiplier-free algorithm.
4.1 Introduction
A novel low-complexity multibeam architecture is proposed for realizing a massive
number of simultaneous sharp beams, which are vitally important in coping with the
rapid increases in path loss expected in future mmW/sub-THz wireless systems. In
particular, a low-SWaP approach for generating 1024 beams using a 32 × 32 aperture
and ultra-low-complexity digital VLSI hardware is discussed. A 32-beam subsystem is
proposed, based on a novel 32-point DFT approximation, as the building block of
such a 32 × 32 system. The proposed 32-beam subsystem has been implemented at
5.8 GHz, and the digital beams have been measured and compared with those from
exact-DFT-based beams. The measured beams have been used to derive the beam
patterns of the corresponding 32 × 32 rectangular aperture by assuming identical
element patterns in all directions.
4.2 2D DFT based Multibeam Transceivers
Following the discussion of ULA-based multibeam beamforming using the spatial DFT
in Chapter 3, this chapter extends the discussion to achieving multiple
simultaneous beams using 2D uniform rectangular arrays. Multibeam operation on an
$N \times M$ ($N, M \in \mathbb{Z}^+$) linearly spaced rectangular array can be
achieved by uniformly sampling the spatial frequency domain to define a set of
far-field plane waves having spatial frequencies
$(\omega_x, \omega_y) = \left(\frac{2\pi}{N}k_1, \frac{2\pi}{M}k_2\right)$, where
$k_1 = 0, 1, \ldots, N-1$ and $k_2 = 0, 1, \ldots, M-1$. For the analysis carried
out in this chapter, the case $M = N$ is considered so that the same proposed
N-point transform can be used row-wise and column-wise in a rectangular aperture
for generating 2D beams. The spatial frequency points
$\left(\frac{2\pi}{N}k_1, \frac{2\pi}{N}k_2\right)$ correspond to beams pointing at
unique angle pairs indexed by $(k_1, k_2) \in \mathbb{Z}^2$. The corresponding
spatially sampled, time-continuous plane waves at the terminals of the array
elements can be expressed in a Fourier basis as
$$E(n_x, n_y, t) = E_0 \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} x_m(t)\, e^{j(n_x\omega_x + n_y\omega_y + 2\pi f_c t)} \tag{4.1}$$
where fc is the unmodulated carrier frequency, E0 is a constant that sets the signal
power, and xm(t) is the complex modulated information component of the signal.
It is assumed that the bandwidth of xm(t) is much smaller than fc and the analysis
is only valid for narrowband signals for which the so-called spatial-wideband effect
is negligible [121, 122].
Figure 4.1: (a) Digital beamforming architecture for obtaining $N^2$ beams using an
$N \times N$ URA. (b) Block diagram of an N-element sub-system that acts as a
building block for the $N^2$ rectangular aperture array. The block named HT in the
figure denotes the Hilbert transform operation.
In receiver mode, the plane waves present at the antennas are sampled in the
spatial domain, amplified and filtered, down-converted to baseband (or an IF), and
finally digitized by an ADC present at each array location. The digitized signal
at each location is complex, i.e., it has I and Q components. For example,
down-conversion using a quadrature mixer (which is modeled as multiplication by
$e^{-j2\pi f_c t}$) leaves the spatial frequency components intact, as implied by
(4.2):
$$E_{BB}(n_x, n_y, t) = x_{m,k}(t)\, e^{j(n_x\omega_x + n_y\omega_y)} \tag{4.2}$$
As a result, the spatial spectrum of the wave remains localized at
$(\omega_x, \omega_y)$. The creation of a sharp RF beam for extracting directional
information for a particular plane wave therefore involves the application of a 2D
spatial bandpass filter having the sharpest possible selectivity centered on a
particular frequency pair in the spatial frequency domain. As discussed in
Chapter 3, the DFT realizes a bank of FIR filters with sharp bandpass responses
that take the well-known sinc(ω) response shape; the peak stopband magnitude for
this shape approaches an asymptote of −13.25 dB with increasing filter order N.
Therefore, to simultaneously receive an N × N array of signals, the multibeam
beamformer must compute the 2D DFT spatially across the $n_x$, $n_y$ dimensions of
the array.
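The row-wise and column-wise use of one N-point transform can be sketched as
follows. The sketch uses an exact DFT written with the standard library and a
small N = 4 aperture for brevity; the function names are hypothetical, and an
approximate transform could be passed in place of `dft` under the same
interface.

```python
import cmath

def dft(vec):
    """Exact N-point DFT; an approximate transform could be swapped in."""
    n = len(vec)
    return [sum(vec[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dft2(grid, transform=dft):
    """2D transform of grid[nx][ny]: transform rows, then columns."""
    row_pass = [transform(row) for row in grid]                  # over ny
    col_pass = [transform(list(col)) for col in zip(*row_pass)]  # over nx
    return [list(r) for r in zip(*col_pass)]                     # out[k1][k2]

# A baseband plane wave with spatial frequencies (2*pi*k1/N, 2*pi*k2/N)
N, k1, k2 = 4, 1, 3
wx, wy = 2 * cmath.pi * k1 / N, 2 * cmath.pi * k2 / N
wave = [[cmath.exp(1j * (nx * wx + ny * wy)) for ny in range(N)] for nx in range(N)]

beams = dft2(wave)
peak = max(((a, b) for a in range(N) for b in range(N)),
           key=lambda ab: abs(beams[ab[0]][ab[1]]))
```

In this sketch the energy of the plane wave concentrates in the single output
indexed by (k1, k2), which is the sense in which each 2D DFT bin acts as one
beam.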
For transmit applications, the waves to be transmitted in multiple simultaneous
directions are applied to the inputs of the 2D inverse DFT (IDFT), with the
corresponding IDFT outputs being converted to analog using digital-to-analog
converters (DACs), filtered, up-converted to the desired carrier frequency, and
amplified before being applied to the input terminals of the transmit array.
Fig. 4.1(a) shows the digital beamforming architecture for an N × N uniform
rectangular array (URA) that generates $N^2$ beams. The block diagram of an
N-element ULA subsystem that acts as a building block for the $N^2$ URA is shown
in Fig. 4.1(b). Note that a Hilbert transform block is shown in Fig. 4.1 to
transform the real signals to complex signals. This is one way of achieving the
transformation when quadrature downconversion is not used in the analog path.
Another method is to use digital quadrature downconversion directly on the low-IF
input. Analog quadrature mixing provides the luxury of using the full ADC
bandwidth, whereas the digital downconversion (DDC) based approaches lose half of
the ADC bandwidth.
As discussed in detail in Chapter 3, the direct computation of the DFT of an
N-point vector of input values requires a number of complex arithmetic operations
in $O(N^2)$. It was also explained in Chapter 3 that, using the symmetries of the
N-point DFT matrix $\mathbf{F}_N$, it is possible to compute the matrix-vector
product $\mathbf{X} = \mathbf{F}_N \cdot \mathbf{x}$ with $O(N \log N)$ complex
arithmetic operations using FFT algorithms. FFTs achieve this saving through fast
algorithms based on sparse matrix factorizations. The complexity reduction from
$O(N^2)$ to $O(N \log N)$ is substantial as N grows large.
4.3 A 32-point DFT Approximation and Fast Algorithm for
RF Beamforming
4.3.1 32-point Approximate DFT
A 32-point approximate DFT matrix $\hat{\mathbf{F}}_{32}$ is proposed for which the
matrix-vector multiplication used in computing the RF beams can be performed
without multipliers. Let $\mathcal{P}$ be the set $\{0, \pm 1, \pm 2, \pm 1/2\}$.
Let $\mathcal{M}_{\mathcal{P}^2}(32)$ be the set of 32 × 32 complex matrices whose
real and imaginary parts are defined over the set $\mathcal{P}$. The approximate
transform $\hat{\mathbf{F}}_{32}$ can be found according to a multi-criterion
optimization considering the search space represented by the parametrized mapping
below:
$$g : \mathbb{R} \longrightarrow \mathcal{M}_{\mathcal{P}^2}(32), \qquad \beta \longmapsto \operatorname{round}(\beta \cdot \mathbf{F}_{32})$$
and objective functions given by the following selected matrix-based metrics:
(i) Frobenius norm of the matrix difference, (ii) total error energy, (iii) average
percent absolute error, and (iv) orthogonality deviation. The optimal solution for
the above DFT approximation was found by determining the Pareto-efficient solution
set, i.e., the set of non-dominated solutions [123], using $\beta \in (0, 5]$ with
steps of $10^{-2}$, by UPFE collaborators [104].
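The parametrized search can be illustrated with a simplified sketch. It is not the
multi-criterion procedure of [104]: it assumes that rounding maps the real and
imaginary parts separately to the nearest element of the set, it scores each
candidate with the Frobenius norm of the difference from the unscaled DFT matrix
only, and all function names are hypothetical.

```python
import cmath

P = (0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5)

def nearest_in_p(value):
    """Assumed rounding rule: nearest element of P (ties go to the first listed)."""
    return min(P, key=lambda p: abs(p - value))

def dft_matrix(n):
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)] for r in range(n)]

def approximate(beta, exact):
    """g(beta): round the real and imaginary parts of beta*F onto P."""
    return [[complex(nearest_in_p(beta * z.real), nearest_in_p(beta * z.imag))
             for z in row] for row in exact]

def frobenius_error(approx, exact):
    return sum(abs(a - e) ** 2
               for row_a, row_e in zip(approx, exact)
               for a, e in zip(row_a, row_e)) ** 0.5

F32 = dft_matrix(32)
best_beta, best_err = None, float("inf")
for step in range(1, 501):            # beta in (0, 5] with steps of 1e-2
    beta = step / 100.0
    err = frobenius_error(approximate(beta, F32), F32)
    if err < best_err:
        best_beta, best_err = beta, err
```

A full reproduction would evaluate all four metrics for every candidate and keep
the non-dominated set rather than a single minimizer.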
The optimal matrix resulting from the above optimization problem is given in (4.3).
Due to the large matrix size of 32 × 32, the matrix is represented in terms of the
16 × 16 sub-matrices $\mathbf{A}_i$, $i \in \{0, 1, 2, 3\}$, given by equation set
(4.4) [104].
$$\hat{\mathbf{F}}_{32} = \begin{bmatrix} \mathbf{A}_0 & \mathbf{A}_1 \\ \mathbf{A}_2 & \mathbf{A}_3 \end{bmatrix} \tag{4.3}$$
Among the efficient solutions, the matrix $\hat{\mathbf{F}}_{32}$ exhibits the
smallest total error energy of approximately $3.32 \cdot 10^{2}$. The Frobenius
norm of the matrix error per matrix element is
$\frac{1}{32^2}\|\hat{\mathbf{F}}_{32} - \mathbf{F}_{32}\|_F = 1.004 \cdot 10^{-2}$,
where $\|\cdot\|_F$ is the Frobenius norm. This measurement is 54.9% lower than the
error per element of the DFT approximation described in [105,124,125] and is
regarded as acceptable for beamforming applications.
Fig. 4.2 shows a comparison of the frequency responses of all the bins for the
32-point proposed approximate DFT and the DFT. The shapes and locations of
the main beams are almost identical to the exact DFT. The relative errors of the
magnitude response of each filter response are largely confined to the stopbands
away from the main lobe (i.e., deep side lobes), and are generally below the −15 dB
level. Fig. 4.2(c) shows the magnitude error plot of the filter bank responses of the
proposed DFT approximation. The plot in Fig. 4.2(c) is computed by evaluating the
difference of the magnitude responses of approximate and exact DFT transforms for
each filter (i.e., DFT/ADFT bin). The plots in Fig. 4.3 show the bins in Fig. 4.2(c)
that have the highest magnitude error. All other bins have a magnitude error that
A0 =
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1−1i 1−1i 1−1i −1i −1i −1i −1i −1i −1−1i −1−1i −1−1i −1 −1
1 1 1−1i −1i −1i −1i −1−1i −1 −1 −1 −1+1i 1i 1i 1i 1+1i 1
1 1−1i −1i −1i −1−1i −1 −1 −1+1i 1i 1+1i 1 1 1−1i −1i −1i −1−1i
1 1−1i −1i −1−1i −1 −1+1i 1i 1+1i 1 1−1i −1i −1−1i −1 −1+1i 1i 1+1i
1 1−1i −1i −1 −1+1i 1i 1 1−1i −1i −1−1i −1 1i 1+1i 1 −1i −1−1i
1 −1i −1−1i −1 1i 1 1−1i −1i −1 1i 1+1i 1 −1i −1 −1+1i 1i
1 −1i −1 −1+1i 1+1i 1−1i −1i −1 1i 1 −1i −1−1i −1+1i 1+1i 1 −1i
1 −1i −1 1i 1 −1i −1 1i 1 −1i −1 1i 1 −1i −1 1i
1 −1i −1 1+1i 1−1i −1−1i 1i 1 −1i −1 1i 1−1i −1−1i −1+1i 1 −1i
1 −1i −1+1i 1 −1i −1 1+1i −1i −1 1i 1−1i −1 1i 1 −1−1i 1i
1 −1−1i 1i 1 −1−1i 1i 1 −1−1i 1i 1−1i −1 1i 1−1i −1 1i 1−1i
1 −1−1i 1i 1−1i −1 1+1i −1i −1+1i 1 −1−1i 1i 1−1i −1 1+1i −1i −1+1i
1 −1−1i 1i −1i −1+1i 1 −1 1+1i −1i −1+1i 1 −1 1+1i −1i 1i 1−1i
1 −1 1+1i −1i 1i −1i −1+1i 1 −1 1 −1−1i 1i −1i 1i 1−1i −1
1 −1 1 −1−1i 1+1i −1−1i 1i −1i 1i −1i 1i 1−1i −1+1i 1−1i −1 1
,
A1 =
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
−1 −1 −1 −1+1i −1+1i −1+1i 1i 1i 1i 1i 1i 1+1i 1+1i 1+1i 1 1
1 1 1−1i −1i −1i −1i −1−1i −1 −1 −1 −1+1i 1i 1i 1i 1+1i 1
−1 −1+1i 1i 1i 1+1i 1 1 1−1i −1i −1−1i −1 −1 −1+1i 1i 1i 1+1i
1 1−1i −1i −1−1i −1 −1+1i 1i 1+1i 1 1−1i −1i −1−1i −1 −1+1i 1i 1+1i
−1 −1+1i 1i 1 1−1i −1i −1 −1+1i 1i 1+1i 1 −1i −1−1i −1 1i 1+1i
1 −1i −1−1i −1 1i 1 1−1i −1i −1 1i 1+1i 1 −1i −1 −1+1i 1i
−1 1i 1 1−1i −1−1i −1+1i 1i 1 −1i −1 1i 1+1i 1−1i −1−1i −1 1i
1 −1i −1 1i 1 −1i −1 1i 1 −1i −1 1i 1 −1i −1 1i
−1 1i 1 −1−1i −1+1i 1+1i −1i −1 1i 1 −1i −1+1i 1+1i 1−1i −1 1i
1 −1i −1+1i 1 −1i −1 1+1i −1i −1 1i 1−1i −1 1i 1 −1−1i 1i
−1 1+1i −1i −1 1+1i −1i −1 1+1i −1i −1+1i 1 −1i −1+1i 1 −1i −1+1i
1 −1−1i 1i 1−1i −1 1+1i −1i −1+1i 1 −1−1i 1i 1−1i −1 1+1i −1i −1+1i
−1 1+1i −1i 1i 1−1i −1 1 −1−1i 1i 1−1i −1 1 −1−1i 1i −1i −1+1i
1 −1 1+1i −1i 1i −1i −1+1i 1 −1 1 −1−1i 1i −1i 1i 1−1i −1
−1 1 −1 1+1i −1−1i 1+1i −1i 1i −1i 1i −1i −1+1i 1−1i −1+1i 1 −1
,
A2 =
1 −1 1 −1 1 −1 1 −1 1 −1 1 −1 1 −1 1 −1
1 −1 1 −1+1i 1−1i −1+1i −1i 1i −1i 1i −1i 1+1i −1−1i 1+1i −1 1
1 −1 1−1i 1i −1i 1i −1−1i 1 −1 1 −1+1i −1i 1i −1i 1+1i −1
1 −1+1i −1i 1i −1−1i 1 −1 1−1i 1i −1−1i 1 −1 1−1i 1i −1i 1+1i
1 −1+1i −1i 1+1i −1 1−1i 1i −1−1i 1 −1+1i −1i 1+1i −1 1−1i 1i −1−1i
1 −1+1i −1i 1 −1+1i −1i 1 −1+1i −1i 1+1i −1 −1i 1+1i −1 −1i 1+1i
1 1i −1−1i 1 1i −1 1−1i 1i −1 −1i 1+1i −1 −1i 1 −1+1i −1i
1 1i −1 1−1i 1+1i −1+1i −1i 1 1i −1 −1i 1+1i −1+1i −1−1i 1 1i
1 1i −1 −1i 1 1i −1 −1i 1 1i −1 −1i 1 1i −1 −1i
1 1i −1 −1−1i 1−1i 1+1i 1i −1 −1i 1 1i −1+1i −1−1i 1−1i 1 1i
1 1i −1+1i −1 −1i 1 1+1i 1i −1 −1i 1−1i 1 1i −1 −1−1i −1i
1 1+1i 1i −1 −1−1i −1i 1 1+1i 1i −1+1i −1 −1i 1−1i 1 1i −1+1i
1 1+1i 1i −1+1i −1 −1−1i −1i 1−1i 1 1+1i 1i −1+1i −1 −1−1i −1i 1−1i
1 1+1i 1i 1i −1+1i −1 −1 −1−1i −1i 1−1i 1 1 1+1i 1i 1i −1+1i
1 1 1+1i 1i 1i 1i −1+1i −1 −1 −1 −1−1i −1i −1i −1i 1−1i 1
1 1 1 1+1i 1+1i 1+1i 1i 1i 1i 1i 1i −1+1i −1+1i −1+1i −1 −1
,
A3 =
1 −1 1 −1 1 −1 1 −1 1 −1 1 −1 1 −1 1 −1
−1 1 −1 1−1i −1+1i 1−1i 1i −1i 1i −1i 1i −1−1i 1+1i −1−1i 1 −1
1 −1 1−1i 1i −1i 1i −1−1i 1 −1 1 −1+1i −1i 1i −1i 1+1i −1
−1 1−1i 1i −1i 1+1i −1 1 −1+1i −1i 1+1i −1 1 −1+1i −1i 1i −1−1i
1 −1+1i −1i 1+1i −1 1−1i 1i −1−1i 1 −1+1i −1i 1+1i −1 1−1i 1i −1−1i
−1 1−1i 1i −1 1−1i 1i −1 1−1i 1i −1−1i 1 1i −1−1i 1 1i −1−1i
1 1i −1−1i 1 1i −1 1−1i 1i −1 −1i 1+1i −1 −1i 1 −1+1i −1i
−1 −1i 1 −1+1i −1−1i 1−1i 1i −1 −1i 1 1i −1−1i 1−1i 1+1i −1 −1i
1 1i −1 −1i 1 1i −1 −1i 1 1i −1 −1i 1 1i −1 −1i
−1 −1i 1 1+1i −1+1i −1−1i −1i 1 1i −1 −1i 1−1i 1+1i −1+1i −1 −1i
1 1i −1+1i −1 −1i 1 1+1i 1i −1 −1i 1−1i 1 1i −1 −1−1i −1i
−1 −1−1i −1i 1 1+1i 1i −1 −1−1i −1i 1−1i 1 1i −1+1i −1 −1i 1−1i
1 1+1i 1i −1+1i −1 −1−1i −1i 1−1i 1 1+1i 1i −1+1i −1 −1−1i −1i 1−1i
−1 −1−1i −1i −1i 1−1i 1 1 1+1i 1i −1+1i −1 −1 −1−1i −1i −1i 1−1i
1 1 1+1i 1i 1i 1i −1+1i −1 −1 −1 −1−1i −1i −1i −1i 1−1i 1
−1 −1 −1 −1−1i −1−1i −1−1i −1i −1i −1i −1i −1i 1−1i 1−1i 1−1i 1 1
.    (4.4)
is smaller than −13 dB. The deviations of the filter bank responses from the
exact DFT filter bank responses arise because the filter coefficients are not
ideal: they have been approximated by small integers. Thus, the performance
level is mainly set by the size of the optimization search space.
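The magnitude-error computation described above can be sketched in Python as follows. This is only an illustration under a stated assumption: the optimized matrix of (4.3) is not reproduced here, so F_approx is a hypothetical stand-in obtained by rounding the real and imaginary parts of the exact DFT matrix to the nearest integer. The names filter_bank_magnitude, F_approx and err are likewise illustrative.

```python
import numpy as np

N = 32
n = np.arange(N)

# Exact DFT matrix: F[k, n] = exp(-j*2*pi*k*n/N).
F_exact = np.exp(-2j * np.pi * np.outer(n, n) / N)

# ASSUMPTION: stand-in approximation, NOT the optimized matrix of (4.3)/(4.4).
# Real and imaginary parts are rounded to the nearest value in {-1, 0, +1}.
F_approx = np.round(F_exact.real) + 1j * np.round(F_exact.imag)

def filter_bank_magnitude(F, omegas):
    """Peak-normalized magnitude response of every row (bin) of F."""
    steering = np.exp(1j * np.outer(n, omegas))   # shape (N, num_freqs)
    H = np.abs(F @ steering)                      # shape (N, num_freqs)
    return H / H.max(axis=1, keepdims=True)

omegas = np.linspace(-np.pi, np.pi, 1025)
mag_exact = filter_bank_magnitude(F_exact, omegas)
mag_approx = filter_bank_magnitude(F_approx, omegas)

# Magnitude error per bin, as a linear difference and in dB.
err = np.abs(mag_approx - mag_exact)
err_db = 20 * np.log10(np.maximum(err, 1e-12))
worst_bin = int(np.argmax(err.max(axis=1)))
print("bin with the largest magnitude error:", worst_bin)
```

Replacing F_approx with the actual approximate transform would give the error curves for the proposed ADFT; the rounded stand-in only shows the mechanics of the comparison.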
Figure 4.2: The simulated frequency responses of the 32 output bins of the (a) proposed
32-point ADFT, (b) exact DFT; (c) the magnitude error of the two responses. [Each panel
plots gain in dB (0 to −40) against ω (−3 to 3).]
Similar to the 16-beam algorithm discussed in Chapter 3, the proposed approximation
does not work directly with conventional windowing functions because of its numerical
structure. However, these functions can be modified to achieve the desired windowing
performance.
4.3.2 Fast Algorithm for Computing the 32-point ADFT
A fast algorithm for computing the approximate transform F̂32 in (4.3) to be used
in place of usual FFTs can be derived by means of sparse matrix factorization
in a decimation-in-frequency approach [81]. The matrix transform F̂32 has been
factorized as shown in (4.5) by the collaborating Brazilian mathematicians.
F̂32 = W8 ·W7 ·W6 ·W5 ·W4 ·W3 ·W2 ·W1, (4.5)
where Wi for i ∈ {1, 2, 3, 4, 5, 6, 7, 8} are sparse matrices (factorization stages).
The non-zero matrix elements of each matrix Wi are given in Tables 4.1 and 4.2.
The matrix factorization in (4.5) is not unique (i.e., it can admit multiple different
factorizations), unlike the factorization of a composite integer [126]. The number of
stages (i.e., sparse matrices) in the matrix factorization depends on the factorization
method employed. The number of stages is not important as long as the overall
number of elementary arithmetic operations in the factorized form is lower when
Figure 4.3: Bins that have the highest magnitude error in Fig. 4.2(c).
compared to the direct non-factorized form of the matrix-vector product. Notice
that the entries of the sparse matrices Wi only contain elements from the set
P0 = {+1,−1,+j,−j}, which imply only trivial arithmetic operations in digital
implementations. Given the fast algorithm in (4.5), the computational complexity
associated with computing F̂32 can be quantified. Let us consider complex input
signals, which correspond to the I and Q outputs of the received signal
from the array to the digital processor, and evaluate the arithmetic complexity in
terms of real operations. The arithmetic cost of each matrix in each factorization
Table 4.1: Matrix factors W1 to W4, represented by the indexes of their non-zero entries

Factorized stage W1
Entries equal to +1:
(1,1), (1,17), (2,2), (2,16), (3,3), (3,15),
(4,4), (4,14), (5,5), (5,13), (6,6), (6,12),
(7,7), (7,11), (8,8), (8,10), (9,9), (10,8),
(11,7), (12,6), (13,5), (14,4), (15,3),
(16,2), (17,1), (18,18), (18,32), (19,19),
(19,31), (20,20), (20,30), (21,21), (21,29),
(22,22), (22,28), (23,23), (23,27), (24,24),
(24,26), (25,25), (26,24), (27,23), (28,22),
(29,21), (30,20), (31,19), (32,18)
Entries equal to −1:
(10,10), (11,11), (12,12), (13,13),
(14,14), (15,15), (16,16), (17,17),
(26,26), (27,27), (28,28), (29,29),
(30,30), (31,31), (32,32)

Factorized stage W2
Entries equal to +1:
(1,1), (2,2), (2,18), (3,3), (3,19), (4,4),
(4,20), (5,5), (5,21), (6,6), (6,22), (7,7),
(7,23), (8,8), (8,24), (9,9), (9,25), (10,10),
(10,26), (11,11), (11,27), (12,12), (12,28),
(13,13), (13,29), (14,14), (14,30), (15,15),
(15,31), (16,16), (16,32), (17,17), (18,2),
(19,3), (20,4), (21,5), (22,6), (23,7),
(24,8), (25,9), (26,10), (27,11), (28,12),
(29,13), (30,14), (31,15), (32,16)
Entries equal to −1:
(18,18), (19,19), (20,20), (21,21),
(22,22), (23,23), (24,24), (25,25),
(26,26), (27,27), (28,28), (29,29),
(30,30), (31,31), (32,32)

Factorized stage W3
Entries equal to +1:
(1,1), (1,9), (2,2), (2,8), (3,3), (3,7),
(4,4), (4,6), (5,5), (6,4), (7,3), (8,2), (9,1),
(10,10), (10,16), (11,11), (11,15), (12,12),
(12,14), (13,13), (14,12), (15,11), (16,10),
(17,17), (18,18), (19,19), (20,20), (21,21),
(22,22), (23,23), (24,24), (25,25), (26,26),
(27,27), (28,28), (29,29), (30,30), (31,31),
(32,32)
Entries equal to −1:
(6,6), (7,7), (8,8), (9,9),
(14,14), (15,15), (16,16)

Factorized stage W4
Entries equal to +1:
(1,1), (1,5), (2,2), (2,4), (3,3), (4,2), (5,1),
(6,6), (7,7), (7,9), (8,8), (9,7), (10,10),
(11,11), (11,13), (12,12), (13,11), (14,14),
(14,16), (15,15), (16,14), (17,17), (17,29),
(18,18), (19,19), (20,20), (21,21), (21,25),
(22,22), (23,23), (24,24), (25,21), (26,26),
(27,27), (28,28), (29,17), (30,30), (31,31),
(32,32)
Entries equal to −1:
(4,4), (5,5), (9,9),
(13,13), (16,16),
(25,25), (29,29)

stage of (4.5) is evaluated as described in [81]. Because the coefficients of the real
and imaginary parts of Wi for i ∈ {1, 2, . . . , 8} are also in P0, only additions
are required. The additive cost is based on the number of nonzero elements in the rows
Table 4.2: Matrix factors W5 to W8, represented by the indexes of their non-zero entries

Factorized stage W5
Entries equal to +1:
(1,1), (1,3), (2,2), (3,1), (4,4), (4,5),
(5,4), (6,6), (6,9), (7,7), (7,8), (8,7), (9,6),
(10,10), (10,13), (11,11), (11,12), (12,11),
(13,10), (14,14), (14,15), (15,14), (16,16),
(17,31), (18,18), (19,19), (19,25), (20,20),
(20,22), (20,24), (21,21), (21,23), (22,20),
(23,21), (24,20), (25,19), (26,26), (27,27),
(27,29), (28,28), (28,30), (28,32), (29,27),
(30,28), (31,17), (31,31), (32,28)
Entries equal to −1:
(3,3), (5,5), (8,8), (9,9),
(12,12), (13,13), (15,15),
(17,17), (22,22), (23,23),
(24,24), (25,25), (29,29),
(30,30), (32,32)

Factorized stage W6
Entries equal to +1:
(1,1), (1,2), (2,1), (3,3), (4,4), (5,5),
(6,6), (7,7), (8,8), (9,9), (10,10), (11,11),
(12,12), (13,13), (14,14), (15,15), (16,16),
(17,17), (18,18), (18,22), (19,19), (20,20),
(20,21), (21,20), (22,18), (23,23), (24,18),
(24,24), (25,25), (26,26), (26,30), (27,27),
(28,28), (28,31), (29,29), (30,26), (31,28),
(32,26), (32,32)
Entries equal to −1:
(2,2), (18,24), (21,21),
(22,22), (26,32), (30,30),
(31,31)

Factorized stage W7
Entries equal to +1:
(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7),
(8,8), (9,9), (10,10), (11,11), (12,12),
(13,13), (14,14), (15,15), (16,16), (17,17),
(17,30), (18,18), (18,25), (19,24), (20,20),
(21,21), (22,22), (22,23), (23,22), (24,19),
(24,24), (25,18), (26,26), (26,27), (27,26),
(28,28), (29,29), (29,32), (30,17), (31,31),
(32,29)
Entries equal to −1:
(19,19), (23,23), (25,25),
(27,27), (30,30), (32,32)

Factorized stage W8
Entries equal to +1:
(1,1), (2,28), (3,7),
(5,4), (6,26), (9,3),
(11,9), (15,8), (17,2),
(19,8), (23,9), (25,3),
(28,26), (29,4),
(31,7), (32,28)
Entries equal to −1:
(4,29), (7,6), (8,17),
(10,30), (12,27),
(13,5), (14,32),
(16,31), (18,31),
(20,32), (21,5),
(22,27), (24,30),
(26,17), (27,6),
(30,29)
Entries equal to +j:
(5,14), (13,15),
(15,12), (18,21),
(20,19), (22,25),
(23,13), (24,22),
(25,16), (26,23),
(27,10), (28,18),
(30,24), (31,11),
(32,20)
Entries equal to −j:
(4,24),
(12,25),
(14,19),
(16,21),
(19,12),
(21,15)

of each Wi matrix, as detailed in [81]. Therefore, the matrices W1, W2, and W5
each require 60 real additions; the matrices W3, W4, and W6 each require 28 real
additions; and the matrix W7 requires 24 real additions. The only complex matrix in the
factorization, W8, requires 60 real additions. In total, the transform F̂32 requires 348
Table 4.3: Comparison of arithmetic complexities for performing the 32-point DFT
using different FFT algorithms
Method                      No. of real additions   No. of real multipliers
Radix-2 FFT [81, p. 76]     408                     88
Split-Radix FFT [127]       388                     68
Winograd FFT [128]          388                     68
Direct Computation F̂32      584                     0
Fast Algorithm F̂32          348                     0
real additions. Table 4.3 shows the real multiplicative and additive costs associated
with several well-known FFT algorithms compared with the proposed algorithm.
Table 4.3 also shows the additive complexity achieved through the proposed fast
algorithm is 40% lower when compared to direct computation of F̂32.
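The addition counts quoted above can be cross-checked with a short Python sketch. Two things here are assumptions rather than statements from [81]: the counting rule in real_additions (a row with k non-zero entries costs k − 1 complex additions, i.e. 2(k − 1) real additions, with sign changes counted as free), and the loop-based construction in build_w1, which is merely a compact way of reproducing the W1 index lists of Table 4.1. The per-stage costs in stage_cost are copied from the text.

```python
import numpy as np

def build_w1():
    """W1 of (4.5), filled from the index lists in Table 4.1 (1-based indexes)."""
    W = np.zeros((32, 32), dtype=int)

    def put(r, c, v):
        W[r - 1, c - 1] = v

    for i in range(1, 10):
        put(i, i, 1)          # (1,1) ... (9,9)
    for i in range(1, 9):
        put(i, 18 - i, 1)     # (1,17) ... (8,10)
    for i in range(10, 18):
        put(i, 18 - i, 1)     # (10,8) ... (17,1)
        put(i, i, -1)         # (10,10) ... (17,17)
    for i in range(18, 26):
        put(i, i, 1)          # (18,18) ... (25,25)
    for i in range(18, 25):
        put(i, 50 - i, 1)     # (18,32) ... (24,26)
    for i in range(26, 33):
        put(i, 50 - i, 1)     # (26,24) ... (32,18)
        put(i, i, -1)         # (26,26) ... (32,32)
    return W

def real_additions(W):
    """Real additions for W @ x with complex x and entries of W in {0, +1, -1}.

    ASSUMPTION: a row with k non-zeros costs 2 * (k - 1) real additions.
    """
    nnz_per_row = np.count_nonzero(W, axis=1)
    return int(2 * np.sum(np.maximum(nnz_per_row - 1, 0)))

# Per-stage real-addition counts as stated in the text.
stage_cost = {"W1": 60, "W2": 60, "W3": 28, "W4": 28,
              "W5": 60, "W6": 28, "W7": 24, "W8": 60}
total = sum(stage_cost.values())
saving = 1 - total / 584      # relative to direct computation (Table 4.3)

print("W1 real additions:", real_additions(build_w1()))
print("total:", total, "saving vs direct: %.1f%%" % (100 * saving))
```

Under this counting rule the W1 stage comes out at 60 real additions, and the stage costs sum to the 348 additions listed in Table 4.3.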
4.3.3 Hardware Metrics of the Proposed ADFT Realization
The 32-point ADFT fast algorithm in (4.5) was realized as a digital core and
synthesized using 45 nm CMOS free-PDK standard cells [129]. For comparison purposes, a
32-point FFT core based on the Duhamel algorithm was also implemented as a digital
core and synthesized using the same technology. Both the approximate and the fixed-point
exact FFT digital cores assume inputs of 8-bit word length. The fixed-point exact
FFT core was designed with 10-bit twiddle factors [83], which maintain a precision
of 2^−9 in the phasing coefficients. The multiplications throughout the signal paths
were handled such that they preserve at least the coefficient precision. Table 4.4
compares the following metrics for the two implementations: chip area A, critical
path delay T, maximum clock frequency Fmax, area-time AT, area-time-squared
AT², frequency- and voltage-normalized dynamic power consumption Dp, and
maximum side-lobe level. It can be seen that the proposed ADFT algorithm consumes
46% less area than the reference fixed-point FFT design, while achieving a 50% drop
in critical path delay. The metrics AT and AT² are reduced by 73% and 86%,
respectively; AT matters most when area/cost is the primary concern, whereas AT² is
critical when speed is crucial. Note that the speed values in Table 4.4 are based on
synthesis results only, i.e., they do not consider layout effects that slow down
physical implementations. However, such effects will be present in both designs, so
the relative improvements in the AT and AT² metrics are expected to remain valid. The
compromise is an ≈ 2 dB increase in side-lobe level, which is assumed to be tolerable
in most RF beamforming applications where unwanted signals (jammers) can fall on
larger side lobes.
Table 4.4: Comparison of ASIC realization metrics for the proposed ADFT vs a
32-point FFT (Duhamel) using a 45-nm PDK
Metric                          Duhamel algorithm   ADFT     Change
Area, A (mm²)                   0.856               0.465    46% ↓
Critical path delay, T (ns)     1.73                0.86     50% ↓
Frequency, Fmax (GHz)           0.58                1.16     100% ↑
AT (mm² · ns)                   1.481               0.400    73% ↓
AT² (mm² · ns²)                 2.562               0.344    86% ↓
Dynamic power, Dp (mW/GHz)      1303                580      55% ↓
Largest side-lobe level (dB)    −13.26              −11.03   2.23 ↑
**The proposed algorithm achieves ≈ 50% reduction in area and time at the expense of
≈ 2 dB increase in side lobe levels.
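The derived metrics in Table 4.4 follow directly from the area and critical-path-delay columns. The following Python sketch recomputes them from the tabulated values; the dictionary layout and the helper names derived_metrics and reduction_percent are illustrative choices, not part of the original design flow.

```python
# Area, delay and normalized dynamic power as listed in Table 4.4.
designs = {
    "Duhamel FFT": {"area_mm2": 0.856, "delay_ns": 1.73, "power_mw_per_ghz": 1303},
    "ADFT":        {"area_mm2": 0.465, "delay_ns": 0.86, "power_mw_per_ghz": 580},
}

def derived_metrics(d):
    """AT, AT^2 and Fmax = 1/T computed from area A and critical path delay T."""
    at = d["area_mm2"] * d["delay_ns"]
    return {"AT": at, "AT2": at * d["delay_ns"], "Fmax_GHz": 1.0 / d["delay_ns"]}

def reduction_percent(reference, proposed):
    """Percentage by which `proposed` is smaller than `reference`."""
    return 100.0 * (1.0 - proposed / reference)

fft = derived_metrics(designs["Duhamel FFT"])
adft = derived_metrics(designs["ADFT"])

for key in ("AT", "AT2"):
    print(f"{key}: {fft[key]:.3f} -> {adft[key]:.3f}, "
          f"reduction {reduction_percent(fft[key], adft[key]):.1f}%")
print(f"Fmax: {fft['Fmax_GHz']:.2f} GHz -> {adft['Fmax_GHz']:.2f} GHz")
```

With the tabulated inputs, the recomputed AT² reduction is about 86.6%, which Table 4.4 lists as 86%.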
4.3.4 N-Beam Beamforming Architectures for ULAs and URAs
Fig. 4.1 shows the top-level hardware architectures for realizing N and N² simultaneous
orthogonal beams for an N-element and an N×N aperture, respectively, using an
N-point spatial DFT digital core as the basic signal processing block. The front-end
is shown as a direct-conversion receiver chain followed by analog-to-digital conversion
for digital beamforming. The digitized data can be converted to complex (I-Q)
form using a Hilbert transform. Alternatively, a quadrature mixer can be used in the
RF chain, which allows direct conversion to baseband at the cost of doubling the
number of ADCs.
The numerically simulated array factors resulting from a 32-element spatially
Nyquist-sampled ULA are given in Fig. 4.4(i). Fig. 4.4(i-b) shows the beams generated
using the proposed 32-point ADFT algorithm, Fig. 4.4(i-a) shows the
corresponding beams of the exact algorithm, and Fig. 4.4(i-c) shows the error
magnitude between them. Fig. 4.4(ii) shows three simulated example beams out of the
1024 beams generated by the proposed ADFT algorithm when it is applied to a
32 × 32-element URA. The first and second columns of Fig. 4.4(ii) show the beams
corresponding to the exact and approximate DFT, respectively; the third and fourth
columns of Fig. 4.4(ii) show the errors between the two algorithms in the elevation
and azimuthal planes, respectively, which are small enough to be ignored for most
microwave and mm-wave beamforming applications.
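As a rough illustration of how such ULA array factors can be simulated, the sketch below evaluates the 32 beams of the exact DFT for a half-wavelength-spaced array. The exact DFT matrix is used here as a stand-in; substituting the approximate transform of (4.3), which is not reproduced in this sketch, would give the ADFT beams. The angle grid and variable names are assumptions made for the example.

```python
import numpy as np

N = 32
d_over_lambda = 0.5                    # Nyquist (half-wavelength) element spacing
n = np.arange(N)

# Exact 32-point DFT used as the beamforming matrix (stand-in for the ADFT).
F = np.exp(-2j * np.pi * np.outer(n, n) / N)

# Plane wave arriving from angle theta (measured from broadside).
theta_deg = np.linspace(-90.0, 90.0, 721)
omega = 2 * np.pi * d_over_lambda * np.sin(np.radians(theta_deg))
steering = np.exp(1j * np.outer(n, omega))      # shape (N, num_angles)

# Array factor of every beam: one row per DFT bin.
AF = np.abs(F @ steering) / N
AF_dB = 20 * np.log10(np.maximum(AF, 1e-6))

# Pointing direction of each beam, read off the simulated pattern.
peak_angle = theta_deg[np.argmax(AF, axis=1)]
print("beam 0 points at", peak_angle[0], "deg; beam 8 at", peak_angle[8], "deg")
```

The magnitude error between two transforms is then simply the difference of the two AF arrays computed on the same angle grid.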
Figure 4.4: (i) Simulated polar patterns of the 32 beams for a ULA with λ/2 element
spacing (polar plots from −90° to 90°, radial scale 0 dB to −20 dB). (i-a) Beams
corresponding to the ADFT, (i-b) beams obtained with the ideal FFT, and (i-c) the
magnitude error between the ADFT and the exact FFT. (ii) Example simulated beam
patterns from a Nyquist-spaced URA; (a) ψ = 8.0°, φ = −153.4°, (b) ψ = 45.4°,
φ = −142.1°, (c) ψ = 26.2°, φ = 45.0° (the plots are color-coded on a dB scale).
4.4 A 32-Beam ULA-based Multibeam Beamformer
The system architecture used for verifying the proposed low-complexity multibeam
beamforming algorithm is shown in Fig. 4.5(a). This section explains the system
design.
Figure 4.5: (a) Overall architecture of the test setup; (b) 5.8 GHz 32-beam array
receiver setup. [Diagram labels in (a): LNA, BPF, mixer (LO frequency flo), LPF,
amplifier, ADC, Calib, HT, 32-point a-DFT, energy calculator, FPGA, PowerPC, ROACH-2;
∆x marks the element spacing.]
4.4.1 5.8 GHz Front-End Design
The RF front-end of the receive-mode beamformer is constructed by integrating a
32-element ULA at 5.8 GHz with 32 direct conversion RF receiver chains (on PCB)
as shown in Fig. 4.5(b). The inter-element spacing of the array ∆x was set to 0.6λ,
which is ≈ 31 mm at 5.8 GHz. The specifics of the antenna design can be found
in [130].
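The element spacing quoted above can be checked numerically, and the same numbers give a rough idea of where the 32 DFT beams point for a 0.6λ-spaced array. The pointing relation used in the sketch, 2π(∆x/λ) sin θ = 2πk/N (mod 2π), is the usual one for an N-point spatial DFT and is stated here as an assumption, since it is not derived in this section.

```python
import numpy as np

C = 299_792_458.0          # speed of light in m/s
f_rf = 5.8e9               # operating frequency in Hz
N = 32                     # number of elements / DFT bins

lam = C / f_rf             # free-space wavelength in m
dx = 0.6 * lam             # inter-element spacing in m
print(f"wavelength = {lam * 1e3:.2f} mm, spacing = {dx * 1e3:.2f} mm")

# ASSUMPTION: bin k points where 2*pi*(dx/lam)*sin(theta) = 2*pi*k/N (mod 2*pi).
k = np.arange(N)
k_signed = np.where(k < N // 2, k, k - N)      # bins 16..31 map to negative angles
sin_theta = k_signed / (N * dx / lam)
theta_deg = np.degrees(np.arcsin(sin_theta))   # |sin_theta| <= 16/19.2 here

print("beam directions (deg):", np.round(theta_deg, 1))
```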
Each antenna element of the ULA was designed as a 4× 1 vertical sub-array of
patch antennas that employs passive beamforming at RF in the orthogonal (vertical)
plane. This design improves the gain in the vertical plane, thus simplifying array
factor measurements in the azimuthal plane. The sub-array is designed by feeding
antenna elements in series along a uniform transmission line, and performing a para-
metric sweep to provide better impedance matching and performance [131]. Note
that such analog beamforming does not affect the performance of the beamforming
algorithm under consideration, because the digital beamforming operates in the azimuthal plane. The antenna
outputs are directly fed into 32 heterodyne receivers designed on FR-4 PCBs using
surface mount devices. The LO signals for each receiver are provided through a cen-
tralized LO scheme that consists of a 32-output power divider network connected
to a low-phase-noise oscillator. The first stage of each receiver consists of a low-
noise amplifier (LNA) that provides 16 dB gain at 5.8 GHz with a noise figure of
2.4 dB. The amplified signal is band-pass filtered within the frequency range 4.7-6
GHz, which helps to reject out-of-band interference and noise. The band-limited
amplified signal is then passed through a mixer and low-pass filter to produce a
downconverted low-IF input. The 32 downconverted low-IF signals are further am-
plified by ∼30 dB and then digitized in parallel using two ADC16x250-8 ADC cards
(16 single-ended input channels, 8-bit, up to 250 MS/s per channel) [132]. The
in-band gain and noise figure of the entire receiver are estimated to be 38.6 dB and
2.9 dB, respectively; the latter is dominated by the LNA.
A sample clock of 200 MHz was used for clocking the ADCs for real-time hard-
ware experiments. The same clock was also routed to drive the digital circuits
implemented on the FPGAs. The digital circuits were pipelined to support clocking
at the ADC sample rate.
4.4.2 Digital Back-End
Digital processing and beamforming were performed using the ROACH-2 platform
[117], which is the same board used for verifying the 16-beam algorithm in
Chapter 3. The ADC16x250-8 ADC cards [132] that were used for the work in
Chapter 3 were also used for digitizing the 32 channels. Each ADC16x250-8
ADC card supports 16 channels at a max rate of 240 MSps and the ROACH-2 can
support 2 such cards.
The overall architecture of the digital beamforming test-setup is shown in
Fig. 4.5(a). The digital design consists of four main subsystems: (i) a digital cal-
ibration stage; (ii) an IQ decomposition FIR filter that implements the Hilbert
transform [103]; (iii) the 32-point DFT/ADFT algorithm implementation; and (iv)
an energy calculation subsystem for facilitating real-time measurements on each
output beam. The exact DFT core was designed using 10-bit precision twiddle fac-
tors which provide a good compromise between circuit size and maximum operating
frequency.
4.5 Experimental Results
This section describes experimental results obtained from the 32-element ULA, in-
cluding antenna characterization and beam measurements.
4.5.1 Antenna Array Characterization
The performance of the array was characterized using S-parameters [133], which were
measured using a vector network analyzer. The return loss |S11| of a single patch
sub-array was measured as −20.6 dB at 5.9 GHz, which was the best resonating
point of the antenna.
As described in Chapter 2, mutual coupling is another important factor that
has to be considered during the design of an antenna array processor. The mutual
coupling (MC) can be characterized using the S-parameters of the array [44,134].
The parameter Sn,m indicates the coupling between the mth and the nth antennas;
it characterizes the coupling at the nth antenna when the mth antenna is excited.
For a general characterization, S14,16, S15,16, S17,16 and S18,16 measurements were
taken from the 32-element array by exciting the 16th antenna while the other antenna
ports were terminated with 50 Ω loads. The mutual coupling
coefficients at 5.8 GHz were recorded as |S14,16| = −39.2 dB, |S15,16| = −33.2 dB,
|S17,16| = −33.0 dB, and |S18,16| = −37.3 dB. As expected, mutual coupling
decreases with inter-element separation. Ideally, these mutual coupling parameters
should be considered in the beamforming algorithm to nullify the effects of coupling
in the array. Such a task needs a separate study and is therefore not investigated
in this work. Due to the effect of MC, the measured beams will deviate from the
expected theoretical behavior.
4.5.2 Calibration
Calibration of the RF array system is vital for obtaining optimal beamforming
performance. Calibration was performed in two stages. The first stage was performed
on the ADCs, and used open-source routines that have already been developed for
the same hardware by members of the CASPER group [118]. The second stage
focused on digitally removing the effects of mismatches in the microwave front-end.
Relative gain and phase mismatches of the IF outputs for each chain were calculated
with respect to a reference chain using an input reference carrier at 5.86 GHz. Since
the overall system is narrowband, the recorded gain and phase values were directly
used to equalize the gain and phase of the sampled IF inputs. This was achieved by
adding a complex multiplier after the digital Hilbert transform in each channel.
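The second calibration stage can be illustrated with a small numerical sketch. It is not the routine used on the ROACH-2; it only shows the idea of a single complex correction coefficient per channel. The gain and phase errors are randomly generated, and it is assumed that the reference tone reaches every chain with identical amplitude and phase and that the signals are already in analytic (post-Hilbert-transform) form.

```python
import numpy as np

rng = np.random.default_rng(0)
num_ch, num_samples = 32, 2000
fs, f_if = 200e6, 10e6                      # sample rate and IF from the test setup

t = np.arange(num_samples) / fs
tone = np.exp(2j * np.pi * f_if * t)        # analytic IF tone after the Hilbert transform

# HYPOTHETICAL per-chain gain and phase mismatches of the analog front-ends.
gain = rng.uniform(0.8, 1.2, num_ch)
phase = rng.uniform(-np.pi / 4, np.pi / 4, num_ch)
x = (gain * np.exp(1j * phase))[:, None] * tone[None, :]   # shape (num_ch, num_samples)

ref = 0                                     # index of the reference chain
# Relative complex response of each chain with respect to the reference chain.
h = (x @ x[ref].conj()) / (x[ref] @ x[ref].conj())

# One complex multiplier per channel equalizes both gain and phase.
correction = 1.0 / h
x_cal = correction[:, None] * x

residual = np.max(np.abs(x_cal - x[ref][None, :]))
print("largest residual after calibration:", residual)
```

Because the system is narrowband, a single coefficient per channel is sufficient in this model; a wideband system would need a frequency-dependent correction.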
Figure 4.6: The beam outputs at bins 0-7. [Eight panels, Bin-0 to Bin-7; each panel
shows the simulated (sim) and measured (mes) EDFT and ADFT curves in dB, from 0 to
−30, against angle, from −50 to 50.]
4.5.3 Beam Measurements
As shown in Fig. 4.5(b), the entire 5.8 GHz 32-element digital array was placed in an
anechoic chamber for measuring the received beam patterns ([135] shows a short
Figure 4.7: The beam outputs at bins 8-15. [Eight panels, Bin-8 to Bin-15; each panel
shows the simulated (sim) and measured (mes) EDFT and ADFT curves in dB, from 0 to
−30, against angle, from −50 to 50.]
real-time demo of the total system). Power patterns were measured by sending
a continuous-wave (CW) signal at fRF = 5.86 GHz. The LO signal frequency
fLO determines the IF, fRF − fLO. The measurements were generated by setting
Figure 4.8: The beam outputs at bins 16-23. [Eight panels, Bin-16 to Bin-23; each panel
shows the simulated (sim) and measured (mes) EDFT and ADFT curves in dB, from 0 to
−30, against angle, from −50 to 50.]
fLO = 5.85 GHz, thus resulting in an IF of 10 MHz, and digitizing the down-
converted outputs at fclk = 200 MHz. The measurement was conducted using
digital integrators at each FFT/ADFT bin output to calculate the received energy
Figure 4.9: The beam outputs at bins 24-31. [Eight panels, Bin-24 to Bin-31; each panel
shows the simulated (sim) and measured (mes) EDFT and ADFT curves in dB, from 0 to
−30, against angle, from −50 to 50.]
for a fixed amount of time. Figs. 4.6 to 4.9 show the measured beams from
the 5.8 GHz real-time experimental setup for both the exact fixed-point FFT and
the approximate algorithm, along with the corresponding simulated curves. Simulated
beams account for the antenna element pattern and the actual separation of the
transmitter and the receiver in the measurement setup. The vertical axis of the plots
is in dB and the horizontal axis is the azimuthal angle over the range [−72°, 72°].
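The measurement procedure described above (a CW tone, a 10 MHz IF sampled at 200 MHz, and an energy integrator on every bin) can be mimicked in simulation as follows. The sketch assumes ideal analytic signals at the beamformer input, an ideal plane wave, the exact DFT as a stand-in for either transform, and an arbitrary integration window of 400 samples; none of these choices is taken from the hardware design.

```python
import numpy as np

N = 32
fs, f_if = 200e6, 10e6          # ADC clock and IF (f_RF - f_LO) from the setup
num_samples = 400               # ASSUMPTION: length of the integration window
d_over_lambda = 0.6             # element spacing of the 5.8 GHz array

n = np.arange(N)
t = np.arange(num_samples) / fs
F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # exact DFT used as a stand-in

def bin_energies(theta_deg):
    """Energy accumulated at each of the 32 bins for a CW source at theta_deg."""
    omega = 2 * np.pi * d_over_lambda * np.sin(np.radians(theta_deg))
    # Analytic IF signal at element n: common tone plus a per-element phase shift.
    x = np.exp(1j * (2 * np.pi * f_if * t[None, :] + omega * n[:, None]))
    y = F @ x                                   # beam outputs, shape (N, num_samples)
    return np.sum(np.abs(y) ** 2, axis=1)       # digital integrator per bin

# Sweep the source over the plotted angular range to trace out every beam.
angles = np.arange(-72, 73)
pattern = np.array([bin_energies(a) for a in angles])      # shape (145, 32)
pattern_db = 10 * np.log10(np.maximum(pattern / pattern.max(axis=0), 1e-12))
print("bin with most energy for a broadside source:", int(np.argmax(bin_energies(0.0))))
```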
The measured array factor of the beams depends strongly on the measurement
setup geometry. Ideally, the transmitter and the receiver should be placed far enough
apart for waves incident on the receiver array to be approximated as plane waves.
The numerical simulation in Fig. 4.10 shows how the measured array factor
deviates from this ideal depending on the geometry of the test setup. Based on the
standard rules [33, p. 42], the transmitter and receiver should have a separation
exceeding 20 m at 5.8 GHz in order for the receiver aperture to be in the far field.
However, such a large separation was not achievable within our test facility. In
particular, the beams were measured in an open parking deck with a
transmitter-receiver separation of approximately 7 m. For this reason, the measured
beams shown in Figs. 4.6 to 4.9 have been compared with numerically simulated beams
that account for both the finite transmitter-receiver separation and the actual element
pattern.
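The effect of a finite transmitter-receiver separation can be illustrated with a simple point-source model. The sketch below compares the response of one DFT bin to an ideal plane wave with its response to a spherical wave from a source at range R. It assumes isotropic elements, free-space propagation and the exact DFT, so it is not the simulation used for Fig. 4.10 and ignores the element pattern; the function name bin_response is illustrative.

```python
import numpy as np

C = 299_792_458.0
f_rf = 5.8e9
lam = C / f_rf
N = 32
dx = 0.6 * lam
x_el = (np.arange(N) - (N - 1) / 2) * dx       # element positions along the array axis
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)

def bin_response(k, theta, R=None):
    """Normalized |output| of bin k for a source at angle theta (radians).

    R=None models an ideal plane wave; otherwise the source is a point at
    range R (metres) from the array centre and exact path lengths are used.
    """
    kw = 2 * np.pi / lam
    if R is None:
        amp = np.ones(N)
        phase = kw * x_el * np.sin(theta)
    else:
        dist = np.sqrt((R * np.sin(theta) - x_el) ** 2 + (R * np.cos(theta)) ** 2)
        amp = R / dist                          # spherical spreading relative to the centre
        phase = -kw * (dist - R)                # path-length difference to the centre
    v = amp * np.exp(1j * phase)
    return np.abs(F[k] @ v) / N

print("plane wave:", bin_response(0, 0.0))
print("source at 7 m:", bin_response(0, 0.0, R=7.0))
```

In this model the broadside bin loses part of its peak response at a 7 m separation because of the quadratic phase error across the aperture, and it converges to the plane-wave value as R grows.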
The beam plots in Figs. 4.6 to 4.9 show that the measured beam patterns of
the two algorithms closely follow each other for all the bins. The measured beams
also follow the expected patterns quite well in the vicinity of the main beam. For
both algorithms, the measured plots have higher side-lobe levels in the deeper stop
bands than the simulated ones. The degradation in stop-band performance
can be due to mutual coupling effects and post-calibration errors of the system,
which are dominated by the performance of the analog front-ends in the receiver.
Measurement errors, including the fact that the tests were not performed in an
anechoic environment, also lead to deviations from the expected patterns.
Figure 4.10: The impact of the measurement setup geometry on the measured beam
response.
The 2D array factor of each beam arising from the proposed linear transform
can be expressed as
\[
\Psi_{k,l}(\omega_x,\omega_y) = \sum_{n=0}^{31}\sum_{m=0}^{31}\left[\hat{F}_{32}(k,m)\,\hat{F}_{32}(l,n)\,e^{-j(\omega_x \Delta x\, m+\omega_y \Delta y\, n)}\right], \tag{4.6}
\]
which may be rearranged to
\[
\Psi_{k,l}(\omega_x,\omega_y) = \left(\sum_{m=0}^{31}\hat{F}_{32}(k,m)\,e^{-j\omega_x \Delta x\, m}\right)\times\left(\sum_{n=0}^{31}\hat{F}_{32}(l,n)\,e^{-j\omega_y \Delta y\, n}\right) \tag{4.7}
\]
\[
= \Upsilon(k,\omega_x)\times\sum_{n=0}^{31}\hat{F}_{32}(l,n)\,e^{-j\omega_y \Delta y\, n}, \tag{4.8}
\]
where k, l ∈ {0, 1, . . . , 31}, ωx = ωct sinψ cosφ, ωy = ωct sinψ sinφ, and ψ and φ are
the elevation and azimuthal angles, respectively. ∆x and ∆y denote the inter-element
spacing in the x and y directions. The relationship in (4.8) can be used to compute
the 2D beam responses corresponding to a 2D URA consisting of 32 linear arrays,
each with the measured responses shown in Figs. 4.6 to 4.9. In particular, the term
Υ(k, ωx) denotes the array factor of the kth beam in the 32-element linear array
subsystem. The measured 1-D beam patterns were thus used in place of Υ(k, ωx) to
Figure 4.11: 2D beam patterns computed from 1-D array beam measurements using
the ADFT algorithm. The beams correspond to the bin outputs (same angles) as
the beams shown in Fig. 4.4(a-c).
synthesize the corresponding 2D beam patterns from a 2D aperture. Fig. 4.11 shows
the 2D beam patterns obtained from the measured ULA beams for
the same beams shown in Fig. 4.4, assuming ∆x = ∆y.
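The separability expressed by (4.6) to (4.8) can be verified numerically. In the sketch below the exact DFT matrix stands in for F̂32, the spacings are normalized to ∆x = ∆y = 1, and the function names are illustrative. The last lines show how a 2D pattern is formed as an outer product of two 1-D patterns, which is the operation applied to the measured 1-D beams.

```python
import numpy as np

N = 32
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)   # stand-in for the 32-point transform

def af_1d(k, w):
    """1-D array factor of beam k, i.e. Upsilon(k, w), with unit spacing."""
    return F[k] @ np.exp(-1j * np.multiply.outer(idx, w))

def af_2d_direct(k, l, wx, wy):
    """Double sum of (4.6), evaluated term by term."""
    m, n = np.meshgrid(idx, idx, indexing="ij")
    return np.sum(F[k, m] * F[l, n] * np.exp(-1j * (wx * m + wy * n)))

def af_2d_separable(k, l, wx, wy):
    """Product form of (4.7)/(4.8)."""
    return af_1d(k, wx) * af_1d(l, wy)

direct = af_2d_direct(3, 7, 0.3, -1.1)
separable = af_2d_separable(3, 7, 0.3, -1.1)
print(abs(direct - separable))

# A full 2D pattern from two 1-D patterns sampled on a frequency grid.
w_grid = np.linspace(-np.pi, np.pi, 181)
pattern_2d = np.outer(np.abs(af_1d(3, w_grid)), np.abs(af_1d(7, w_grid)))
```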
4.6 Conclusion
A large number of simultaneous beams has become an essential requirement for
emerging mm-wave based 5G systems. Moreover, future communications applications,
such as space-based Internet services, demand an ultra-high number of beams.
An N × N square antenna array aperture can generate up to N² orthogonal simultaneous
beams by using the 2D N-point spatial DFT. The upper bound of the
multiplicative complexity associated with such processing using FFT algorithms is
O(2N² log N). This chapter presents a low-complexity digital
beamforming architecture for generating 1024 simultaneous RF beams using a
32-point DFT approximation that completely eliminates multiplication operations.
The proposed ADFT algorithm consumes 46% less area than the reference FFT-based
design, while achieving a 50% drop in critical path delay. The VLSI metrics
AT and AT² for the proposed algorithm are reduced by 73% and 86%, respectively.
The proposed approach has been validated on a fully-functional 32-element digital
1D receive array that operates at 5.8 GHz. This design will serve as the main sub-
system for future implementations of a 32× 32 2D rectangular aperture that could
generate 1024 simultaneous RF beams with significantly lower SWaP in VLSI imple-
mentations. The 1-D array uses 32 parallel ADCs for sampling the antenna outputs
and the ADFT (implemented on a Xilinx FPGA) for computing 32 RF beams in
real-time. The measured RF beams show a per-beam bandwidth of 100 MHz when
all 32 beams are realized in real time, with only marginal (< 2 dB) degradation in
beam performance compared to a control experiment based on the Duhamel FFT
core.
CHAPTER 5
DIGITAL BEAMFORMING AT 28 GHZ USING XILINX RFSOC PLATFORM
This chapter presents the digital beamforming work done at 28 GHz using a
4-element array receiver. The work can be viewed as an attempt to extend
the work in Chapters 3 and 4 and to demonstrate fully digital beamforming at
mmW frequencies, which support wider bandwidths. Due to the cost
of mmW front-ends, the 28 GHz prototype digital beamformer has been limited to
a 4-element array. The Federal Communications Commission (FCC) in the United
States has allocated the 27.5-28.35 GHz band [136] for 5G mobile communication,
giving a total bandwidth allocation of 850 MHz in the 28 GHz band. Therefore,
the objective of the work is to perform and demonstrate fully digital beamforming
across the whole bandwidth, which has not been reported in the literature. Digital
beamforming is performed using the Xilinx ZCU 1275 board, which features the
Xilinx RFSoC device that integrates RF ADCs (and DACs) on the
same chip as the programmable logic and the processor subsystem. The
results for the beams measured across an 800 MHz bandwidth are presented.
5.1 28 GHz Receiver Array
The digital beamforming work described in this chapter uses a 4-element 28 GHz
array receiver; the specification and design procedure of the mmW front-end are
detailed in [137]. The overall architecture of the 28 GHz array receiver setup is shown
in Fig. 5.1. The receiver is designed as a direct-conversion receiver that brings the RF
passband down to baseband. The receiver has been designed to support 845 MHz bandwidth
using orthogonal frequency division multiplexing (OFDM) that incorporates an FFT
Figure 5.1: The overall architecture of the 4-element 28 GHz digital beamforming
receiver. [Diagram labels: 28 GHz antenna array; RF receiver chain (LNA, BPF, IQ mixing
stage with 0°/90° LO, LPF, IF amplifier); digital hardware on the Xilinx ZCU1275
platform (ADCs for the I and Q channels, FPGA, DFT based multibeam beamforming, MAC
layer processing/digital demodulation).]
size of 512. The following subsections describe each of the sub-components of
the receiver array front-end.
5.1.1 Antenna Array
A 4-element ULA in receive mode was used in the array receiver setup. Each antenna
element of this array operates at a center frequency of 28 GHz with a bandwidth of
850 MHz. This antenna was designed by a fellow labmate and the design details
are provided in [138]. The inter-element spacing of the 4-element array that was used
for the digital beamforming setup has been set to 0.75λ at 28 GHz. The antenna
array is shown in Fig. 5.2. As seen in Fig. 5.2, each individual antenna has been built
as an 8-element series-fed sub-array. The vertical sub-array provides higher gain in
Figure 5.2: The 4-element ULA consisting of series fed sub-array patches at 28 GHz.
the elevation plane to aid the real-time beam measurements. The antenna resonates
at 28.05 GHz with a return loss of −27.41 dB. The tapered 28 GHz array produced
a side-lobe level lower than −18 dB in the elevation plane as compared to -13 dB
down side-lobes for non-tapered rectangular structure.
5.1.2 Receivers and Front-End
The direct-conversion receiver chains were designed using COTS electronics
following the specifications described in [138]. The RF receiver chain components
and their specifications are tabulated in Table 5.1. The down converter module
HMC1065LP4E incorporates a frequency doubler, and therefore the LO input
can be driven at half the frequency of the RF band. An LO frequency of 13.95 GHz
has been used for this beamforming setup. A centralized LO distribution network
is used to simultaneously and coherently drive all the channels of the array. As
per the calculation shown in [138], the receiver is designed to have a cascaded gain
Gcas,dB = 70 dB with an overall noise figure Fcas,dB = 2.5 dB. The receiver is
capable of demodulating up to 64 quadrature amplitude modulation (QAM) at a
power of -80 dBm at the antenna input [137]. The entire front-end incorporating
the antenna array is shown in Fig. 5.3.
Table 5.1: RF receiver chain components and specifications
Component                               Gain [dB]    NF [dB]   OIP3 [dBm]
LNA [MACOM] MAAL-01111                  19           2.5       20
Down Converter [AD] HMC1065LP4E         9            3         14
LPF [MiniCircuits] LFCN-900+            -1           1         NA
IF Amp [MiniCircuits] RAM-8A+           [31.5, 24]   2.6       24.4
VGA [AD] ADL5331                        [-15, 15]    9         39
Figure 5.3: The 4-element receiver array front-end having the antenna array and
the receivers.
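The cascaded gain and noise figure quoted above can be cross-checked with the
Friis formula. The following is a minimal sketch, not the calculation from [138]:
it assumes the stages are cascaded in the order listed in Table 5.1, that the IF
amplifier operates at the 31.5 dB end of its listed gain range, and that the VGA
is set to 11.5 dB (a hypothetical setting chosen so that the gains sum to 70 dB).

```python
import math

def db_to_lin(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def cascade(stages):
    """Friis cascade. stages: list of (gain_dB, nf_dB) in signal order.
    Returns (total gain in dB, cascaded noise figure in dB)."""
    f_total = 0.0
    g_running = 1.0  # linear gain of all stages preceding the current one
    for idx, (g_db, nf_db) in enumerate(stages):
        f = db_to_lin(nf_db)
        f_total += f if idx == 0 else (f - 1.0) / g_running
        g_running *= db_to_lin(g_db)
    return 10.0 * math.log10(g_running), 10.0 * math.log10(f_total)

# (gain dB, NF dB) taken from Table 5.1. The IF-amp gain (31.5 dB) and the
# VGA setting (11.5 dB) are assumptions made for this illustration only.
stages = [
    (19.0, 2.5),   # LNA
    (9.0, 3.0),    # down converter
    (-1.0, 1.0),   # LPF
    (31.5, 2.6),   # IF amplifier
    (11.5, 9.0),   # VGA (hypothetical setting)
]
gain_db, nf_db = cascade(stages)
print(f"cascaded gain = {gain_db:.1f} dB, cascaded NF = {nf_db:.2f} dB")
```

Under these assumptions the cascade is dominated by the LNA, so the overall noise
figure stays close to the 2.5 dB of the first stage.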
5.2 Digital Back-End
The digital back-end of the system is implemented using the Xilinx ZCU 1275 [139]
development platform, which incorporates a Zynq UltraScale+ RFSoC [140]. In
addition to a high-capacity FPGA fabric, the Xilinx RFSoC XCZU29DR chip on the
ZCU 1275 board contains 16 ADCs sampling at 2 GSps with 12-bit resolution and
16 DACs running at 6.4 GSps with 14-bit resolution, all integrated into a single
integrated circuit. The digital beamforming system uses 8 ADC channels in the
RFSoC to sample the 4 IQ IF signal pairs from the 28 GHz antenna front-end to
perform digital beamforming. Fig. 5.4(a) shows a picture of the ZCU 1275 board
that is used in the described beamforming setup. The following section describes
the architecture of the RFSoC data converters in general and the setup of the
board for using the data converters.
Figure 5.4: (a) Xilinx ZCU 1275 platform incorporating Zynq UltraScale+
XCZU29DR chip; (b) custom-made balun boards for interfacing the ADCs; (c)
HW-CLK-102 clock generation board.
5.2.1 RF ADCs on the Xilinx RFSoC
The architecture of the Xilinx XCZU29DR RFSoC’s ADCs is shown in Fig. 5.5 [3].
The 16 ADCs in the XCZU29DR are arranged in 4 tiles, where each tile contains 4
data converters. Each tile can be clocked separately with either an external clock
input or the phase-locked loop (PLL) that is built into each tile. All tiles
can be synchronized using a SYSREF input that is routed to all the tiles. The
synchronization mechanism is detailed in [3]. The ZCU1275 comes with separate
clocking boards as a solution to provide phase-locked clocking to the data
converters. Fig. 5.4(c) shows the HW-CLK-102 clock board that is shipped with the
ZCU 1275 platform. The board consists of 2 PLLs, namely PLL-A and PLL-B, where
PLL-A is able to provide four phase-aligned RF clocks in differential form for
clocking the RF-ADCs and DACs in the RFSoC. PLL-B is able to generate 2 pairs of
differential clocks. The board also supports three additional phase-aligned
reference clocks for synchronization.
Figure 5.5: RF ADC tile overview (figure is taken from [3]).
The data converters of the RFSoC can be configured in either AC- or DC-coupled
mode. The AC-coupled mode requires differential inputs for each channel. Since
the outputs from the 4-element array receiver were single-ended, a custom set of
balun boards was built in-house. Fig. 5.4(b) shows a set of these baluns. The
balun uses the MiniCircuits TCM2-33WX+ transformer chip [141] with
two AC-coupling capacitors at the differential output. The balun nominally works
in the range of 30 to 3000 MHz. For the system considered here, 8 baluns were
built, tested and used for feeding all 8 IQ channels to the RFSoC.
[Figure 5.6 block labels: inputs I1, Q1 to I4, Q4 at 1966.08 MSps; FIFOs
delivering phase-1 to phase-8 at 245.76 MSps; calibration coefficients α1, β1 to
α4, β4; 4-point FFT producing Beam-1 to Beam-4; digital integrators used for
array factor measurements.]
Figure 5.6: The overview architecture of the digital back-end.
5.2.2 Digital Beamforming
All IF inputs are connected to the data converters of the Xilinx RFSoC, which
are configured to sample at 1966.08 MHz. To handle the total bandwidth of 845
MHz supported by the receivers, the digital circuits were designed in a polyphase
architecture with 8 parallel beamforming cores. Therefore, the built-in first-in
first-out (FIFO) buffers of the Xilinx Data Converter IP (XDCIP) core were
configured to output at a rate of 1966.08/8 = 245.76 MSps, with 8 sampled words
per clock edge streamed into the beamforming cores. The overview architecture of
the digital back-end is shown in Fig. 5.6. The outputs of each FIFO stream were
synchronized to a single reference clock of 245.76 MHz that was derived from the
analog sampling clock. Four simultaneous beams were realized by applying a
4-point FFT digital core across the spatial samples. As shown in Fig. 5.6, the
digital design uses 8 such digital cores in parallel to process the entire
sampled bandwidth.
[Figure 5.7 labels: 28 GHz array receiver, baluns, Xilinx ZCU1275, transmitter,
1 m.]
Figure 5.7: The entire measurement setup comprising the 28 GHz receiver array
and the 28 GHz transmitter.
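The polyphase arrangement described above can be illustrated in software. The
sketch below is a behavioral model only, not the FPGA implementation: it assumes
complex baseband samples per antenna, uses an exact 4-point DFT for the spatial
transform, and assigns time sample t to core t mod 8. The names
`polyphase_beamform` and `bin_energy` are hypothetical.

```python
import cmath
import math

N_ELEM = 4            # antenna elements (spatial FFT size)
N_PHASES = 8          # parallel beamforming cores
FS_ADC = 1966.08e6    # ADC sample rate in samples per second
F_CLK = FS_ADC / N_PHASES  # fabric clock per core: 245.76 MHz

def spatial_dft4(x):
    """4-point DFT across the antenna samples of a single time instant."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N_ELEM)
                for n in range(N_ELEM))
            for k in range(N_ELEM)]

def polyphase_beamform(channels):
    """channels[n][t]: complex sample of antenna n at time index t.
    Returns beams[p][k]: outputs of FFT bin k computed by core (phase) p."""
    n_samples = len(channels[0])
    beams = [[[] for _ in range(N_ELEM)] for _ in range(N_PHASES)]
    for t in range(n_samples):
        p = t % N_PHASES  # core that receives this sample
        y = spatial_dft4([channels[n][t] for n in range(N_ELEM)])
        for k in range(N_ELEM):
            beams[p][k].append(y[k])
    return beams

def bin_energy(beams, k):
    """Accumulate |y|^2 for bin k over all phases (digital integrator)."""
    return sum(abs(v) ** 2 for p in range(N_PHASES) for v in beams[p][k])

# Test tone whose phase advances by 90 degrees per element (aligned with bin 1).
chans = [[cmath.exp(2j * math.pi * (0.05 * t + n / 4)) for t in range(64)]
         for n in range(N_ELEM)]
energies = [bin_energy(polyphase_beamform(chans), k) for k in range(N_ELEM)]
print([round(e, 3) for e in energies])
```

Because every core applies the same spatial transform, the split into phases
changes only the clock rate at which each core must run, not the beam outputs.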
5.3 Real-Time Beam Measurements Setup
The full transmitter receiver setup for real-time beam measurement is shown in
Fig. 5.7. A horn antenna is used as the transmitter to send test signals centered at
27.9 GHz, emulating different RF bandwidths. The frequency of the LO input to
the receivers was set to 13.95 GHz so that, after the frequency doublers, the LO
signal is at 27.9 GHz and brings the RF center frequency down to zero at baseband.
The beamformed outputs of each beam of each phase are then used to compute the
received energy for each direction of arrival using digital integrators.
Figure 5.8: (a) The IQ outputs of the 4 downconverted channels at an IF frequency
of 100 MHz.
5.3.1 Calibration
The calibration mode of each ADC channel was set to “Mode-2” [3] in the XDCIP.
The ADC calibration is handled by the start-up finite-state machine built into the
XDCIP. Calibration of the RF front-ends was performed digitally by imposing
gain and phase corrections. The gain and phase mismatches were pre-measured using
a reference signal with respect to one receiver chain. The correction was applied
by employing a complex multiplier at each phase of each channel, which multiplies
the samples by a constant αi + jβi for i ∈ {0, 1, 2, 3}, where αi, βi ∈ R form
the complex coefficient that calibrates receiver channel i. Fig. 5.8(a) shows
samples recorded at each of the 8 channels from the array before calibration
where fLO = 28 GHz, fRF = 28.1 GHz. Fig. 5.8(b) shows the samples recorded for
the same configuration after calibration.
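One way to obtain the per-channel constants αi + jβi from a recorded reference
tone is a least-squares fit against the reference chain. The sketch below is an
assumption about how such coefficients could be estimated, not the procedure used
in this work; channel 0 is taken as the reference and `calibration_coeffs` is a
hypothetical helper.

```python
import cmath
import math

def calibration_coeffs(ref_samples):
    """ref_samples[i]: complex samples of channel i for a common reference tone.
    Returns c_i = alpha_i + j*beta_i minimizing sum |x_0 - c_i * x_i|^2."""
    x0 = ref_samples[0]
    coeffs = []
    for xi in ref_samples:
        num = sum(a * b.conjugate() for a, b in zip(x0, xi))
        den = sum(abs(b) ** 2 for b in xi)
        coeffs.append(num / den)
    return coeffs

def apply_calibration(samples, coeffs):
    """Complex multiplier per channel: y_i[t] = c_i * x_i[t]."""
    return [[c * s for s in ch] for ch, c in zip(samples, coeffs)]

# Synthetic example: a tone seen through four chains with gain/phase mismatch.
tone = [cmath.exp(2j * math.pi * 0.05 * t) for t in range(32)]
mismatch = [1.0, 0.8 * cmath.exp(0.3j), 1.2 * cmath.exp(-1.1j),
            0.9 * cmath.exp(2.0j)]
recorded = [[m * s for s in tone] for m in mismatch]
coeffs = calibration_coeffs(recorded)
calibrated = apply_calibration(recorded, coeffs)
for i, c in enumerate(coeffs):
    print(f"channel {i}: alpha = {c.real:+.4f}, beta = {c.imag:+.4f}")
```

A single complex constant per channel corrects one frequency exactly; across a
wide band the residual mismatch varies with frequency.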
5.3.2 Real-Time Beam Measurements
The receive-mode beams generated by the digital beamformer were measured
following a procedure similar to that described in Chapter 3. The angle of arrival
of the 28 GHz wave front was changed by rotating the receiver array to obtain the
array factors of each beam. Array factor measurements were conducted at different
mmW frequencies around 27.9 GHz to demonstrate beamforming in the full 800 MHz
bandwidth supported by the receiver. The beam patterns were measured by changing
fRF from 27.5 GHz to 28.3 GHz, across an 800 MHz bandwidth, in steps of 100 MHz.
The simulated (solid) and measured (dashed) sets of beams at fRF = 28 GHz are
shown in Fig. 5.9. The simulated beams shown here have been generated by
taking the element pattern into account. Fig. 5.9(a-d) correspond to Bins 1-4 of
the spatial FFT digital core, and each bin corresponds to a different look
direction. The look directions are set by the antenna spacing and can be
calculated using (3.2). Since the 28 GHz array is spaced at 0.75λ, Bin 1 points
to boresight whereas Bin 2 and Bin 4 point to ±19.5°, respectively. Since the
antenna elements are not critically spaced, grating lobes occur for beam
directions beyond ±41.8° in this array. The beam output from Bin 3 points to
±41.8°, and thus the plots in Fig. 5.9(d) indicate two main lobes. Fig. 5.9(e,f)
show all simulated beams and all the measured beams (fRF = 28 GHz) in a single
plot, respectively. From Fig. 5.9 it is evident that the measured beams agree
quite well with the simulated beam patterns at the emulated IF frequency of
100 MHz.
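The look directions quoted above follow from the spatial frequency of each FFT
bin. The sketch below uses the standard DFT-beam relation sin θ = kλ/(Nd), which
is assumed here to be equivalent to (3.2); the function name `look_angle_deg` is
hypothetical.

```python
import math

def look_angle_deg(k, n_elem=4, d_over_lambda=0.75):
    """Look direction (degrees from boresight) of spatial-DFT index k for an
    n_elem-element ULA with element spacing d = d_over_lambda * wavelength."""
    s = k / (n_elem * d_over_lambda)  # sin(theta)
    if abs(s) > 1.0:
        raise ValueError("bin does not map to a real angle")
    return math.degrees(math.asin(s))

# Spatial indices k = 0, +1, 2, -1 (assumed to correspond to Bins 1 to 4).
for k in (0, 1, 2, -1):
    print(f"k = {k:+d}: {look_angle_deg(k):+.1f} deg")
```

With d = 0.75λ and N = 4, sin θ = k/3, so k = ±1 gives about ±19.5° and k = 2
gives about 41.8°. The index k = 2 is equivalent to k = −2 for a 4-point DFT,
which is why that bin shows two main lobes at ±41.8°.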
Fig. 5.10 shows the measured patterns for each output bin at different fRF
frequencies across the band [27.5, 28.3] GHz. The plots confirm that the main beam
direction remains constant across the bandwidth. The sidelobe levels change across
the bandwidth of interest, and the measurements indicate that the sidelobe levels
are elevated towards the band edges. This is a consequence of the calibration not
being wideband. The maximum sidelobe level recorded for Bins 1, 2, 4 across the
whole bandwidth is ≈ 9 dB.
Figure 5.9: (a-d) Simulated and measured beam corresponding to each output of
the FFT at fIF = 100 MHz. (e) All simulated beams in a single plot. (f) All the
measured beams in a single plot.
5.4 Conclusion
A 4-element digital beamforming array receiver at 28 GHz has been designed and
built using the Xilinx RFSoC based ZCU1275 platform as the digital back-end. The
use of the Xilinx RFSoC platform makes it possible to process the entire bandwidth
of the 28 GHz band FCC allocation by using the on-chip data converters, which
support very wide bandwidths. Two ADC tiles in the RFSoC have been used to
digitize all 8 IQ channels and perform 4-point spatial-FFT based multibeam
beamforming to produce 4 simultaneous mmW beams. Digital polyphase beamforming
architectures have been used to generate the beams digitally across the full
baseband bandwidth. Array factors of the digitally formed multibeams have been
measured in real time and presented. They are in good agreement with
the simulated beams.
Figure 5.10: (a-d) Measured beams corresponding to the outputs of the FFT bins
at fRF ∈ {27.5, 27.7, 28, 28.2, 28.3} GHz.
CHAPTER 6
ANALOG 16-BEAM BEAMFORMING ARCHITECTURE AND THE
CMOS CIRCUITS USING APPROXIMATE DFT
This chapter explores a baseband multibeam beamforming method based on
the spatial Fourier transform. The approximate computing techniques introduced
in Chapters 3 and 4 are used to propose a low-complexity, low-power,
high-bandwidth beamforming architecture that exploits the sparse factorizations
of the ADFT algorithms proposed there for multibeam beamforming. The sparse
factorization stages, which have small integer coefficients, map neatly
to integer W/L ratios in CMOS current mirrors. The approximate fast
Fourier transform can thus be efficiently realized using CMOS analog integrated
circuits to generate multiple, parallel mmW beams in both transmit and receive
modes. The work in this chapter uses the 16-point approximate-FFT algorithm
given in Chapter 3 to design a novel analog multibeam network using 65 nm CMOS
models. The proposed multi-beam architectures have the potential to reduce circuit
area and power requirements while exceeding the baseband bandwidth requirements
of emerging 5G commercial systems.
6.1 Introduction and Review of RF System Considerations
As described in Chapter 1, multibeam systems are essential and are envisioned for 5G
wireless network base stations, mobile stations, micro base stations, pico cells, and
user equipment. For mobile 5G systems, compact and energy-efficient integrated
multi-beam solutions are highly desirable to improve battery life and reduce heat-
dissipation problems while enabling directional agility. The wide bandwidths of 5G
systems have created new challenges in realizing fully-integrated transceivers and
have made some currently-favored transceiver architectures, such as passive mixer-
based receivers, not directly applicable [142].
As an example of such a solution, a recent work by IBM [4] demonstrates fully
integrated dual-polarization 16-element arrays for 28-GHz 5G applications. The
transceiver architecture presented in [4] is a 2-step sliding-IF half-duplex
architecture, as shown in Fig. 6.1. A transceiver topology similar to that in [4]
is very representative, and it is anticipated that this type of architecture will
likely remain the favorite for 28 GHz 5G systems for a few years to come, until
5G systems are widely adopted and more advanced transceiver topologies are
developed. Unlike steerable beamformers using tunable delays at mmW bands, the
proposed multi-beam architecture in this chapter operates in the baseband while
supporting up to 1.5 GHz of bandwidth per beam. Similar to the digital approach
presented in Chapter 3, the proposed method utilizes the spatial frequency
distribution of directed mmW energy, where each mmW beam has a unique spatial
frequency that remains intact through up-down conversion in the mixers. Thus, the
proposed baseband analog-FFT can realize multiple far-field mmW beams for a
variety of transceiver architectures without requiring tunable mmW delay lines.
[Figure 6.1 labels: Antenna #1 to Antenna #N; front end (FE) with LNA and PA;
delay; LO and LO/2; I/O; conceptual representation of receiver and transmitter
paths.]
Figure 6.1: A representative 28-GHz 5G half-duplex transceiver architecture based
on delay-and-sum beamforming [4].
6.2 Spatial FFT-based Multi-Beam Architectures
In Chapter 3, the use of a DFT operation applied digitally across the samples of
a ULA to obtain multiple simultaneous beams was discussed. Chapter 3 also showed
that any FFT implementation has computational error due to the use of
finite-precision arithmetic to realize irrational DFT matrix coefficients.
Therefore, it was proposed that approximate DFTs can be used for applications
such as multibeam generation, since beamforming can tolerate accuracy errors.
That said, computations in the analog domain are known to handle much
higher bandwidths than digital implementations at much lower power consumption.
Therefore, in this chapter, the ADFTs that were proposed to achieve
low-complexity digital multibeams are investigated for producing analog
multibeams, owing to their small integer coefficients.
In particular, transforms with small integer coefficients are desirable because
they enable straightforward current-mode implementations by changing the W/L
(width/length) ratios of CMOS transistors to implement multiplication by the par-
ticular coefficient. Such current-mode implementations are intrinsically fast because
their bandwidth is only limited by the poles of the current mirrors and not by the
maximum clock rate as for digital counterparts [143].
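To first order, the current gain of a mirror equals the ratio of the output and
input aspect ratios, which is what makes integer coefficients convenient. The
sketch below illustrates that idealized relation only; it ignores
channel-length modulation, mismatch and cascode details, and the helper names are
hypothetical.

```python
def mirror_gain(w_in, l_in, w_out, l_out):
    """Ideal current-mirror gain I_out / I_in = (W_out/L_out) / (W_in/L_in)."""
    return (w_out / l_out) / (w_in / l_in)

def output_width_for_coeff(coeff, w_in):
    """Output-branch width that realizes |coeff| when both devices share the
    same channel length (the sign is handled by NMOS/PMOS mirror selection)."""
    return abs(coeff) * w_in

# Example: realize the coefficient magnitudes 1 and 2 from a 1.2 um wide,
# 0.06 um long input device (illustrative dimensions).
w_in, length = 1.2, 0.06
for coeff in (1, -1, 2, -2):
    w_out = output_width_for_coeff(coeff, w_in)
    gain = mirror_gain(w_in, length, w_out, length)
    print(f"coeff {coeff:+d}: W_out = {w_out:.1f} um, ideal gain = {gain:.2f}")
```

In practice a coefficient of 2 is often built from two unit devices in parallel
so that matching depends on identical unit geometries.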
Current-mode, current-mirror-based IC implementations of the approximate
algorithms would need only 2-4 well-matched transistors for realizing each
coefficient. The area and power consumption of both these signal processing
approaches (digital and analog) scale with accuracy requirements [144]. Fig. 6.2
shows system configurations for both transmit and receive modes that can produce
N simultaneous beams by using the proposed analog N-point approximate-DFT (ADFT),
where N ∈ {8, 16, 32}. This chapter specifically focuses on the 16-point ADFT in
(3.9) and its factorization algorithm in (3.14) to explore CMOS circuits that
generate 16 high-bandwidth analog beams. The discussed methodology can be
adopted for other sizes of N, given similar small-integer-coefficient transforms
that approximate the respective N-point transform and are found through an
approach similar to (3.14). The method in (3.8) can also be applied to larger
matrix sizes such as N = 1024, 2048, . . . . The resulting approximation matrices
will then possess better properties such as near-orthogonality and spectral
behavior that is close to the exact DFT. The main problem in deriving such larger
approximate DFT matrices is the necessity of creating associated fast algorithms:
the larger the matrix, the harder it becomes to derive such algorithms. A possible
alternative for generating such large approximations is the use of scaling
methods. Essentially, larger ADFTs can be generated by re-using smaller ADFTs.
This avoids the hassle of deriving fast algorithms for large matrices, and the
CMOS circuit architecture proposed in this chapter can be used as the building
block that generates the higher order beams in a larger array.
[Figure 6.2 labels: (a) RF front-ends, I and Q paths (0°/90°), V/I converters,
analog N-point approximate-DFT blocks, multi-beam outputs y1, y2, ..., yN, beam
select, digital processing; (b) digital processing, input select and zero
conditions, DACs, analog N-point approximate-DFT blocks, RF front-ends.]
Figure 6.2: (a) Receive-mode multi-beam system with down-converted analog
baseband beamforming; (b) transmit-mode multi-beam system using baseband analog
beamforming.
[Figure 6.3 labels: inputs xi and xj; NMOS mirrors with ratios 1 : α and 1 : β;
PMOS mirrors with ratio 1 : 1; outputs αxi + βxj and αxi − βxj; cascode mirrors
with transistor sizes W1/L1 to W4/L4, input Ibias + ii(t), output Iout, supply
VDD.]
Figure 6.3: Current-mode implementation of (a) addition and (b) subtraction
operations, which are the primary functions for implementing the ADFT using
analog CMOS circuits; (c) NMOS and (d) PMOS current mirrors designed using a
low-voltage cascode topology.
6.3 Circuit Topologies, Beamforming Architectures
Realization of the analog-DFT is key to implementing the mmW receiver and trans-
mitter architectures shown in Fig. 6.2. Early attempts to implement analog DFT
processors used op-amp circuits to realize the weights of the DFT matrix [145].
This approach is slow and difficult to scale to larger arrays because the twiddle
factors become closer to each other as the FFT size increases, making them harder
to realize accurately. More recently, a 0.13 µm CMOS 8-point Cooley-Tukey FFT
processor for orthogonal frequency division multiplexing (OFDM) applications was
reported [146,147]. The processor uses a time-interleaving bank of sample-and-holds
and discrete time analog multipliers, and has been tested with 1 GS/s OFDM inputs.
However, dedicated input signals are used to represent the FFT coefficients, which
makes scaling difficult. 2-D rectangular LC lattices implemented on CMOS have
also been proposed for computing analog DFTs of spatial input signals [148]. The
method has been verified using numerical simulations, and bandwidths of > 10 GHz
are possible for on-chip implementations. However, such large bandwidths require
small inductor and capacitor values, which are difficult to realize accurately, and
unwanted mutual coupling between the inductors is also an issue. The analog FFT
processor in [149] uses a current-mirror-based architecture to scale the input cur-
rent by the twiddle factor weights. However, the authors had to approximate the
weights to the first decimal place for ease of implementation, resulting in degraded
beam shapes. Finally, the work in [150] describes a 16-point analog domain FFT
using a charge-reuse analog Fourier transform (CRAFT) engine. The circuit uses
charge reuse to achieve an input bandwidth of 5 GHz. However, the design requires
RF samplers in the front-end, and inaccuracies in the capacitance network lead to
twiddle factor errors that make scaling difficult.
A critical issue faced by all previous approaches for realizing analog DFTs has
been that accurate twiddle factor values are difficult to generate on-chip. The level
of difficulty grows as the FFT size increases since the factors become closer to each
other, which results in performance degradation. Hence it makes sense to allow some
error margin and implement approximate transforms with integer twiddle factors.
The resulting implementations are more scalable since the transform coefficients are
now constrained to a small set of Gaussian integers. The approximate transforms
in (3.6) and (3.9) satisfy this special property, i.e., are limited to small integer
coefficients P ∈ {0,±1,±2}. This property enables high-bandwidth current-mode
analog ICs in which well-controlled geometric parameters (namely, the W/L ratios
of current mirror transistors) determine the integer coefficients.
6.3.1 Analog Current-Mode ADFT Designs
Let the current-mode signals captured by the N Nyquist-spaced antennas of an
N-beam multi-beamforming system be x_in = [x_1, x_2, ..., x_N]^T. The beam
outputs y = F̂_N · x_in, where y = [y_1, y_2, ..., y_N]^T, correspond to unique
directions of arrival given by

  \psi_i = \sin^{-1}\left(\frac{2k}{N}\right), \quad
  k = \begin{cases} i, & 1 \le i \le N/2 \\ -i, & N/2 < i \le N \end{cases}

where i is the output bin number. In current mode, the output current at each
output bin y_i requires implementation of \sum_{k=0}^{N-1} p_{ik} x_k, where
p_{ik} denotes a matrix coefficient. The addition and subtraction arithmetic
required for this calculation can be implemented directly using NMOS and PMOS
current mirrors, as shown in Fig. 6.3(a) and (b). Here α and β are the weights by
which the input currents need to be scaled. Thus the transforms in (3.9) can be
realized using analog current-mode CMOS. The Re{F̂16} and Im{F̂16}
transforms can be implemented separately to realize the full transformation.
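Implementing the real and imaginary parts of the transform separately amounts to
four real matrix-vector products combined by addition and subtraction. The sketch
below illustrates this split with the exact 4-point DFT, whose entries already
lie in {0, ±1, ±j}; it is a stand-in chosen for brevity and is not the 16-point
ADFT of (3.9), which is not reproduced here.

```python
def matvec(m, v):
    """Real matrix-vector product (weighted current summation per output)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Real and imaginary parts of the 4-point DFT matrix, F[k][n] = (-j)^(k*n).
F_RE = [[1, 1, 1, 1],
        [1, 0, -1, 0],
        [1, -1, 1, -1],
        [1, 0, -1, 0]]
F_IM = [[0, 0, 0, 0],
        [0, -1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 0, -1]]

def transform_split(x_re, x_im):
    """y = (F_RE + j F_IM)(x_re + j x_im) using only real-valued paths."""
    y_re = [a - b for a, b in zip(matvec(F_RE, x_re), matvec(F_IM, x_im))]
    y_im = [a + b for a, b in zip(matvec(F_RE, x_im), matvec(F_IM, x_re))]
    return y_re, y_im

x = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j]
y_re, y_im = transform_split([v.real for v in x], [v.imag for v in x])
print(list(zip(y_re, y_im)))
```

Each of the four products uses only coefficients from a small integer set, so
each maps to mirror ratios, while the subtraction in the real path corresponds to
a polarity inversion.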
In [151], an 8-point current-mode design that follows this approach has been
discussed. Such an approach would in general require O (N2) current mirrors for
generating N -beams. A digital architecture which implements a fast algorithm
(e.g. Fourier, discrete sine/cosine) involves the implementation of butterfly matrices.
Use of such sparse factorization matrices reduces the hardware complexity of the
transform. Given that a sparse factorization exists for the transform of interest,
the same principle can be applied to analog implementations. Therefore, the work
in this chapter aims to benefit from the sparse factorization in (3.14), which
reduces the arithmetic complexity of the approximate transforms. It is realized
by implementing each factorization stage individually and cascading the stages
in series.
After observing the factorization matrices of (3.7) and (3.14), it can be seen
that each row and column of the factorized matrix contains a maximum of two
non-zero elements. This implies that i) each output of the factorization stage
requires an addition of the form p·a_i + q·a_j, where p, q ∈ P and a_i and a_j
are the ith and jth inputs (0 < i, j ≤ N) to the factorized matrix; and ii) each
input a_i has to be copied a maximum of two times. Thus an implementation of each
stage of the factorization given in (3.7) and (3.14) would comprise N NMOS
current copiers, each producing two current copies, and N PMOS mirrors attached
to the outputs of each stage. The full realization of the analog circuit would
require implementation of all the factorization stages and separate copies of the
hardware for the real and imaginary parts, as shown in Fig. 6.4. The analog
implementation of the last factorization stage B1 in the real signal path
requires small changes in the circuit compared to the implementation (denoted
by B1) used in the imaginary signal path. Therefore it is denoted as B′1. In the
design of the B′1 block, the polarity of the signals entering from inputs 10-16
has been negated to account for the -1 generated by multiplication of outputs
10-16 of the imaginary component at the B2 stage by j.
[Figure 6.4 labels: (a) inputs Re{x1}, Im{x1} to Re{x16}, Im{x16}; cascaded
stages B5, B4, B3, B2, B1 and B5, B4, B3, B2, B′1; outputs Re{y1}, Im{y1} to
Re{y16}, Im{y16}; (b) stage inputs v1 to v16 and outputs u1 to u16 connected
through NMOS and PMOS mirrors.]
Figure 6.4: (a) System architecture of the 16-point analog ADFT; (b) realization of
the B5 factorization stage using current mirrors.
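The structure described above, in which every stage output is a two-term sum and
every input is copied at most twice, can be illustrated with a small cascade. The
sketch below uses the two butterfly stages of a 4-point Walsh-Hadamard transform
as a stand-in, because the factor matrices of (3.7) and (3.14) are not reproduced
here; the stage encoding and helper names are hypothetical.

```python
def apply_stage(stage, a):
    """stage[r] is a list of (input_index, coefficient) pairs for output r;
    each output is a weighted sum of at most two inputs."""
    return [sum(c * a[j] for j, c in row) for row in stage]

def copies_per_input(stage, n):
    """Number of current copies each input must provide to this stage."""
    counts = [0] * n
    for row in stage:
        for j, _ in row:
            counts[j] += 1
    return counts

# Two sparse stages whose product is the dense 4-point Walsh-Hadamard matrix.
STAGE_A = [[(0, 1), (1, 1)], [(0, 1), (1, -1)],
           [(2, 1), (3, 1)], [(2, 1), (3, -1)]]
STAGE_B = [[(0, 1), (2, 1)], [(1, 1), (3, 1)],
           [(0, 1), (2, -1)], [(1, 1), (3, -1)]]
H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

x = [0.5, -1.0, 2.0, 0.25]
y_cascade = apply_stage(STAGE_B, apply_stage(STAGE_A, x))
y_dense = [sum(h * v for h, v in zip(row, x)) for row in H4]
print(y_cascade, y_dense)
print(copies_per_input(STAGE_A, 4), copies_per_input(STAGE_B, 4))
```

The cascade needs 2N copies per stage, whereas the dense form needs N copies of
every input, which is the saving that the factorized analog realization exploits.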
6.3.2 16-point ADFT Implementation in 65 nm CMOS
The reduced hardware complexity of the factorized version makes it attractive for
higher values of N. Specifically, the number of current copies needed for
realizing an N-point ADFT in non-factorized form has a maximum value of 4N²
(asymptotically O(N²)), whereas it is 4·S_N·N in the factorized form, where S_N
is the number of factorization stages (realized using NMOS mirrors). Since
S_N ≪ N, the latter number asymptotically converges to O(N). The factorized
circuit also needs O(N) PMOS mirrors for performing current
addition/subtraction, so the total number of mirrors required remains O(N) as
compared to O(N²) for the direct approach. For example, 308 current copies are
needed for implementing the factorized version of the 16-point ADFT in current
mode, whereas a direct realization would require 768 copies. Although the number
of PMOS mirrors required for addition/subtraction of currents is higher for the
factorized implementation (160 versus 96), the overall complexity associated with
the factorized implementation is still significantly lower. Moreover, this
benefit becomes larger as N increases. Thus the individual factorization stages
given by B1 to B5 in (3.14) were implemented using 65 nm CMOS. The real and
imaginary components of the designs were implemented separately and the top-level
architecture is shown in Fig. 6.4. The inputs to the design were assumed to be
5 µA peak-to-peak RF signals superimposed on a 100 µA DC bias current. The
factorized stages were designed and cascaded starting from B5 to B1 to generate
the 16-point transformation as shown in Fig. 6.4. The diagonal matrix D is
implemented as a cross-connection of wires. All NMOS and PMOS mirrors used are
low-voltage cascode current mirrors. The output bias currents of the stages were
equalized by using PMOS mirrors with appropriate bias currents. The transistor
sizes (named as shown in Fig. 6.3 (c) and (d)) used for the basic NMOS and PMOS
mirrors in each individual stage are shown in Tables 6.1 and 6.2. The transistor
sizes of the output branch were set depending on the magnitude of the matrix
coefficient that was realized.
Table 6.1: W/L values of the NMOS transistors used to realize the NMOS current
mirrors at each factorized stage in the 16-point ADFT circuit.
NMOS   B5        B4        B3        B2        B1
W1     0.8 µm    0.8 µm    1.8 µm    1.2 µm    1.2 µm
L1     0.15 µm   0.15 µm   0.15 µm   0.06 µm   0.06 µm
W2     0.6 µm    0.6 µm    1.4 µm    1.2 µm    1.2 µm
L2     0.09 µm   0.09 µm   0.06 µm   0.06 µm   0.06 µm
Table 6.2: W/L values of the PMOS transistors used to realize the PMOS current
mirrors at each factorized stage in the 16-point ADFT circuit.
PMOS   B5        B4        B3        B2        B1
W1     3.7 µm    3.7 µm    3.7 µm    3.7 µm    3.7 µm
L1     0.24 µm   0.24 µm   0.12 µm   0.12 µm   0.12 µm
W2     4 µm      4 µm      4 µm      4 µm      4 µm
L2     0.24 µm   0.24 µm   0.12 µm   0.12 µm   0.12 µm
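The upper bounds on the number of current copies quoted in this section can be
evaluated directly. The sketch below only restates the 4N² and 4·S_N·N
expressions and compares them with the copy counts reported for N = 16; it
assumes S_N = 5, taken from the five stages B1 to B5, and the function names are
hypothetical.

```python
def max_copies_direct(n):
    """Upper bound on current copies for the non-factorized N-point ADFT."""
    return 4 * n * n

def max_copies_factorized(n, n_stages):
    """Upper bound on current copies for the factorized form with S_N stages."""
    return 4 * n_stages * n

N, S_N = 16, 5                       # S_N = 5 assumed from stages B1..B5
REPORTED_DIRECT, REPORTED_FACTORIZED = 768, 308  # counts quoted in the text

print("direct bound:    ", max_copies_direct(N), "reported:", REPORTED_DIRECT)
print("factorized bound:", max_copies_factorized(N, S_N),
      "reported:", REPORTED_FACTORIZED)

# Ratio of the two bounds for a fixed stage count (illustrative only; S_N may
# grow with N for other transform sizes).
for n in (8, 16, 32):
    print(n, max_copies_direct(n) / max_copies_factorized(n, S_N))
```

Both reported counts fall below their respective bounds, since not every
coefficient in the matrices is non-zero.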
6.3.3 Simulated Beams
The circuit was simulated in Cadence Spectre using noiseless input signals generated
by MATLAB. For a given direction of arrival (DOA) ψ, 16 spatially Nyquist-sampled
sinusoidal signals (which emulate downconverted plane waves) were generated. The
simulation frequencies of the signals were chosen to be within the baseband band-
width of interest, which is smaller than the bandwidth of the circuit being simulated
[12]. These inputs were fed to the circuit and the simulated output waveforms were
recorded. To obtain the array factor for each bin of the ADFT, waveforms were
generated for different values of ψ in the range −90◦ to 90◦. The output wave-
forms were then exported to MATLAB in order to compute the beamformed signals
as a function of ψ. Fig. 6.5 shows the power patterns of each beam at different
frequencies.
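The sweep procedure described above can be mimicked numerically. The following
sketch is an idealized model rather than the Cadence/MATLAB flow used here: it
assumes a half-wavelength (Nyquist-spaced) ULA, narrowband plane waves, and the
exact 16-point DFT row as a stand-in for the ADFT coefficients; the function
names are hypothetical.

```python
import cmath
import math

N = 16

def plane_wave(psi_deg, n=N):
    """Spatial samples of a unit plane wave arriving from angle psi on a
    half-wavelength-spaced ULA."""
    s = math.sin(math.radians(psi_deg))
    return [cmath.exp(1j * math.pi * m * s) for m in range(n)]

def bin_response_db(k, psi_deg, n=N):
    """Normalized magnitude response (dB) of DFT bin k to a wave from psi."""
    x = plane_wave(psi_deg, n)
    y = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
    return 20.0 * math.log10(max(abs(y) / n, 1e-12))

def array_factor(k, step=0.5):
    """Sweep psi from -90 to 90 degrees and return (angles, response in dB)."""
    angles = [-90.0 + step * i for i in range(int(180 / step) + 1)]
    return angles, [bin_response_db(k, a) for a in angles]

angles, af = array_factor(2)
peak_angle = angles[af.index(max(af))]
print(f"bin 2 peaks near {peak_angle:.1f} deg")
```

With this sign convention, bin k peaks where sin ψ = 2k/N, so bin 2 of a 16-point
transform points near 14.5°.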
In general, the beam shapes obtained from Cadence simulations closely follow
the theoretical ADFT responses. The side-lobe levels of the beam patterns at
500 MHz remain consistent with theoretical responses except for bins 5, 7, 11,
and 13, which have significantly higher side-lobe levels in the stop band than
expected. This effect arises due to slight deviations between the theoretical
coefficients and those realized by the cascaded current-mode architecture. Since
the bias currents grow in magnitude when traversing from the input to the output
through the factorization stages, the absolute values of the current matching
errors also tend to grow. Thus, the realized coefficient values deviate from the
required ones, resulting in deviations in the beam patterns. Nevertheless, these
changes are in the stop band and are lower than the maximum side-lobe level. Thus,
they are of minor importance for beamforming applications. However, they become
more significant as the frequency increases. At a baseband frequency of 500 MHz,
the average and worst-case peak side-lobe levels were -12.9 dB and -11.85 dB,
respectively. At 1 GHz and 1.5 GHz these numbers increased to -12.2 dB and
-11.32 dB (average) and -10.17 dB and -9.08 dB (worst case), respectively.
[Figure 6.5 panels: Bin-0 to Bin-7; axes: Angle [°] versus Gain [dB]; traces:
1.5 GHz, 1 GHz, 0.5 GHz, Matlab.]
Figure 6.5: Beam outputs generated from Cadence Spectre simulations for output
bins 0-7 of the 16-point analog ADFT design. Each sub-figure shows beam patterns
for different IF bandwidths from Cadence and the simulated pattern from MATLAB.
[Figure 6.6 panels: Bin-8 to Bin-15; axes: Angle [°] versus Gain [dB]; traces:
1.5 GHz, 1 GHz, 0.5 GHz, Matlab.]
Figure 6.6: Beam outputs generated from Cadence Spectre simulations for output
bins 8-15 of the 16-point analog ADFT design.
Figure 6.7: (a) The proposed V-I converter circuit. (b) Simulated input reflection
coefficient |S11|. (c) Simulated THD versus input amplitude. (f) Simulated
input-referred noise PSD over the frequency range from 1 MHz to 10 GHz.
6.3.4 Design of V-I/I-V Converter Circuits
As shown in Fig. 6.2, V-I/I-V converter circuits are necessary for interfacing the
current-mode ADFT core with external circuits. Thus, such converters have to be
added with 50 Ω impedance at each input and output port of the ADFT realization
circuit. The noise and linearity of the V-I converter dominate the dynamic range
(DR) of the multi-beamformer. The V-I converter design shown in Fig. 6.7(a) uses
a common-gate input stage for impedance matching. The bias current is adjusted
via Vn to set the desired input impedance Zin ≈ 1/gs, while the AC current is
mirrored to create the output current Iout. The DC value of Iout, which sets the
power consumption and bandwidth of the current-mode core, can be independently
adjusted via Vp.
Fig. 6.7 shows simulation results for the V-I converter for an NMOS bias current
(set by Vn) of 1.6 mA, a PMOS bias current (set by Vp) of 1.1 mA, and an output
bias current of (1.6 − 1.1) = 0.5 mA. The power consumption is 2.38 mW. A Bode
plot of the effective small-signal transconductance Gm (see Fig. 6.7(c)) shows a -3 dB
bandwidth of 2.7 GHz. The lower cut-in frequency is set to fc ≈ gs/(2πCdc) by the
value Cdc of an input DC blocking capacitor (not shown here). The circuit is well
matched over the useful frequency range: |S11| is approximately -16 dB from 10 MHz
to 4 GHz, as shown in Fig. 6.7(b). The input-referred noise power spectral density
(PSD), including noise from the source resistance, is ≈ 1 nV/√Hz, as shown in
Fig. 6.7(f). The resulting noise figure (NF) is 6-8 dB over the operating bandwidth,
which is adequate since in practice this baseband circuit will be preceded by an
RF receiver chain. The total integrated (1 MHz to 10 GHz) output current noise is
iout,n = 1.0 µArms.
The simulated total harmonic distortion (THD) versus input amplitude at two
frequencies is shown in Fig. 6.7(e). As expected, THD levels increase with
frequency. The maximum input amplitudes for THD < 5% are Vin,max = 30 mV and
13 mV at 100 MHz and 1 GHz, respectively. Assuming no further low-pass filtering
to limit the output bandwidth, the dynamic range (DR) of the circuit is
\[
\mathrm{DR} = 20 \log_{10}\left( \frac{V_{in,max}\, G_m / \sqrt{2}}{i_{out,n}} \right). \tag{6.1}
\]
The resulting values are 49.9 dB and 42.2 dB at 100 MHz and 1 GHz, resulting in
an effective number of bits (ENOB) of 8.0 and 6.7 bits, respectively. Similarly, the
signal to noise and distortion ratio (SNDR) of the ADFT circuit is
\[
\mathrm{SNDR} = 20 \log_{10}\left( \frac{V_{in}/\sqrt{2}}{\sqrt{v_{in,n}^{2} + \alpha^{2} V_{in}^{2}/2}} \right), \tag{6.2}
\]
where vin,n = iout,n/Gm is the total input-referred noise voltage, and α is the THD
at an input amplitude of Vin. For a 1 GHz input, the maximum value of SNDR ≈
32 dB occurs for an input amplitude of Vin = 7 mV, for which THD = 2.5%.
This results in a significantly lower ENOB of 5.0 bits. Note that the capacity of
the wireless system depends on both THD and SNDR; the relative importance of
these specifications thus needs further study. Given that the noise and linearity of
the current-mode core and output I-V converter do not limit system performance
(the latter is simply a 50 Ω resistor), the ENOB of the final analog outputs is
limited either by THD or by SNDR to the values stated above. For the rest of the
chapter it is assumed that the overall ENOB is limited by SNDR, i.e., to 5.0 bits.
This is a conservative level of precision that is sufficient for most mmW
communications applications.
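The ENOB figures quoted above can be reproduced with the usual conversion ENOB = (ratio in dB − 1.76)/6.02, which is assumed here since the text does not state it explicitly. The sketch below also evaluates (6.2) in the distortion-only limit (noise set to zero, an idealization) to show that THD = 2.5% alone caps the SNDR near 32 dB. The function names are illustrative and not part of the original work.

```python
import math


def enob_bits(ratio_db):
    """Effective number of bits for a dynamic range or SNDR given in dB."""
    return (ratio_db - 1.76) / 6.02


def sndr_db(v_in, v_noise_rms, thd):
    """SNDR per (6.2): v_in is the input amplitude [V], v_noise_rms the
    input-referred rms noise [V], thd the THD as a fraction (alpha)."""
    signal_rms = v_in / math.sqrt(2)
    distortion_rms = thd * v_in / math.sqrt(2)
    return 20 * math.log10(signal_rms / math.hypot(v_noise_rms, distortion_rms))


# Ratios quoted in the text, converted to ENOB.
for label, ratio_db in [("DR at 100 MHz", 49.9),
                        ("DR at 1 GHz", 42.2),
                        ("SNDR at 1 GHz", 32.0)]:
    print(f"{label}: {ratio_db} dB -> {enob_bits(ratio_db):.1f} bits")

# Distortion-only ceiling for THD = 2.5 % (noise set to zero, an idealization).
print(f"SNDR ceiling: {sndr_db(7e-3, 0.0, 0.025):.1f} dB")
```

With the quoted ratios this gives 8.0, 6.7, and 5.0 bits, and a distortion-only ceiling of 32.0 dB, in line with the values stated in the text.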
The finite bandwidth of the current mirrors in the ADFT core slightly reduces
the output bandwidth compared to that shown in Fig. 6.7(c). The -3 dB bandwidth
of a single mirror is given by BW ≈ gm/Ctot, where gm is the transconductance
of the input transistor and Ctot ≈ 2Cgs is the total parasitic capacitance. BW can
be improved at the cost of current matching accuracy (and eventually beam shape
fidelity) by decreasing the transistor area WL, since $C_{tot} \propto WL$ and the
threshold-voltage mismatch $\sigma_{\Delta V_{th}} \propto 1/\sqrt{WL}$. Alternatively, it can be improved at the
cost of power consumption by increasing $g_m \propto \sqrt{I_{ds}}$. In this case the value of gm
(and thus BW) can be adjusted through the bias voltage Vp in the V-I converters.
The actual circuit uses N = 3 cascaded mirrors (one NMOS, two PMOS) in the
signal path. Assuming that these are identical, the bandwidth is further reduced to
$BW \times \sqrt{2^{1/N} - 1} \approx BW/2$.
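The bandwidth shrinkage factor for cascaded mirrors can be checked numerically. The sketch below assumes each mirror behaves as an identical single-pole low-pass stage, which is the model implied by the expression above; the function names are illustrative.

```python
import math


def cascade_bw_factor(n_stages):
    """-3 dB bandwidth of n identical single-pole stages, relative to one stage."""
    return math.sqrt(2 ** (1 / n_stages) - 1)


def cascade_gain(f_norm, n_stages):
    """Magnitude of n cascaded single-pole responses at f_norm = f / BW."""
    return (1 + f_norm ** 2) ** (-n_stages / 2)


factor = cascade_bw_factor(3)
print(f"bandwidth factor for N = 3: {factor:.3f}")
# At the reduced band edge the cascade should be 3 dB down (gain 1/sqrt(2)).
print(f"gain at reduced band edge: {cascade_gain(factor, 3):.4f}")
```

For N = 3 the factor evaluates to about 0.510, consistent with the BW/2 approximation, and the cascade gain at that frequency is 0.7071.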
6.4 Comparison with a Baseline Digital Implementation
An equivalent digital design of the proposed analog 16-point ADFT circuit, which was
implemented in a hardware description language (HDL) for the work in Chapter 3, was
used to compare the bandwidth, power, and area requirements with those of the proposed
analog implementation. The digital core was synthesized using the NCSU 45 nm
FreePDK library [129]. Note that this is a more advanced technology than the 65 nm
process used for the analog designs. The input word length for the digital synthesis
was set to 6 bits to be comparable with the SNDR-limited analog ENOB of 5.0 bits.
The figures of merit for both the digital and analog implementations are listed
in Table 6.3 for comparison. The number of digital cores Nc needed for processing
a bandwidth B (assuming polyphase sampling) is given by $N_c = 2B/f_{s,max}$. Thus, to
handle a bandwidth of 1 GHz, Nc ≈ 2. The total power consumption of the digital
equivalent implementation includes that of the digital cores as well as the
analog-to-digital converters (ADCs). For quadrature receivers, the digital implementation
requires two ADCs per antenna (in total 2 × N for an N-point transform), while
the analog implementation only requires two ADCs per sampled beam (real and
imaginary outputs). Therefore, when implementing an N-point transform in analog,
the number of ADCs required is 2M, where 1 ≤ M ≤ N. The numbers in Table 6.3
assume the worst case, i.e., that all N beams are sampled, resulting in M = N. ADC
power was estimated by assuming a converter with suitable specifications (≥ 1 Gs/s,
ENOB = 5-6 bits) and the lowest possible Walden figure of merit (FoM). We searched
Dr. Boris Murmann’s ADC survey [152] for this purpose. As of writing, the lowest
reported FoM is 28.7 fJ/conversion for ENOB = 5.5 bits at 1 Gs/s [153]. As shown
in Table 6.3, since the bandwidth of the digital cores is ≈ 460 MHz, the ADC power
is $2 \times 0.46\ \text{GHz} \times 2^{5.5} \times 28.7\ \text{fJ} = 1.2$ mW per channel.
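The per-channel ADC power estimate follows from rearranging the Walden figure of merit as P = FoM × 2^ENOB × fs. The sketch below repeats that arithmetic with the numbers quoted above and scales it to the 2N = 32 converters of the worst case; the function name is illustrative.

```python
def adc_power_w(fom_j_per_step, enob_bits, f_sample_hz):
    """Walden figure of merit rearranged: P = FoM * 2**ENOB * fs."""
    return fom_j_per_step * 2 ** enob_bits * f_sample_hz


# 28.7 fJ/conversion, ENOB = 5.5 bits, sampling at twice the 0.46 GHz bandwidth.
p_adc_mw = adc_power_w(28.7e-15, 5.5, 2 * 0.46e9) * 1e3
n_adc = 2 * 16  # two ADCs (I and Q) per channel, 16 channels

print(f"per ADC: {p_adc_mw:.2f} mW")
print(f"{n_adc} ADCs at the rounded value: {n_adc * round(p_adc_mw, 1):.1f} mW")
```

The unrounded result is about 1.19 mW per converter; using the rounded 1.2 mW for 32 converters reproduces the 38.4 mW ADC power listed in Table 6.3.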
According to the table, the total power consumption of the proposed analog
implementation is ≈ 41% less than that of the digital implementations. This is
despite the fact that the digital results were obtained using a more advanced process
(45 nm versus 65 nm).
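The totals and the quoted saving can be recomputed from the individual entries of Table 6.3. This is only a bookkeeping sketch: it assumes the digital total is two cores plus the ADCs, and the analog total is the core plus the ADCs plus the V-I converters, which matches the totals listed in the table.

```python
# Entries from Table 6.3, in mW.
digital_core_mw = 215.0
analog_core_mw = 162.4
adc_mw = 38.4
v_to_i_mw = 76.1
n_digital_cores = 2  # cores needed for 1 GHz of bandwidth

digital_total_mw = n_digital_cores * digital_core_mw + adc_mw
analog_total_mw = analog_core_mw + adc_mw + v_to_i_mw
saving = 1 - analog_total_mw / digital_total_mw

print(f"digital total: {digital_total_mw:.1f} mW")
print(f"analog total:  {analog_total_mw:.1f} mW")
print(f"saving:        {100 * saving:.1f} %")
```

The result is 468.4 mW versus 276.9 mW, a saving of about 40.9%, which rounds to the ≈ 41% stated in the text.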
There is another significant advantage of the analog implementation. In a fully-
digital implementation all antenna outputs have to be amplified to span the full-scale
input range of the ADCs. For example, the design in [153] has a single-ended
input range of 300 mV. Amplifying all N input signals to such large levels prior
to digitization is not trivial; the amplifiers have to be linear and so are power
hungry. This is even more difficult if directional blockers are present, since these
blockers are not rejected until after the beamformer and so the amplifiers and the
ADCs have to deal with them. On the other hand, the proposed analog multi-
beamformer can sit right after a baseband mixer, so only M amplifiers (one per
beam) are needed. Moreover, since these amplifiers operate after the beamformer,
their linearity requirements are relaxed since the beamformer can greatly suppress
blockers.
Table 6.3: Comparison of 16-point analog ADFTs with digital implementations in
the 45 nm FreePDK library.

                          Digital Circuit     Analog Circuit
Tcpd                      1067 ps             -
fs,max                    937 MHz             -
BW                        468 MHz             1 GHz
Power                     215 mW (1)          162.4 mW
Area                      66556 µm² (2)       (layout not done)
ADC power                 38.4 mW             38.4 mW
V to I power              -                   76.1 mW
Total power (1 GHz BW)    468.4 mW (3)        276.9 mW
Total area (1 GHz BW)     133112 µm² (4)      -
6.5 Conclusion and Future Work
An analog CMOS architecture that generates 16 simultaneous beams using the
approximate DFT transforms introduced in Chapter 3 has been discussed. The direct
realization of the ADFT matrices for beamforming, by mapping the proposed ADFT
matrix to analog current mirrors, has a hardware complexity (number of mirrors) of
O(N^2), making it difficult to realize higher values of N for 5G systems with massive
MIMO front-ends. Thus, a more scalable approach has been proposed for the
realization of the 16-point ADFT in analog CMOS for obtaining 16 simultaneous beams.
Instead of directly implementing the matrix, the individual sparse factorization stages
of the 16-point matrix are mapped to current mirrors. This approach reduces
the number of current mirrors to O(N), resulting in lower hardware complexity and
circuit area. However, realizing even larger values of N remains challenging.
Beamforming circuits were designed in 65-nm GP CMOS technology. The de-
signs were simulated using Cadence Spectre to obtain the multi-beam array factors.
Moreover, Cadence simulation results show high beam fidelity up to 1.5 GHz of base-
band bandwidth, which is sufficient for proposed 5G communications standards.
As future work, the 16-point ADFT circuit can be laid out and fabricated to
obtain measurement results. The proposed V-I converter is not particularly power
efficient and significantly increases the total power consumption. Future circuit
design efforts can thus be focused on a more power-efficient V-I topology.
CHAPTER 7
ANALOG LOW-COMPLEXITY SQUINT-FREE WIDEBAND
MULTIBEAM NETWORKS
Forming a large number of sharp, parallel RF beams that support high bandwidth
and whose directions do not depend on the frequency of operation is a
major challenge for high-capacity wireless systems. This problem is common to
both sub-6 GHz legacy systems and emerging mmW systems. Emerging 5G
systems promise orders-of-magnitude increases in capacity and data
rate owing to the bandwidth that is being opened up, but the key to
enabling such throughputs is the ability to form multiple simultaneous beams
across the full bandwidth of the link [154]. The wideband beamformers should
be squint-free (this phenomenon is explained in Section 7.1) and capable of
handling the full RF bandwidth to leverage the true benefit of 5G and beyond-5G
systems. Nevertheless, most of the initial mmW systems that use larger antenna
arrays are proposed to use analog phase-shifter-based beamformers due to the cost of
implementation. Thus, producing a large number of simultaneous analog beams
that do not suffer from beam squint is a challenging problem.
However, the realization of multibeams is itself a difficult problem, as was
discussed in the previous chapters, due to the associated high complexity of the aperture
transceivers. Wideband beamformers require true-time-delays (TTDs) in each antenna
path, and wideband multibeam networks therefore require TTD networks to achieve the
required bandwidth. For example, for an N-element receiver array, the beamforming
network (BFN) required for forming N beams comprises N single-beam networks,
which in turn require N^2 time delays or phasing elements. Thus, while conceptually
simple, multi-beam array receivers and transmitters are difficult to realize in practice
due to the underlying complexity of the N-beam SFG. This chapter proposes
a method of reducing the implementation complexity of analog non-squinting
wideband N-beam networks.
[Figure 7.1 image: panels (a)-(d); axes Ωct and ωx; marked temporal frequencies Ω1, Ω2, Ω3.]
Figure 7.1: (a) 2D frequency response of a typical complex-weighted phased array
beam; (b) its array factor at different temporal frequencies showing squint; (c) beam
realized using true-time-delays, and (d) its squint-free array factors at different
frequencies.
7.1 The Problem of Beam Squint
In general, beamformers implemented by applying complex weights
at each antenna element are narrowband. Multibeam beamformers
that are DFT-based in digital, or the “Butler matrix” type multi-beam beamformers
well known in the literature on analog implementations, are also narrowband.
This is because the DFT or the Butler matrix implements “twiddle
factors” (complex weights), which are inherently narrowband. Because the
beamforming weights are narrowband, the beam direction depends strongly on the
temporal frequency. This phenomenon is known as beam squint. Fig. 7.1
illustrates this problem. Fig. 7.1(a) shows the 2D frequency response of a typical
complex-weighted phased array beam. The passband is confined
to a single spatial frequency; the beam is therefore narrowband and produces the
intended beam direction only in the vicinity of the frequency Ω3. Fig. 7.1(b) shows the
array factors at different temporal frequencies. The beam direction changes with
temporal frequency, i.e., the beam squints. On the other hand, Fig. 7.1(c) shows the
2D frequency response of a beam realized using true-time-delays, and Fig. 7.1(d) shows
the corresponding squint-free array factors at different frequencies. The term Ωct in
Fig. 7.1 refers to the wave-speed-normalized temporal (continuous-time) frequency and
ωx refers to the sampled spatial frequency.
[Figure 7.2 image, block labels: RF front-end (one per antenna), I/Q, V/I, "Low complexity
DVM based multi-beamformer?", N beams, multi-beam outputs.]
Figure 7.2: An algorithm that enables low-complexity realization of analog TTD
N-beam networks is desired for future wideband systems that demand a larger number
of simultaneous beams.
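Beam squint can be illustrated numerically by locating the peak of the array factor at several temporal frequencies. The sketch below compares a phase-shifter beamformer, whose weights are frozen at a design frequency, with a true-time-delay beamformer, whose compensating phase scales with frequency. All parameters (16 elements, 2.0 GHz design frequency, half-wavelength spacing at 2.4 GHz, 30° steering angle) are illustrative assumptions, not values from the designs in this chapter.

```python
import cmath
import math

C = 3e8  # propagation speed [m/s]


def peak_angle_deg(freq, comp_phase, n_elem, spacing):
    """Grid-search (0.1 degree steps) the angle that maximizes |array factor|.
    comp_phase(n, freq) is the phase the beamformer removes at element n."""
    best_angle, best_mag = 0.0, -1.0
    for step in range(-900, 901):
        angle = step / 10
        u = math.sin(math.radians(angle))
        af = sum(
            cmath.exp(1j * (2 * math.pi * freq * n * spacing * u / C
                            - comp_phase(n, freq)))
            for n in range(n_elem)
        )
        if abs(af) > best_mag:
            best_angle, best_mag = angle, abs(af)
    return best_angle


N_ELEM = 16                            # illustrative array size
F_DESIGN = 2.0e9                       # illustrative design frequency [Hz]
SPACING = C / (2 * 2.4e9)              # half wavelength at 2.4 GHz
STEER = math.sin(math.radians(30.0))   # illustrative steering direction


def phase_shifter(n, freq):
    # Phase weights are fixed at the design frequency, whatever freq is.
    return 2 * math.pi * F_DESIGN * n * SPACING * STEER / C


def true_time_delay(n, freq):
    # A delay n * SPACING * STEER / C gives a phase proportional to freq.
    return 2 * math.pi * freq * n * SPACING * STEER / C


for f in (1.6e9, 2.0e9, 2.4e9):
    ps = peak_angle_deg(f, phase_shifter, N_ELEM, SPACING)
    ttd = peak_angle_deg(f, true_time_delay, N_ELEM, SPACING)
    print(f"{f / 1e9:.1f} GHz: phase shifter {ps:.1f} deg, TTD {ttd:.1f} deg")
```

Under these assumptions the phase-shifter beam moves with frequency according to sin ψ = (f_design/f) sin ψ0, while the TTD beam stays at the steering angle.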
7.2 Analog Multibeam Beamformers
Analog beamforming networks, which are based on either Butler matrix circuits
or microwave lenses, are used to implement multi-beamforming systems in
analog [57, 60, 61, 155]. BFNs consist of an antenna array fed by a multiple-input
multiple-output microwave network. The Rotman and Ruze lenses are TTD systems
which provide multiple frequency-independent beams (with no beam squint) [60].
A detailed article on the design of mmW Rotman lenses can be found in [156].
The Butler, Blass, and Nolen matrices are the other category of BFNs [57, 61,
155]. Butler matrix based BFNs realize the twiddle factors of the DFT matrix in
analog. Since the DFT matrix can be computed using fast algorithms, Butler matrix
based multibeam systems can be implemented at O(N log N) hardware complexity.
A review of Butler-matrix- and Nolen-matrix-based multibeam networks can
be found in [61, 155]. The latter make use of a computer-coded algorithm for the
design of the networks. Butler matrices with unequal numbers of inputs and outputs
are reported in [157]. A mmW Butler matrix multibeam beamformer is proposed
in [158].
Microwave lenses are wideband, provide low phase error, and are capable of
wide-angle scanning compared to the matrix-based implementations. However, lenses
are large and bulky because they are typically based on planar microwave/mmW
technologies. On the other hand, TTD beamformers do provide completely
squint-free wideband beams. As illustrated in Fig. 7.2, TTD beamformers do not
have the O(N log N) complexity typical of Butler matrix/FFT beamformers, nor a similar
low-complexity realization algorithm. TTD N-beam networks are of complexity
O(N^2). Thus, direct implementation of such beamformers is not feasible even
for moderately large N, which motivates the development of FFT-like wideband
multi-beam beamforming algorithms.
Hitherto, no FFT-like fast algorithm has been proposed in the literature
that reduces the complexity of a wideband N-beam TTD system. Therefore,
this chapter investigates novel fast algorithms to obtain wideband squint-free N-beam
BFNs at a much lower hardware complexity using analog RF CMOS circuits.
7.3 Analog RF Squint-Free N Beam System Model
Consider a ULA of antennas where the elements are placed at the Nyquist spacing
∆x (i.e. ∆x = λmin/2 where λmin corresponds to the highest frequency of interest).
It is assumed that the signals of interest that impinge on the ULA are
bandpass, centered at the carrier frequency Ωc = 2πfc with a bandwidth B.
From the discussion in Chapter 2, the system model of a p-beam network
having an N-element antenna array involves implementing the linear system given
in (2.24). In this analysis an N-beam (i.e. p = N) system is considered, utilizing
the full DoFs of the array. The system model (2.24) for p = N is given below.
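\[
\mathbf{y}_m = \mathbf{W}_N \cdot \mathbf{Z} \cdot \mathbf{a}_{pw,m}. \tag{7.1}
\]
Here $\mathbf{y}_m$ is the Fourier-domain vector containing N RF beams, with
$\mathbf{y}_m = [y_m(0, j\Omega_t), y_m(1, j\Omega_t), \ldots, y_m(N-1, j\Omega_t)]^{\top}$, and
$\mathbf{W}_N \in \mathbb{C}^{N \times N}$ is the N × N matrix containing N beamforming vectors as given in (2.25).
\[
\mathbf{W}_N = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_N]^{\top}. \tag{7.2}
\]
Each $\mathbf{w}_k = \left[1, e^{-j2\pi f\tau_k}, \ldots, e^{-j2\pi f(N-1)\tau_k}\right]^{\top}$, $1 \le k \le N$, where
$\tau_k = \Delta x \sin\psi_k / c$, corresponds to a steering weight vector realizing a beam at an
angle of $\psi_k$ off broadside. Therefore, the $\mathbf{W}_N$ matrix takes the form given in (7.3),
where $\alpha_k = e^{-j\Omega\tau_k}$.
\[
\mathbf{W}_N =
\begin{bmatrix}
1 & \alpha_1 & \cdots & \alpha_1^{(N-1)} \\
1 & \alpha_2 & \cdots & \alpha_2^{(N-1)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \alpha_N & \cdots & \alpha_N^{(N-1)}
\end{bmatrix} \tag{7.3}
\]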
The system WN in (7.3) has a generalized Vandermonde matrix structure. A
low-complexity implementation of (7.3) is feasible if a sparse factorization can be
found for WN. If this is possible, the TTD multibeam network
realized via the resulting butterfly network has to be based on a single unit-delay
element. Typically, realizing long delays in analog IC design is difficult.
Realizing filters with a small group delay, such as all-pass filters (APFs), is relatively
easy, and longer delays are realized by cascading such small-group-delay active
elements. Therefore, if the TTD beamforming network in (7.3) is expressed in terms of a
realizable group delay tGD, the transformation matrix WN of (7.3) can be rewritten in
terms of α as expressed in (7.4), where $\alpha = e^{-j\Omega_t t_{GD}}$.
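\[
\mathbf{W}_N =
\begin{bmatrix}
1 & \alpha & \cdots & \alpha^{(N-1)} \\
1 & \alpha^{2} & \cdots & \alpha^{2(N-1)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{N} & \cdots & \alpha^{N(N-1)}
\end{bmatrix}. \tag{7.4}
\]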
WN in (7.4) is a special form of the generalized Vandermonde structure and, as it
represents a physical delay matrix, it will subsequently be referred to as the “DVM”.
For a given group delay element tGD and an inter-element spacing ∆x of the
antenna array, the look directions of the beams realized
by a delay network represented by the DVM in (7.4) are given by (7.5).
\[
\psi_k = \sin^{-1}\left( \frac{c \cdot k \cdot t_{GD}}{\Delta x} \right), \quad 1 \le k \le N. \tag{7.5}
\]
If the beam directions of the N-beam multibeam beamformer are predetermined to
be $\psi_k = \sin^{-1}(k/N)$, then the unit TTD τ needed to generate such a beamforming
network is $\tau = \Delta x/(cN)$. If the antenna elements of the beamformer are critically
spaced, the required delay can be expressed in terms of the maximum
frequency of interest in the system, fmax, as $\tau = 1/(2 f_{max} N)$.
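As a small numerical check of these relations, the sketch below evaluates the predetermined look directions ψk = sin⁻¹(k/N) and the unit delay τ = 1/(2 fmax N). The value fmax = 2.4 GHz used in the example is an illustrative assumption, and the function names are hypothetical.

```python
import math


def look_directions_deg(n_beams):
    """Look directions psi_k = asin(k / N) for k = 1..N, in degrees."""
    return [math.degrees(math.asin(k / n_beams)) for k in range(1, n_beams + 1)]


def unit_delay_s(f_max_hz, n_beams):
    """Unit TTD for a critically spaced array: tau = 1 / (2 * f_max * N)."""
    return 1 / (2 * f_max_hz * n_beams)


print([round(angle, 1) for angle in look_directions_deg(4)])
# f_max = 2.4 GHz is an assumed example value, not a design parameter.
print(f"unit delay: {unit_delay_s(2.4e9, 4) * 1e12:.1f} ps")
```

For N = 4 the look directions are approximately 14.5°, 30°, 48.6°, and 90°; the assumed fmax gives a unit delay of about 52.1 ps.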
The number of TTD elements of delay τ (the smallest unit delay)
required to construct the beamforming network is given by $\sum_{i=1}^{N}\sum_{k=1}^{N-1} k \cdot i$ and is
equal to $N^{2}(N^{2}-1)/4$. This chapter investigates a method of using a
fast-algorithm-based factorization to reduce the total of $N^{2}(N^{2}-1)/4$ TTD elements
needed in the squint-free wideband beamforming network.
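The element count can be verified by evaluating the double sum directly and comparing it with the closed form N²(N² − 1)/4. The sketch below does this for a few array sizes; the function names are illustrative.

```python
def direct_ttd_count(n):
    """Unit-delay count of the direct DVM: sum over i = 1..N, k = 1..N-1 of k * i."""
    return sum(k * i for i in range(1, n + 1) for k in range(1, n))


def closed_form_ttd_count(n):
    """Closed form N^2 (N^2 - 1) / 4 of the same double sum."""
    return n * n * (n * n - 1) // 4


for n in (4, 8, 16):
    print(n, direct_ttd_count(n), closed_form_ttd_count(n))
```

For N = 4 both expressions give 60, the number of TTD elements quoted for the direct 4-point implementation; for N = 8 and N = 16 they give 1008 and 16320.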
7.4 Factorized DVM Algorithms
In general, for a matrix $\mathbf{W}_N \in \mathbb{C}^{N \times N}$ determined by N^2 entries, computing
the product $\mathbf{W}_N \mathbf{x}$ for an input vector $\mathbf{x} \in \mathbb{C}^{N}$ costs O(N^2) operations,
with N^2 complex multiplications and N(N − 1) complex additions. For well-structured
WN (e.g., banded, Toeplitz, Hankel, Bezoutian, Cauchy, Vandermonde,
quasiseparable, or DFT matrices), the structure can be exploited to reduce the
computational cost of the matrix-vector product. Since the DVM is a
Vandermonde-structured matrix with complex entries, its structure can be used to obtain
a sparse factorization and to derive a “novel” fast and stable DVM algorithm
that reduces the arithmetic complexity, leading to lower hardware complexity in
circuit realizations.
According to [159, 160], if a Vandermonde matrix V is of the
form $V = [x_i^k]_{i,k=0}^{N}$, where x0, x1, . . . , xN ∈ R (real entries), then it can be factored
into the product of 1-banded upper and lower triangular matrices with division
entries by using complete symmetric functions. An O(N log N) complexity
algorithm has been reported in [161] to compute Vandermonde matrices having distinct
prime nodes via interpolation. The work in [162] has shown that the product of
a complex Vandermonde matrix and a vector can be computed using O(N log N)
complex arithmetic operations by exploiting features such as its low-rank
displacement structure and by pre-computing the generators. The existing
results have all been established for real nodes, i.e., xi ∈ R, of the matrix
V, except [159]. Even the work in [159] has limitations for a generator of the
complex Vandermonde matrix. Therefore, a fast algorithm that is based on complex
nodes, without considering quasiseparability and displacement equations, is needed
for the problem at hand. In this regard, our collaborator, Dr. Sirani Perera at
Embry-Riddle Aeronautical University, has proved an LU-factorization-based
sparse factorization over complex nodes that reduces the O(N^2) implementation
complexity of the N-beam TTD beamforming network. The detailed description of
the algorithm, proofs, and error analysis can be found in [163].
The DVM WN over the complex nodes {α, α², . . . , α^N} in (7.4), for
N ≥ 4, N ∈ Z+, can be factored into sparse matrices as given in (7.6) [163],
\[
\mathbf{W}_N = L^{(1)} L^{(2)} \cdots L^{(N-1)} U^{(N-1)} \cdots U^{(2)} U^{(1)}, \tag{7.6}
\]
where for 1 ≤ m ≤ N − 1
\[
L^{(m)} =
\begin{bmatrix}
I_{N-m-1} & & & & \\
 & 1 & & & \\
 & 1 & \alpha^{N-m}(\alpha - 1) & & \\
 & & \ddots & \ddots & \\
 & & & 1 & \alpha^{N-m}(\alpha^{m} - 1)
\end{bmatrix},
\]
and
\[
U^{(m)} =
\begin{bmatrix}
I_{N-m-1} & & & & & \\
 & 1 & \alpha & & & \\
 & & 1 & \alpha^{2} & & \\
 & & & \ddots & \ddots & \\
 & & & & 1 & \alpha^{m} \\
 & & & & & 1
\end{bmatrix}.
\]
The factorization of the DVM allows the product of the DVM with
a vector to be computed using only the entries of the second column of WN. That is, in
actual circuit implementations, the number of delay elements needed to realize
WN is lower than in the direct implementation of WN. As noted in [163],
the Vandermonde matrix structure is considered extremely ill-conditioned:
the condition number of the matrix tends to grow exponentially with the matrix
size [164–166]. Therefore, an analysis of the error bound and the stability of the
DVM factorization algorithm is important; a detailed analysis can be
found in [163]. The mathematical proof of the factorization involving complex nodes
was performed by a collaborator at Embry-Riddle Aeronautical University. This
approach has been used to develop example designs of 4- and 8-beam
squint-free beamformers. The design approach and the challenges are discussed in
subsequent sections.
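The factorization (7.6) can be checked numerically by building the bidiagonal factors from their definitions and multiplying them out. The following is a minimal sketch in plain Python, not the implementation of [163]: the helper names are hypothetical, the matrices are stored as nested lists, and α is evaluated at an arbitrary phase of 0.3 rad chosen only for illustration.

```python
import cmath


def identity(n):
    return [[1.0 + 0j if r == c else 0j for c in range(n)] for r in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def dvm(n, alpha):
    """Direct DVM (7.4): row k (k = 1..N) is [1, alpha^k, ..., alpha^(k(N-1))]."""
    return [[alpha ** (k * col) for col in range(n)] for k in range(1, n + 1)]


def lower_factor(n, m, alpha):
    """L^(m): identity except the last m rows, which get a 1 on the subdiagonal
    and alpha^(N-m) * (alpha^j - 1) on the diagonal, j = 1..m."""
    mat = identity(n)
    for j in range(1, m + 1):
        r = n - m + j - 1                      # 0-indexed row
        mat[r][r - 1] = 1.0 + 0j
        mat[r][r] = alpha ** (n - m) * (alpha ** j - 1)
    return mat


def upper_factor(n, m, alpha):
    """U^(m): identity except rows N-m..N-1 (1-indexed), which get alpha^j
    on the superdiagonal, j = 1..m."""
    mat = identity(n)
    for j in range(1, m + 1):
        r = n - m + j - 2                      # 0-indexed row
        mat[r][r + 1] = alpha ** j
    return mat


def factorized_dvm(n, alpha):
    """Multiply out L^(1) ... L^(N-1) U^(N-1) ... U^(1) as in (7.6)."""
    factors = [lower_factor(n, m, alpha) for m in range(1, n)]
    factors += [upper_factor(n, m, alpha) for m in range(n - 1, 0, -1)]
    product = identity(n)
    for factor in factors:
        product = matmul(product, factor)
    return product


def max_abs_error(n, alpha):
    direct, fact = dvm(n, alpha), factorized_dvm(n, alpha)
    return max(abs(direct[r][c] - fact[r][c]) for r in range(n) for c in range(n))


alpha = cmath.exp(-1j * 0.3)   # e^{-j*Omega*tau} at an arbitrary phase (assumed)
for n in (4, 8):
    print(n, max_abs_error(n, alpha))
```

In double precision the reported differences between the direct and factorized matrices should be at the level of rounding error for both N = 4 and N = 8. In a circuit the factors would be applied stage by stage to the input vector rather than multiplied together; the full product is formed here only to compare against (7.4).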
7.5 4-Element Array Example using the Factorization
As an initial proof of concept, a 4-beam array based on the 4-point DVM W4 is
considered. The circuit designs for the 4-beam networks arising from W4 are analyzed,
and the proposed beamforming networks are simulated using measured responses of the
circuit that will serve as the basic realization element of the network. For
the 4-point case the DVM is given in (7.7). Here, α represents the TTD
$e^{-j\omega_t\tau}$.
\[
\mathbf{W}_4 =
\begin{bmatrix}
1 & \alpha & \alpha^{2} & \alpha^{3} \\
1 & \alpha^{2} & \alpha^{4} & \alpha^{6} \\
1 & \alpha^{3} & \alpha^{6} & \alpha^{9} \\
1 & \alpha^{4} & \alpha^{8} & \alpha^{12}
\end{bmatrix}. \tag{7.7}
\]
Direct implementation of (7.7) involves 60 TTD elements. According
to the analysis in Section 7.4, (7.7) can be factorized into 6 sparse stages as given
in (7.8).
\[
\mathbf{W}_4 = L_1 L_2 L_3 U_3 U_2 U_1. \tag{7.8}
\]
The factors Ui and Li, i ∈ {1, 2, 3}, are given in (7.9).
\[
U_1 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & \alpha \\
0 & 0 & 0 & 1
\end{bmatrix}, \tag{7.9a}
\]
\[
U_2 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & \alpha & 0 \\
0 & 0 & 1 & \alpha^{2} \\
0 & 0 & 0 & 1
\end{bmatrix}, \tag{7.9b}
\]
\[
U_3 =
\begin{bmatrix}
1 & \alpha & 0 & 0 \\
0 & 1 & \alpha^{2} & 0 \\
0 & 0 & 1 & \alpha^{3} \\
0 & 0 & 0 & 1
\end{bmatrix}, \tag{7.9c}
\]
\[
L_3 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & \alpha(\alpha - 1) & 0 & 0 \\
0 & 1 & \alpha(\alpha^{2} - 1) & 0 \\
0 & 0 & 1 & \alpha(\alpha^{3} - 1)
\end{bmatrix}, \tag{7.9d}
\]
\[
L_2 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & \alpha^{2}(\alpha - 1) & 0 \\
0 & 0 & 1 & \alpha^{2}(\alpha^{2} - 1)
\end{bmatrix}, \tag{7.9e}
\]
\[
L_1 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & \alpha^{3}(\alpha - 1)
\end{bmatrix}. \tag{7.9f}
\]
The factorized sparse matrices, whose elements are mostly 1’s and 0’s,
lead to low-SWaP realizations. The low-complexity factorization, which requires
only 24 delay elements, simplifies the N-beam architecture compared to the direct
implementation of W4, which involves 60 TTD elements.
Realization of the TTD in Analog-IC
The availability of a low-complexity factorization that implements delay elements
simplifies the N-beam architecture. However, the realization of a large number of
delay elements that operate over a very wide band is still challenging. A passive
approach based on transmission lines is impractical at microwave frequencies,
because the wavelength can be several mm or even a few cm (for S and L bands).
In lieu of transmission lines, analog RC-active circuit realizations of first-
and second-order all-pass filters (APFs) in CMOS technology can be used to approximate
the required delays. The ideal linear-phase delay
$e^{-j\Omega_t\tau}$ can be approximated by APFs that can be realized on-chip at low SWaP
using CMOS technology [167, 168]. Thus, an analog RC-active circuit realization
of a current-mode second-order APF is proposed for approximating the required
delays [168]. According to [167], an ideal unit delay of τ can be approximated
as shown in (7.10). Therefore, the TTD can be realized in an analog RC-active
topology using a cascade of M second-order all-pass filters.
\[
e^{-j\Omega_t\tau} \approx \left( \frac{1 - j\Omega_t\tau/2M}{1 + j\Omega_t\tau/2M} \right)^{M}, \quad M \in \mathbb{Z}^{+}. \tag{7.10}
\]
Typically, M = 3 is sufficient for the approximation of τ [167]. Let
$\psi(s) = \left( \frac{1 - j\Omega_t\tau/2M}{1 + j\Omega_t\tau/2M} \right)^{M}$ denote the transfer function of an APF approximation of the delay
τ. Then, the beamforming matrix from the sparse factorization of W4 can be
approximated by the APF matrix Ψ ≈ W4, where the (l, k)-th element of Ψ takes the form
$\psi^{lk}(j\Omega_t)$. Since W4 = L1L2L3U3U2U1, each Ui and Li can now be realized using
ψ(jΩt) as the unit delay to realize each coefficient. Fig. 7.3 shows the SFG resulting
from the factorization, which uses all-pass filter blocks having the transfer function
ψ(s) to approximate the TTD denoted by α. The 4 beams from the DVM W4
correspond to the beams indicated 1-4 in Fig. 7.3. A 0th beam is also shown in the
figure; it is the direct sum beam and is independent of the factorization. All the
$\psi^{i}(s)$, where i ∈ {2, 3, 4}, can be realized by cascading i · M identical APFs. Note
that the realization of the 4th row of the factorization has been further optimized
to reduce the number of delay blocks used.
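The quality of the approximation (7.10) can be examined by comparing the phase of the M-section expression with the ideal delay phase −Ωtτ. The sketch below evaluates (7.10) exactly as written, for an illustrative normalized product Ωtτ = 1 rad; the function names are hypothetical.

```python
import cmath
import math


def apf_cascade(omega_tau, m):
    """Right-hand side of (7.10) for a given product Omega_t * tau and order M."""
    x = 1j * omega_tau / (2 * m)
    return ((1 - x) / (1 + x)) ** m


def phase_error_rad(omega_tau, m):
    """Absolute phase difference between (7.10) and the ideal delay e^{-j*Omega*tau}."""
    ideal = cmath.exp(-1j * omega_tau)
    return abs(cmath.phase(apf_cascade(omega_tau, m) / ideal))


for m in (1, 3, 6):
    h = apf_cascade(1.0, m)   # Omega_t * tau = 1 rad, an illustrative value
    print(f"M = {m}: |H| = {abs(h):.6f}, phase error = {phase_error_rad(1.0, m):.5f} rad")
```

The magnitude stays at unity for every M, as expected of an all-pass response, and the phase error shrinks as M grows: roughly 0.073 rad for M = 1 and 0.009 rad for M = 3 at Ωtτ = 1 rad.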
[Figure 7.3 image: signal flow graph with inputs x[0]-x[3], stages U1, U2, U3, L1, L2, L3,
and outputs Y[0]-Y[4] (Beam 0 to Beam 4). Y[0] is the direct sum beam; the remaining
outputs form the factorized implementation of W4. Branch gains:
φ1(s) = ψ(s)(ψ(s) − 1), φ2(s) = ψ(s)(ψ²(s) − 1), φ3(s) = ψ²(s)(ψ(s) − 1),
φ4(s) = ψ²(s)(ψ²(s) − 1), φ5(s) = ψ³(s)(ψ(s) − 1).]
Figure 7.3: SFG of the proposed 4-point DVM factorization algorithms.
[Figure 7.4 image: panels (c) and (d) plot |S21| [dB] and ∠S21 [rad] versus f [GHz].]
Figure 7.4: (a) CMOS all-pass circuit and (b) die micrograph of the chip from which the
measured responses were taken. (c-d) Measured magnitude and phase of the APF
chip (this chip was designed and developed by Dr. L. Belostotski at the University of
Calgary, Canada).
7.6 Simulated Beams using Measured APF Data
As a proof of concept, measured responses from an analog CMOS all-pass filter that
has been designed and built by one of our collaborators are used to evaluate the
performance of the beams resulting from the factorization.
7.6.1 Second-Order Current-Mode CMOS All-pass Filter
The all-pass filter used for the simulations is a second-order current-mode
CMOS all-pass filter designed using a single transistor; it is the complementary
circuit to the Type-1 (filter #1) circuit in [168]. The circuit of the APF is shown in Fig. 7.4.
The APF design employs a capacitor C for the original element Z1, a resistor R2
and the transistor output conductance gds for Z2, and an inductor L for Zs. The
transistor M is biased with a PMOS current source Ibias. Without accounting for
parasitic effects due to gate-drain capacitance, which tend to increase the order
of the filter to three, this circuit is a 2nd order circuit described by the transfer
function:
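\[
\psi(s) = \frac{v_{out}}{v_{in}} = \frac{s^{2} + (1-K)\frac{\omega_z}{Q}s + \omega_z^{2}}{s^{2} + \frac{\omega_0}{Q}s + \omega_0^{2}}, \tag{7.11}
\]
where $K = \frac{1 + g_m R_2}{1 + R_1 R_2 C/L}$ and $Q = \frac{\omega_0 C L}{C R_2 + L/R_1}$. To implement the 2nd-order all-pass
filter, the circuit has been designed such that K = 2 and Q < 1/2 to avoid complex
conjugate poles and zeros. The group delay of the resultant filter is:
\[
t_0(\omega) = \underbrace{\frac{2\omega_0 C L/Q}{1 + R_2\left(R_1^{-1} + g_m\right)}}_{t_0} \times \frac{\frac{\omega^{2}}{\omega_0^{2}} + 1}{\left(\frac{\omega^{2}}{\omega_0^{2}} - 1\right)^{2} + \frac{t_0^{2}\omega^{2}}{4}}, \tag{7.12}
\]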
where t0 is the low-frequency delay. The input resistance of this circuit is Rin ≈
R1 + 2 (CR1/L + 1/R2). The output resistance is determined by the signal-source
resistance Rs and is $R_{out} \approx R_s \parallel (g_m^{-1} \parallel R_2 + R_1)$. In this work both Rin and Rout have
been designed to be near 50 Ω to allow measurements with standard laboratory
equipment. The schematic of the APF circuit is shown in Fig. 7.4(a); the design results
in gm = 62 mA/V, ω0 = ωz ≈ 6 GHz, and Q ≈ 0.36. The die micrograph of the
chip that was used to obtain the measured responses is shown in Fig. 7.4(b).
The corresponding measured magnitude and phase of the all-pass filter chip are shown
in Fig. 7.4(c) and (d), respectively. The power consumption of the filter has been
calculated to be 18.5 mW from a 1.5 V supply. In Section 7.6.2, the measured results
of the APF chip are used to obtain the wideband squint-free beams based on the
efficient factorization of the DVM.
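The all-pass property of (7.11) and the shape of its group delay can be checked numerically. The sketch below assumes K = 2 and ωz = ω0 as stated, uses Q = 0.36, and works in normalized frequency (ω0 = 1) so that no component values are needed. The low-frequency delay is taken as t0 = 2/(Qω0), which follows from (7.11) under these conditions; the component-level expression for t0 is not evaluated because R1, R2, C, and L are not listed. Function names are illustrative.

```python
import cmath


def apf2(omega, omega0, q, k=2.0):
    """Transfer function (7.11) at s = j*omega, with omega_z = omega0."""
    s = 1j * omega
    num = s * s + (1 - k) * (omega0 / q) * s + omega0 ** 2
    den = s * s + (omega0 / q) * s + omega0 ** 2
    return num / den


def group_delay_closed_form(omega, omega0, q):
    """Group delay for K = 2, omega_z = omega0, with t0 = 2 / (Q * omega0)."""
    t0 = 2 / (q * omega0)
    x2 = (omega / omega0) ** 2
    return t0 * (x2 + 1) / ((x2 - 1) ** 2 + (t0 * omega) ** 2 / 4)


def group_delay_numeric(omega, omega0, q, step=1e-6):
    """Central-difference estimate of -d(phase)/d(omega)."""
    p_lo = cmath.phase(apf2(omega - step, omega0, q))
    p_hi = cmath.phase(apf2(omega + step, omega0, q))
    return -(p_hi - p_lo) / (2 * step)


OMEGA0, Q = 1.0, 0.36   # normalized frequency; Q as quoted for the measured design
for w in (0.1, 0.3, 0.5):
    print(f"w/w0 = {w}: |psi| = {abs(apf2(w, OMEGA0, Q)):.6f}, "
          f"delay = {group_delay_closed_form(w, OMEGA0, Q):.4f} "
          f"(numeric {group_delay_numeric(w, OMEGA0, Q):.4f})")
```

Under these assumptions the magnitude is unity at every frequency and the closed-form delay agrees with the numerical derivative of the phase, which also indicates how far in frequency the delay stays close to its low-frequency value.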
[Figure 7.5 image: 2-D magnitude frequency responses over ωx, ωct ∈ [−π, π] for the direct
sum (Y[0]) and the bin outputs Y[1]-Y[4], with the corresponding array-factor plots in dB;
marked look directions 0°, 14.5°, 30°, 48.6°, and 90°.]
Figure 7.5: 2-D frequency response of (a-e) beams corresponding to
Y[0], Y[1], Y[2], Y[3] and Y[4]; (f-j) corresponding array patterns for temporal
frequency values 2.4, 2.0, and 1.6 GHz.
[Figure 7.6 image: two array-factor plots (0 dB to -40 dB), both marked at 14.5°.]
Figure 7.6: Comparison of the array factors corresponding to beam no. 1 (a) using
ideal TTD and (b) using measured APF data.
7.6.2 Simulated 5-Beams using Measured APF Data
The five beams given by the full SFG shown in Fig. 7.3 were simulated in Matlab
using the measured APF data. The group delay TGD of the APF was calculated as
57 ps. The antenna spacing is determined such that $\Delta x = c\,T_{GD}\,N$. For N = 4,
this spacing is 68 mm. Fig. 7.5 shows the magnitude 2D frequency responses and
the corresponding array factors of the simulated beams. Fig. 7.5(a) and (f) show
the magnitude 2D frequency response of the direct sum beam and the corresponding
array factors, respectively. The array factors have been simulated for the frequencies
2.4, 2.0, and 1.6 GHz. Fig. 7.5(b-e) show the 2D magnitude frequency
responses of the beams corresponding to the factorized W4, obtained using the measured
APF data. Fig. 7.5 also depicts the simulated squint-free array factors obtained for
each of the beams. Fig. 7.6(a-b) compares the simulated array factors of
the 1st beam for the ideal TTD and measured APF responses.
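The antenna spacing quoted above follows directly from the measured group delay. The sketch below repeats the arithmetic, assuming c = 3 × 10⁸ m/s, and confirms that the resulting spacing makes c·TGD/∆x equal to 1/N, so that (7.5) yields the look directions sin⁻¹(k/N). The function name is illustrative.

```python
C = 3e8  # propagation speed [m/s], assumed free-space value


def antenna_spacing_m(t_gd_s, n_beams):
    """Spacing chosen so that c * T_GD / dx = 1 / N, i.e. dx = c * T_GD * N."""
    return C * t_gd_s * n_beams


T_GD = 57e-12   # measured APF group delay [s]
N = 4

spacing = antenna_spacing_m(T_GD, N)
print(f"spacing: {spacing * 1e3:.1f} mm")
print(f"c * T_GD / spacing = {C * T_GD / spacing:.4f}")
```

The result is 68.4 mm, matching the 68 mm stated in the text, with c·TGD/∆x = 0.25 = 1/N.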
7.7 Using Factorized DVM for Achieving Wideband
Multibeams at IF
Performing the multibeam beamforming in the analog IF stage rather than at RF
can be advantageous in some scenarios. Especially in mmW systems,
performing the beamforming at RF can be much costlier than doing it at IF or
baseband. These mmW systems may realize multi-beams using IF analog
microelectronics that support the required bandwidths.
As described in Chapter 6, applying the DFT/FFT across the spatially sampled
signals at IF/baseband (in an N-element ULA) samples the spatial frequency
axis at N points, yielding N beam outputs pointing in unique look
directions (corresponding to the respective spatial frequencies), one at each output bin of
the DFT [169]. The use of the FFT provides the lowest hardware complexity for
realizing a simultaneous N independent beam architecture [169] in such systems.
Such systems have a computational circuit complexity of O(N log N). In Chapter
6, methods of achieving simultaneous multibeams in analog at much lower circuit
complexity (lower than that of the FFT) using DFT approximations
were investigated.
Although these methods have low hardware complexity, as described earlier in
this chapter, FFTs do not provide squint-free wideband beams. This is in fact
true for any such wideband beamforming approach, whether at RF or at IF. This
section analyzes the algorithmic and circuit-level ideas that are needed to obtain
N simultaneous squint-free beams in analog IF. Thus, a low-complexity, wideband
IF beamforming architecture, whose overview is shown in Fig. 7.7 and which
extends the theory proposed in the previous sections, is investigated. Lower
complexity is achieved by introducing a sparse factorization of the TTD multi-beam
matrix, which leads to a lower number of required TTD elements and phase
compensators. The TTD elements are proposed to be realized using analog CMOS
APF circuits, which can be efficiently implemented.

Figure 7.7: Overview architecture of a wideband N-beam array at IF.
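To make the squint of DFT-based beams concrete, the sketch below uses the standard ULA relation between normalized spatial frequency and angle. DFT bin k sits at a fixed spatial frequency of magnitude 2πk/N, so its look direction satisfies |sin ψ| = (k/N)·c/(f·∆x) and therefore moves with the temporal frequency f. The half-wavelength spacing at f_max, the band edges, and the function name are assumptions chosen for illustration.

```python
import math

def dft_bin_direction_deg(k: int, n: int, f: float, f_max: float) -> float:
    """Look direction (degrees) of DFT bin k at temporal frequency f.

    Assumes half-wavelength spacing at f_max (dx = c / (2 f_max)), so that
    |sin(psi)| = (k / n) * (2 * f_max / f).
    """
    s = (k / n) * (2.0 * f_max / f)
    if abs(s) > 1.0:
        raise ValueError("bin falls outside the visible region at this frequency")
    return math.degrees(math.asin(s))

F_MAX = 65e9  # upper band edge in Hz (example value)
for f in (55e9, 60e9, 65e9):
    print(f"{f / 1e9:.0f} GHz -> {dft_bin_direction_deg(1, 4, f, F_MAX):.1f} deg")
```

Under these assumptions the k = 1 beam of a 4-point DFT drifts from roughly 36◦ at 55 GHz to 30◦ at 65 GHz, whereas a TTD beam with a fixed delay keeps a frequency-independent direction.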
7.7.1 Proposed TTD IF Multi-Beamformer Model
Consider a uniform linear array of antennas whose elements are placed at
the Nyquist spacing ∆x (i.e., ∆x = λmin/2, where λmin corresponds to the highest
frequency of interest). The signal of interest that impinges on the array is
assumed to have a bandwidth B centered around the modulated carrier fc.
Therefore, λmin = c/(fc + B/2), where c is the speed of propagation of the waves.
Following (2.19), the frequency response of a beamformer that forms a beam at a
direction ψ is given by

$$H(e^{j\omega_x}, \Omega_t, \psi) = \sum_{k=0}^{N-1} e^{-jk\left(\omega_x + \frac{\Delta x}{c}\Omega_t \sin\psi\right)}. \tag{7.13}$$
Here, ωx is the normalized spatial frequency (−π ≤ ωx < π), and |f − fc| ≤
B/2, where f = Ωt/2π is the temporal frequency in Hz. A system having an IF stage
with synchronous mixers is assumed, so that ΩIF = Ωt − ΩLO, where ΩIF denotes
the down-converted frequencies of the signal and ΩLO is the local oscillator
frequency. For direct down-conversion receivers, ΩLO is selected as Ωc. Thus, in
this model, ΩIF = Ωt − Ωc. The equivalent IF response is given by

$$H(e^{j\omega_x}, \Omega_{IF}, \psi) = \sum_{k=0}^{N-1} \alpha^{k} e^{-jk\omega_x}, \tag{7.14}$$

where $\alpha = e^{-j\frac{\Delta x}{c}(\Omega_{IF}+\Omega_c)\sin\psi}$.
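As a numerical sanity check, the sketch below evaluates the RF response (7.13) and the IF response (7.14) at the same operating point and confirms that they coincide when Ωt = ΩIF + Ωc. It also checks that the magnitude reaches N where the exponent of (7.13) vanishes. The carrier, bandwidth, and test point are example values assumed for illustration.

```python
import cmath
import math

F_C = 60e9   # carrier frequency in Hz (example value, assumed)
B = 10e9     # bandwidth in Hz (example value, assumed)
N = 4        # number of antenna elements
DX_OVER_C = 1.0 / (2.0 * (F_C + B / 2.0))  # dx / c for dx = lambda_min / 2

def h_rf(omega_x: float, omega_t: float, psi: float) -> complex:
    """Evaluate (7.13) at spatial frequency omega_x and temporal frequency omega_t."""
    shift = DX_OVER_C * omega_t * math.sin(psi)
    return sum(cmath.exp(-1j * k * (omega_x + shift)) for k in range(N))

def h_if(omega_x: float, omega_if: float, omega_c: float, psi: float) -> complex:
    """Evaluate (7.14) with alpha = exp(-j (dx/c) (omega_if + omega_c) sin(psi))."""
    alpha = cmath.exp(-1j * DX_OVER_C * (omega_if + omega_c) * math.sin(psi))
    return sum(alpha ** k * cmath.exp(-1j * k * omega_x) for k in range(N))

omega_c = 2 * math.pi * F_C
omega_if = 2 * math.pi * 3e9          # an IF frequency inside the band
psi = math.radians(30)
omega_x = 0.7                         # arbitrary test point

diff = abs(h_rf(omega_x, omega_c + omega_if, psi) - h_if(omega_x, omega_if, omega_c, psi))
omega_x_peak = -DX_OVER_C * (omega_c + omega_if) * math.sin(psi)
peak = abs(h_rf(omega_x_peak, omega_c + omega_if, psi))
print(diff, peak)
```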
The model in (7.14) is equivalent to (7.13), except that the weight at the kth
antenna element must now realize, in the frequency domain, both a TTD delay and
a phase rotation of the signal to generate the beam. The multibeam network for
generating N beams at IF also follows (2.24), where the multibeam matrix is
equivalent to (2.25) with each

$$w_k = \left[1,\; e^{-j\Omega_{IF}\tau_k}e^{-j\Omega_c\tau_k},\; \ldots,\; e^{-j\Omega_{IF}(N-1)\tau_k}e^{-j\Omega_c(N-1)\tau_k}\right]^{\top}, \quad 1 \le k \le N,$$

where $\tau_k = \frac{\Delta x \sin\psi_k}{c}$. Following the discussion in Section 7.3,
for a realizable unit TTD $t_{GD}$, by setting the multi-beam look directions to
$\psi_k = \sin^{-1}\left(\frac{c\cdot k\cdot t_{GD}}{\Delta x}\right)$, 1 ≤ k ≤ N, the multibeam
network can be represented as given in (7.15), where $\alpha = e^{-j\Omega_{IF}\tau}e^{-j\Omega_c\tau}$.
$$W_N = \begin{bmatrix}
1 & \alpha & \cdots & \alpha^{(N-1)} \\
1 & \alpha^{2} & \cdots & \alpha^{2(N-1)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{N} & \cdots & \alpha^{N(N-1)}
\end{bmatrix} \tag{7.15}$$
Realizing WN in analog IF requires $\frac{N^2(N^2-1)}{4}$ realizations of the delay
elements ($e^{-j\Omega_{IF}\tau kl}$) and phase compensators ($e^{-j\Omega_c\tau kl}$). However,
following the approach of Section 7.4, realizing the factorized version of WN
reduces the number of TTD and phasing elements in the whole N-beam realization
and is therefore much more SWaPC efficient than a direct implementation of WN.
The specific case of N = 4 will be used to show how a reduced-complexity analog
IF circuit realization can achieve the multibeams in analog CMOS, and how the
design can be extended to a 9-beam network covering both boresight quadrants of
the array.
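One way to read the count N²(N²−1)/4 is as the total number of unit blocks needed when every entry α^(kl) of WN is built as a cascade of k·l unit blocks, which is how the direct approach is described later for the APF network. The sketch below sums the exponents of α over the matrix in (7.15) and compares the result with the closed form. This interpretation and the function names are assumptions made for illustration.

```python
def direct_unit_block_count(n: int) -> int:
    """Sum of the exponents of alpha in W_N: row k = 1..N, column l = 0..N-1."""
    return sum(k * l for k in range(1, n + 1) for l in range(n))

def closed_form_count(n: int) -> int:
    """Closed-form count N^2 (N^2 - 1) / 4 quoted in the text."""
    return n * n * (n * n - 1) // 4

for n in (4, 8):
    print(n, direct_unit_block_count(n), closed_form_count(n))
```

For N = 4 and N = 8 both expressions give 60 and 1008, respectively, matching the direct-implementation counts quoted in this work.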
Figure 7.8: (a) Overview of the system architecture of the 9 beam multi-beamformer;
(b) signal flow graph for realizing 4 beams having look directions at sin−1(l/4),
l ∈ {1, 2, 3, 4}; (c) analog-IC realization of the α(s) block using APFs.
Node labels in the signal flow graph: φ1(s) = α(s)(α(s) − 1),
φ2(s) = α(s)(α²(s) − 1), φ3(s) = α²(s)(α(s) − 1), φ4(s) = α²(s)(α²(s) − 1),
φ5(s) = α³(s)(α(s) − 1).
7.7.2 All-Pass-Filter Based 4-Beam Analog IF Beamformer
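The IF multibeam model in (7.15) for N = 4 is given in (7.16). Here,
$\alpha = e^{-j\Omega_{IF}\tau}e^{-j\Omega_c\tau}$.

$$W_4 = \begin{bmatrix}
1 & \alpha & \alpha^{2} & \alpha^{3} \\
1 & \alpha^{2} & \alpha^{4} & \alpha^{6} \\
1 & \alpha^{3} & \alpha^{6} & \alpha^{9} \\
1 & \alpha^{4} & \alpha^{8} & \alpha^{12}
\end{bmatrix} \tag{7.16}$$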
From (7.8) it is clear that the W4 in (7.16) can be factorized as W4 =
L1L2L3U3U2U1. The factorization is equivalent, except that the argument is now
$\alpha = e^{-j\Omega_{IF}\tau}e^{-j\Omega_c\tau}$. Therefore, the 4-beam IF network differs
from the RF implementation in Section 7.5. However, the implementation through
the Ui's and Li's significantly reduces the number of α(s) blocks required in the
signal flow graph and provides the same complexity reduction with respect to the
direct implementation of W4.
Since $e^{-j\Omega_t\tau}$ can be approximated by
$\psi(s) = \left(\frac{1 - j\Omega_t\tau/2M}{1 + j\Omega_t\tau/2M}\right)^{M}$, where $M \in \mathbb{Z}^{+}$,
the ideal unit delay τ can be approximately realized in an analog RC-active
topology using a cascade of M second-order all-pass filters, as described in
Section 7.5. Therefore, the beamforming matrix from the sparse factorization of
WN can be approximated by an approximately equivalent APF matrix Ψ ≈ WN, where
the (k, l)-th (k, l ∈ [0, . . . , N − 1]) element of Ψ takes the form
$\psi^{kl}(j\Omega_{IF}) \cdot e^{-j\Omega_c\tau kl}$ and approximates the transfer function
$\alpha^{kl}(s)$. A possible implementation of α(s) is shown in Fig. 7.8(b).
Fig. 7.8(a) shows the SFG for realizing Ψ from the proposed factorization in
(7.8). The fourth row of the factorization has been further optimized to reduce
the number of delay blocks used. The blocks denoted as ψi(s) in the SFG, where
i ∈ {2, 3, 4}, can be realized by cascading i·M identical realizations of α(s).
As illustrated in Fig. 7.8(b), the outputs of the APFs are followed by a weighted
sum to realize the phase compensation corresponding to the $e^{-j\Omega_c\tau}$ term
required in IF beamforming. Fixing the multibeam directions to
$\psi_k = \sin^{-1}\left(\frac{k}{N}\right)$ makes the required unit TTD
$\tau = \frac{\Delta x}{cN}$. Since $\Delta x = \frac{c}{2(f_c+B/2)}$ for a critically sampled
array, this gives $e^{-j\Omega_c\tau} = e^{-j\frac{\pi}{N}\cdot\frac{f_c}{f_c+B/2}}$. The analog
integrated circuit architecture needed for realizing this phase rotation is
illustrated in Fig. 7.8(b). Both the in-phase and quadrature signals at the output
of the APFs can be scaled by three CMOS gains (G1, G2, G3) according to the Gauss
complex multiplication algorithm [170] to obtain the complex phase rotation of the
signal by $e^{-j\frac{\pi}{N}\cdot\frac{f_c}{f_c+B/2}}$. If
$\gamma = -\frac{\pi}{N}\cdot\frac{f_c}{f_c+B/2}$, these gains can be
calculated as G1 = cos γ, G2 = sin γ − cos γ, and G3 = sin γ + cos γ.
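The three-gain phase rotation can be sketched numerically as follows. The gains G1, G2 and G3 are those given in the text; the particular way they are applied to the I and Q signals below is one common arrangement of Gauss's three-multiplication trick and is an assumption, as are the example values of fc, B and N.

```python
import cmath
import math

def gauss_gains(gamma: float):
    """Return (G1, G2, G3) = (cos g, sin g - cos g, sin g + cos g)."""
    c, s = math.cos(gamma), math.sin(gamma)
    return c, s - c, s + c

def rotate_iq(i: float, q: float, gamma: float):
    """Rotate the complex sample i + jq by exp(j*gamma) using three real gains."""
    g1, g2, g3 = gauss_gains(gamma)
    k1 = g1 * (i + q)   # shared product
    k2 = g2 * i
    k3 = g3 * q
    return k1 - k3, k1 + k2   # (I_out, Q_out)

# Example values (assumed): fc = 60 GHz, B = 10 GHz, N = 4.
f_c, bw, n = 60e9, 10e9, 4
gamma = -(math.pi / n) * f_c / (f_c + bw / 2.0)

i_out, q_out = rotate_iq(0.3, -0.8, gamma)
reference = cmath.exp(1j * gamma) * complex(0.3, -0.8)
print(i_out, q_out, reference)
```

The design point is that a full complex multiplication would need four real gains, whereas this arrangement needs three gains plus adders.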
Figure 7.9: Overview of the system architecture of the 9-beam multi-beamformer.
Note that the direct implementation of the BFN denoted by Ψ(s) would require
120 unit TTD and phase compensation hardware blocks. The implementation through
the DVM factorization provides a 60% reduction in hardware, making it viable to
implement the BFN using only 48 TTD and phase compensation hardware blocks.
7.7.3 Extending the 4-Beam Algorithm to Generate Simultaneous 9-Beams
In this section, a 9-beam analog IF beamforming architecture is proposed by
extending the 4-beam realization in Section 7.5. Due to the causal nature of the
TTD beamforming network, the beams realized (using a ULA) are confined to a single
quadrant about the boresight. For a given TTD beamformer that forms a beam in the
direction ψl measured from the array broadside, where −90◦ ≤ ψl ≤ 0◦, the order of
the antenna outputs that are fed to the TTD beamformer can be reversed to achieve
a beam in the −ψl direction. Therefore, a separate copy of the circuit that realizes
the SFG shown in Fig. 7.8(a) can be used with the reversed order of antenna inputs
to create a similar set of beams in the adjacent quadrant. Also, as mentioned in
Section 7.5, the direct sum beam, which points to broadside, can be generated
independently of the factorization using only summing circuits. These concepts can
be used to achieve 9 beams altogether, pointing at non-uniformly spaced directions
given by $\sin^{-1}\left(\frac{l}{N}\right)$ for l = 0, ±1, ±2, ±3, ±4 and N = 4. Fig. 7.9
illustrates the architecture of such a beamforming network that generates 9
wideband beams in analog IF.
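The nine look directions follow directly from sin−1(l/N). The sketch below simply evaluates that expression for N = 4; the function name is illustrative.

```python
import math

def nine_beam_directions_deg(n: int = 4):
    """Look directions asin(l / n) in degrees for l = -n, ..., 0, ..., n."""
    return [math.degrees(math.asin(l / n)) for l in range(-n, n + 1)]

directions = nine_beam_directions_deg()
print([round(d, 1) for d in directions])
```

For N = 4 this gives 0◦, ±14.5◦, ±30◦, ±48.6◦ and ±90◦, so the beams are uniformly spaced in sin ψ rather than in angle.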
7.8 Simulated Beams Using Ideal APF Responses
A high-bandwidth (up to 16 GHz) APF with a group delay of 57 ps has been designed
in 65 nm CMOS in [167]. Therefore, realizing an APF with a matching group delay
for a specific beamformer is possible in CMOS [168, 171]. Wireless systems that
operate at Ka or V bands would require APF implementations that realize a shorter
delay (than in [167]). However, such filters do not have to operate at such high
RF, as the proposed architecture deals with analog IF implementations. Therefore,
the design of a CMOS circuit (approximating the transfer function of the ideal
delay) that realizes an APF with a delay short enough to satisfy the group delay
requirements of a mmW wireless system is assumed. As a proof of concept, ideal APF
responses are used to simulate the beams and to verify the functional correctness
of the proposed 9-beam beamformer. A bandpass system at RF with fc = 60 GHz and
B = 10 GHz is assumed, where the signals are down-converted to −5 ≤ fIF ≤ 5 GHz.
Thus, to achieve grating-lobe-free beamforming in the 55−65 GHz band, the required
group delay TGD of the APFs (the minimum TTD needed in the system) can be
calculated as $T_{GD} = \frac{\Delta x}{cN}$, where ∆x is the inter-element spacing chosen
to satisfy Nyquist spatial sampling. Therefore, $\Delta x = \frac{c}{2f_{max}}$ and thereby
$T_{GD} = \frac{1}{2f_{max}N}$, where fmax is the highest frequency of interest in the
system, which is given by fc + B/2. In this case TGD can be calculated as 1.92 ps.
Fig. 7.10 shows the simulated beam patterns obtained using the factorization of
W4. Figs. 7.10 (a-e) show the 2-D magnitude frequency responses of the individual
IF beams, and Figs. 7.10 (f-j) depict the corresponding array factors obtained for
the temporal frequencies f ∈ {56, 61, 65} GHz. Since a mirrored copy of the signal
flow graph is used to achieve multi-beams in the ++ and −+ quadrants, the beams
and the array factors are only shown for the −+ quadrant.

Figure 7.10: Simulated 2-D frequency response of (a-e) beams in the IF stage
corresponding to beam 0 to beam 4 respectively using ideal APF responses; (f-j)
corresponding array patterns for temporal frequencies 56, 61 and 65 GHz.
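The quoted unit delay can be reproduced with a few lines of arithmetic. The sketch below takes the highest frequency of interest as the upper edge of the 55-65 GHz band, i.e. fc + B/2 = 65 GHz, which is the value that yields 1.92 ps; the function name is an illustrative assumption.

```python
def unit_group_delay(f_c: float, bandwidth: float, n: int) -> float:
    """Unit TTD 1 / (2 * f_max * N) for half-wavelength spacing at f_max."""
    f_max = f_c + bandwidth / 2.0   # upper band edge
    return 1.0 / (2.0 * f_max * n)

t_gd = unit_group_delay(60e9, 10e9, 4)
print(f"T_GD = {t_gd * 1e12:.2f} ps")
```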
7.9 Conclusion
Low-SWaP squint-free wideband multibeam beamforming networks have been
investigated. A low-complexity TTD implementation method for a wideband N-beam
beamforming network has been derived and analyzed. With the proposed method, the
required number of TTD elements for implementing N-beam networks at both RF and
IF can be reduced through a sparse factorization of the DVM. A theoretical
analysis of the complexity bounds of the proposed algorithm has been conducted.
To confirm the validity of the proposed method, the 4-point DVM has been
considered. Sparse factorization of the DVM W4 that realizes 4 simultaneous
wideband beams at RF reduces the required TTD elements from 60 to 24, achieving a
60% hardware reduction and hence a low-complexity realization. The 4-beam
beamforming network has been simulated using the measured responses of a CMOS
all-pass filter, yielding wideband beams that have bandwidths exceeding 2.4 GHz.
Low-complexity, squint-free simultaneous multiple beams are proposed for wideband
IF beamforming. All-pass filters are employed as the true-time delay elements.
Factorization of the DVM is proposed to reduce the complexity associated with
the beamformer, where each element of the matrix corresponds to the compound
phase compensation. The proposed multi-beamformer provides a 60% reduction in
TTD elements and the required phase compensation hardware (from 120 to 48).
A simulated example shows squint-free multi-beams for the 55-65 GHz band using
down-conversion to 10 GHz IF.
CHAPTER 8
HYBRID BEAMFORMING ARCHITECTURES FOR SQUINT-FREE
OPERATION
This chapter extends the discussion of Chapter 7 to investigate low-complexity
hybrid (analog-digital) multibeam beamforming architectures that do not suffer
from beam squint. The proposed architectures can be used to achieve low-SWaP
beamforming of wideband 5G signals by employing hardware- and power-efficient
beamforming in both the analog and digital domains, thereby reducing the required
number of ADCs in the array and leading to ultra-low power consumption. The analog
beamforming architecture is based on the sparse factorization of the N-beam TTD
DVM that was proposed in Section 7.4, which can be efficiently realized at RF
(mmW) using analog circuits, in particular as APFs in CMOS. A low-complexity
digital beamforming method is proposed to perform wideband digital beamforming as
the 2nd-level low-dimensional beamforming. The actual circuit response of an APF
in analog 45 nm CMOS, designed and implemented by a collaborator at the University
of Calgary, has been used to verify the proposed beamforming architecture at
28 GHz.
8.1 Hybrid Beamforming Systems
A brief review of digital beamforming (DBF), analog beamforming (ABF), and
hybrid beamforming (HBF) was provided in Section 2.5. As mentioned there, DBF
delivers maximum flexibility, including multiple beams [172], high dynamic range,
and high accuracy using digital calibration [173, 174]. However, DBF requires one
RF chain and two ADCs per antenna element (assuming I-Q downconversion), i.e.,
P RF chains and 2P ADCs for P antenna elements. This results in high power
consumption because of the large number of ADCs, which are usually the most
power-hungry blocks in receivers [40]. Moreover, the real-time signal processing
required to generate beams from the digitized data consumes a large amount of
additional power, making large-scale DBF implementations (e.g., for P = 64 or 128
elements) challenging and impractical at higher frequencies such as mmW [175].
Wireless systems in 5G and beyond will nevertheless require large antenna arrays
(e.g., with P = 64 or 128 elements) to achieve high receiver gains, since the
physical array size is greatly reduced by the decrease in wavelength. Implementing
as many transceivers as antennas may not be feasible due to the excessive demand
on real-time signal processing, which results in high power consumption and high
cost [175].
HBF networks [41] address this challenge by combining low-dimensional digital
beamformers (at baseband) with analog beamformers (at RF). A generic overview
architecture of an HBF receiver is shown in Fig. 2.11(c). Such architectures can
achieve performance similar to fully-digital schemes at lower cost and power. They
typically use RF phase-shifters, TTDs, or lenses for level-1 analog
beamforming [41–43] to form a beam in each sub-array; the beamformed outputs from
all the sub-arrays are then sampled to perform level-2 beamforming in digital
baseband processing. Considering the needs of future beamforming systems,
beamforming arrays that can produce multiple simultaneous squint-free beams are of
high interest. Therefore, following these ideas, a novel optimized hybrid
beamforming scheme targeting 5G mmW base stations is proposed in this chapter.
Fig. 8.1 shows an overview of the proposed architecture.
Consider a P-element antenna array consisting of L N-element sub-arrays, where
P = LN and the inter-element spacing ∆x is set to λmin/2. Here λmin is the
wavelength corresponding to the maximum operating frequency, i.e., fc + B/2, where
fc is the center frequency and B is the signal bandwidth. Each antenna output is
followed by i) an LNA, and ii) level-1 ABFs. The ABFs are proposed to use the
wideband squint-free multi-beam algorithm realized at RF to generate level-1
beams. An analog N:M multiplexer is then proposed to select M ≤ N outputs from
each ABF. These signals are I-Q downconverted and amplified by LM parallel RF
chains, and digitized by 2LM ADCs. The digitized baseband signals are further
processed by a DBF to generate narrow level-2 beams. When M < N, the proposed
hybrid architecture reduces the total number of RF chains and ADCs by a factor of
M/N compared to a fully-digital beamformer. This is because the digital system
picks M analog beams from the N available from each subarray for subsequent
processing. The number of beams needed scales with the required capacity of the
system. If the application only requires a single channel, then M can be as small
as unity. On the other hand, for maximum capacity, the system needs to exploit all
available beams (e.g., if it is used in a base station or access point). In such a
situation, one may digitize all N beams, so M = N. Therefore, the ratio lies in
the range 1/N ≤ M/N ≤ 1, where M and N are both integers. When a large number of
beams is not required, the number of ADC channels is 2LM ≪ 2LN.

Figure 8.1: System architecture of the proposed hybrid squint-free multi-beam
network.

Figure 8.2: (a) High-level block diagram of the overall RF front-end architecture,
(b) a more detailed block diagram of a single RF chain, (c) simplified schematic of
the proposed LNA.
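The hardware saving of the hybrid scheme can be tabulated with a small helper. The sketch below counts RF chains and ADCs for the fully-digital case (P chains, 2P ADCs) and the hybrid case (LM chains, 2LM ADCs); the array size P = 64 and sub-array size N = 4 are example values, and the function and key names are hypothetical.

```python
def hybrid_counts(p: int, n: int, m: int) -> dict:
    """RF-chain and ADC counts for a P-element array split into N-element sub-arrays."""
    if p % n != 0:
        raise ValueError("P must be a multiple of N")
    if not 1 <= m <= n:
        raise ValueError("M must satisfy 1 <= M <= N")
    l = p // n  # number of sub-arrays, L
    return {
        "subarrays": l,
        "rf_chains_digital": p,
        "adcs_digital": 2 * p,
        "rf_chains_hybrid": l * m,
        "adcs_hybrid": 2 * l * m,
        "ratio": m / n,
    }

for m in (1, 2, 4):
    print(m, hybrid_counts(64, 4, m))
```

With P = 64 and N = 4, selecting a single beam per sub-array (M = 1) needs 32 ADCs instead of 128, while M = N recovers the fully-digital count.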
8.2 Circuit Design for Multi-Beam Realization
A block diagram of the overall receiver architecture is shown in Figs. 8.2 (a) and
(b). This section focuses on the circuit architectures of the RF front-ends; the
design of the ABF block is discussed in Section 8.3. Each RF chain consists of i) a
quadrature frequency downconverter realized using two mixers and a quadrature
local oscillator (LO), and ii) IF amplifiers and low-pass filters. The outputs of
the IF amplifiers are fed into parallel ADCs for digitization. Independent RF
chains are used to process the M parallel outputs of the analog multiplexer. The
LOs of these M receiver channels should be synchronized in order to obtain coherent
outputs for digital beamforming. Thus, the LO duty cycle, skew, and jitter (both
random and deterministic) have to be controlled across the array in the presence
of variable routing lengths (µm to mm) in the power divider network. In particular,
the LO distribution network often dominates out-of-band phase noise (i.e., at large
offset frequencies), making its design critical for broadband systems [176]. There
are three main design approaches: i) distributing a single LO generated by a
central PLL; ii) injection-locking individual LOs in each channel to that of a
central PLL, resulting in a distributed PLL; and iii) distributing a low-frequency
reference signal to local PLLs at each channel. Each approach results in a
different trade-off between the power consumed by LO generation and distribution
[177]. The local PLL option provides good performance, flexibility, and scalability
at the cost of the highest power and area consumption. Thus, the central and
distributed PLL options are more suitable for smaller arrays. Of these two, the
central PLL is favored for broadband systems, for which injection locking becomes
less power-efficient.
8.3 Level-1 Analog Multi-Beam Beamformer
The analog multibeam architecture based on the DVM factorization discussed in
Sections 7.3 and 7.4 is proposed for level-1 analog beamforming in the mmW arrays.
The linear phase shift $e^{-j\Omega_t\tau}$ associated with WN (or its factorization) can
be efficiently approximated on-chip by CMOS APFs [167,168]. It was shown in
Section 7.5 that WN can be approximately realized by using APFs as building
blocks. Moreover, it was discussed that the beamforming matrix from the sparse
factorization of WN can be approximated by the APF network matrix ΨN ≈ WN. Here,
an APF implemented in 45 nm CMOS technology is used to realize ΨN. A direct
approach to implementing ΨN would require cascading l·k such APFs to realize each
(l, k)-th node. Subsequently, example realizations of level-1 beamforming networks
for N = 4 and 8 are analyzed and compared using the proposed DVM factorization.
Figure 8.3: (a) APF schematic; its (b) gain and (c) phase profile. Plot axes:
gain and phase [rad] versus frequency [GHz].
8.3.1 28-GHz Current-Mode CMOS All-Pass Filter
The 28 GHz CMOS APF used to simulate the beams of the proposed HBF is a
current-mode single-transistor circuit, shown in Fig. 8.3. The APF was designed by
Dr. L. Belostotski at the University of Calgary, and the original work can be found
in [171]. Ignoring the parasitic poles and zeros due to Cgs and Cgd, and assuming
gm ≫ gds, the transfer function of the circuit is given in equation (8.1).

$$H(s) = \frac{v_{out}}{v_{in}} = -\frac{R_2(R_1 g_m - 1)}{R_1 + R_2} \cdot \frac{1 - s\frac{L_s g_m}{R_1 g_m - 1}}{1 + sL_s g_m}, \tag{8.1}$$

where gm is the transconductance of M1, Ls is a source degeneration inductor, R1
is a feedback resistor between the gate and drain of M1, and R2 is a load resistor
that converts current to voltage. For (8.1) to represent an APF, the
left-half-plane pole and the right-half-plane zero should have the same frequency.
This is achieved in [171] by setting R1gm − 1 = 1, resulting in

$$\frac{v_{out}}{v_{in}} = -\frac{R_2}{R_1 + R_2} \cdot \frac{1 - sL_s g_m}{1 + sL_s g_m}. \tag{8.2}$$

The pole and zero frequencies are $\omega_{p1} = -\omega_z = -(L_s g_m)^{-1}$, the phase
response is $\phi(\omega) = -2\tan^{-1}\left(\omega/|\omega_{p1,z}|\right)$, and the group delay is
$t_d = \frac{2}{\omega_{p1,z}} \cdot \frac{\omega_{p1,z}^2}{\omega_{p1,z}^2 + \omega^2}$, where ω is the angular
frequency and ωp1,z denotes the magnitude of the pole or the zero frequency. At low
frequencies td ≈ 2/ωp1,z, and the term $\frac{\omega_{p1,z}^2}{\omega_{p1,z}^2 + \omega^2}$ captures
the high-frequency td dispersion associated with all first-order APFs. Once the
parasitic parameters of M1, which mainly include Cgd and Cgs, are considered,
high-frequency poles are generated. According to [171], small-signal analysis
confirms a dominant parasitic pole at ωp2 = −(R1Cgd)−1. To reduce the effect of
ωp2 on the APF, R1 is kept small and a large gm is used to ensure ωp1 = −ωz. The
large gm has been achieved by providing a large overdrive voltage to M1 while
keeping M1's size fixed, which increases ωp2 while keeping Cgd constant.
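The behaviour of an ideal first-order APF can be checked numerically. The sketch below evaluates the unit-gain all-pass factor (1 − s/ω0)/(1 + s/ω0), its phase −2·atan(ω/ω0), and the group delay (2/ω0)·ω0²/(ω0² + ω²). The constant gain factor of (8.2) is dropped, parasitics are ignored, and ω0 is chosen so that the low-frequency delay is 3.2 ps; these are illustrative assumptions and not the parameters of the circuit in [171].

```python
import cmath
import math

T_D0 = 3.2e-12          # assumed low-frequency group delay in seconds
OMEGA_0 = 2.0 / T_D0    # pole/zero frequency magnitude, from t_d ~ 2 / omega_0

def apf_response(omega: float, omega_0: float = OMEGA_0) -> complex:
    """Unit-gain first-order all-pass response at s = j*omega."""
    s = 1j * omega
    return (1 - s / omega_0) / (1 + s / omega_0)

def apf_group_delay(omega: float, omega_0: float = OMEGA_0) -> float:
    """Group delay (2 / w0) * w0^2 / (w0^2 + w^2) of the first-order APF."""
    return (2.0 / omega_0) * omega_0 ** 2 / (omega_0 ** 2 + omega ** 2)

omega_28 = 2 * math.pi * 28e9
h = apf_response(omega_28)
print(abs(h), cmath.phase(h), apf_group_delay(omega_28))

# Numerical group delay, -d(phase)/d(omega), by central difference.
step = 1e8
numeric_td = -(cmath.phase(apf_response(omega_28 + step))
               - cmath.phase(apf_response(omega_28 - step))) / (2 * step)
print(numeric_td)
```

The magnitude stays at unity for every frequency, while the delay drops below its low-frequency value as ω approaches ω0, which is the dispersion described above.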
8.3.2 Simulated 4- and 8-Beam Networks using 28 GHz
APF Measured Responses
This section uses the DVM factorization proposed in Section 7.4 along with the
28 GHz measured APF responses to simulate and analyze the low-complexity beam
outputs. The DVM factorization for the 4-beam case (W4) is given by (7.8). The
signal flow graph corresponding to the factorized implementation for this case is
similar to the one illustrated in Fig. 7.3, except that the ψ(s) response now
corresponds to the simulated responses of the 28 GHz APF given in Fig. 8.3. The
unique look-directions θk of this beamformer, where 1 ≤ k ≤ N, are a function of
both the APF group delay td and the antenna spacing ∆x. For a given td and ∆x, the
look-directions of the beams are given by
$\theta_k = \sin^{-1}\left(\frac{c\cdot k\cdot t_d}{\Delta x}\right)$, 1 ≤ k ≤ N.
As mentioned in Section 7.5, the SFG in Fig. 7.3 uses only 24 APF blocks. A
similar 4-beam network using a direct synthesis of a TTD phased array would
involve 60 such blocks. Thus the proposed approach achieves a 60% reduction in
hardware complexity.
Figure 8.4: Simulated beam responses resulting from the SFG in Fig. 7.3 using the
28 GHz APF simulated frequency responses at frequencies {26.4, 27.6, 29.7} GHz.
Beams (a-d) have look-directions 11.1◦, 22.6◦, 35.2◦, and 50.2◦, respectively.
To simulate the SFG shown in Fig. 7.3, a 2 GHz signal bandwidth around an
f0 = 28 GHz carrier and an antenna spacing ∆x = λmin/2 (to eliminate grating
lobes) are assumed. The APF group delay required for the beams to be within φ◦
of array broadside is $t_d = \frac{\Delta x \sin\phi}{c\cdot N}$. The required delay was first
estimated using this relationship with φ = 50◦, which leads to td ≈ 3.2 ps for
N = 4, resulting in four beams directed at 11.1◦, 22.6◦, 35.2◦, and 50.2◦. Next,
the APF design in Section 8.3.1 was tuned in Cadence to obtain the desired group
delay over the 26-30 GHz range. The average group delay was calculated as 3.202 ps
from the Cadence-simulated data. The simulated APF frequency response was exported
to MATLAB to simulate the beam responses resulting from the SFG in Fig. 7.3.
Fig. 8.4 shows the array factors corresponding to beam outputs Y [1], ..., Y [4].
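The relation between group delay and look directions can be sketched as follows. With ∆x = c/(2·fmax), the term c·k·td/∆x reduces to 2·fmax·k·td, so the speed of light cancels. The value fmax = 30 GHz used below (the upper edge of the 26-30 GHz tuning range) is an assumption: it is the half-wavelength reference that reproduces the quoted directions, and the function names are illustrative.

```python
import math

F_MAX = 30e9  # assumed half-wavelength reference frequency in Hz

def required_delay(phi_deg: float, n: int, f_max: float = F_MAX) -> float:
    """Unit delay t_d = dx * sin(phi) / (c * N) with dx = c / (2 * f_max)."""
    return math.sin(math.radians(phi_deg)) / (2.0 * f_max * n)

def ttd_look_directions_deg(t_d: float, n: int, f_max: float = F_MAX):
    """Look directions asin(c * k * t_d / dx) = asin(2 * f_max * k * t_d), k = 1..N."""
    return [math.degrees(math.asin(2.0 * f_max * k * t_d)) for k in range(1, n + 1)]

t_d = required_delay(50.0, 4)
print(f"t_d = {t_d * 1e12:.2f} ps")
print([round(a, 1) for a in ttd_look_directions_deg(3.2e-12, 4)])
```

Under this assumption the required delay comes out close to 3.2 ps, and a 3.2 ps unit delay places the four beams at about 11.1◦, 22.6◦, 35.2◦ and 50.2◦.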
The SFG of the factorized stages for W8 is shown in Fig. 8.5 (with the same
conventions as in Fig. 7.3). This network requires only 224 APF blocks, whereas
a direct 8-beam network would require 1008 blocks. As for the N = 4 case, td
of the APF was selected such that all 8 beams are within 50◦ of array broadside;
this required td ≈ 1.6 ps. The tuned APF circuit as simulated in Cadence
had an average group delay of 1.606 ps, which ideally should generate beams at
5.5◦, 11.1◦, 16.7◦, 22.6◦, 28.7◦, 35.2◦, 42.2◦, and 50.2◦. The simulated gain and
phase values of the APF tuned to td = 1.60 ps are shown in Fig. 8.3 (b) and
(c), respectively. The same procedure as described for N = 4 was followed. Fig.
8.10 shows the simulated results for beams 2, 4, 6, and 8 in two cases: Figs. 8.10 (a)
and (b) show 2-D frequency magnitudes and array factors (AFs) using ideal delays,
while Figs. 8.10 (c) and (d) show similar plots using simulated APF data. The two
sets of simulated beams are in good agreement: the maximum beam deviation
was < 0.8◦ with respect to the direction obtained with ideal delays. Note
that while the algorithm provides N beams, the direct sum beam (look-direction at
array broadside) only requires summing circuits. Also, beams pointing to −θk can
use an identical network with reversed antenna inputs.

Figure 8.5: SFG for the proposed low-complexity 8-beam wideband beamformer.
Node labels in the SFG: φ1 to φ6 = ψ(s)(ψ^i(s) − 1), i = 1, ..., 6;
φ7 to φ12 = ψ²(s)(ψ^i(s) − 1), i = 1, ..., 6; φ13 to φ17 = ψ³(s)(ψ^i(s) − 1),
i = 1, ..., 5; φ18 to φ21 = ψ⁴(s)(ψ^i(s) − 1), i = 1, ..., 4;
φ22 to φ24 = ψ⁵(s)(ψ^i(s) − 1), i = 1, 2, 3; φ25, φ26 = ψ⁶(s)(ψ^i(s) − 1),
i = 1, 2; φ27 = ψ⁷(s)(ψ(s) − 1).

Figure 8.6: Monte-Carlo simulations for each beam of the N = 4 network (50
runs). The red curve represents the beam pattern for the nominal value of APF
gain (unity).
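The hardware savings quoted for the two network sizes can be compared directly. The sketch below takes the block counts stated in this chapter (60 versus 24 for N = 4, and 1008 versus 224 for N = 8) and computes the relative reduction; the function name is illustrative.

```python
def reduction_percent(direct_blocks: int, factorized_blocks: int) -> float:
    """Relative saving of the factorized network over the direct one, in percent."""
    return 100.0 * (direct_blocks - factorized_blocks) / direct_blocks

# (direct, factorized) APF block counts as stated in the text.
BLOCK_COUNTS = {4: (60, 24), 8: (1008, 224)}

for n, (direct, factorized) in BLOCK_COUNTS.items():
    print(f"N = {n}: {reduction_percent(direct, factorized):.1f}% fewer APF blocks")
```

The saving grows with the array size: 60% for N = 4 and roughly 78% for N = 8.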
8.3.3 Analysis of Variations in Beams due to Circuit Imperfections
Depending on the technology, the APF blocks used to construct the beamforming
network will not all be identical due to PVT variations; in practice, gain
mismatches of up to 10% are expected between them. A probabilistic approach was
taken to analyze the impact of such mismatch on the final beam outputs. The
gain variation of the APF blocks was assumed to be distributed uniformly between
unity and the maximum error margin. Monte-Carlo simulations were conducted
using these randomly-distributed gains to see how the beam shapes were affected.
Fig. 8.6 shows that the simulated beam patterns of the 4-beam network are relatively
robust to the assumed amount of mismatch.
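A heavily simplified version of such a Monte-Carlo experiment is sketched below. It is not the simulation used for Fig. 8.6: it models a direct-form network in which antenna l of beam k passes through k·l cascaded unit blocks with ideal phase, and each block has an independent gain drawn uniformly between 1 and 1.1. The half-wavelength spacing at fmax, the unit delay 1/(2·fmax·N), the angular grid, and all names are assumptions made for illustration.

```python
import cmath
import math
import random

def beam_pattern(k, n, gains, thetas_deg, f_ratio=1.0):
    """|AF| of beam k over thetas_deg; gains[l] is the amplitude on antenna l.

    After the ideal delays, the residual phase on antenna l is
    pi * (f / f_max) * l * (sin(theta) - k / n).
    """
    pattern = []
    for theta in thetas_deg:
        acc = 0j
        for l in range(n):
            phase = math.pi * f_ratio * l * (math.sin(math.radians(theta)) - k / n)
            acc += gains[l] * cmath.exp(1j * phase)
        pattern.append(abs(acc))
    return pattern

def random_path_gains(k, n, max_err, rng):
    """Accumulated gain per antenna: product of k*l unit-block gains in [1, 1+max_err]."""
    gains = []
    for l in range(n):
        g = 1.0
        for _ in range(k * l):
            g *= rng.uniform(1.0, 1.0 + max_err)
        gains.append(g)
    return gains

def monte_carlo_peaks(k=2, n=4, runs=50, max_err=0.10, seed=0):
    """Return the peak direction (degrees) of beam k for each random run."""
    rng = random.Random(seed)
    thetas = [x * 0.5 for x in range(-180, 181)]   # -90 to 90 deg in 0.5 deg steps
    peaks = []
    for _ in range(runs):
        pattern = beam_pattern(k, n, random_path_gains(k, n, max_err, rng), thetas)
        peaks.append(thetas[pattern.index(max(pattern))])
    return peaks

peaks = monte_carlo_peaks()
print(min(peaks), max(peaks))
```

In this simplified model, pure gain mismatch changes the amplitude taper and hence the sidelobe shape, but the main-lobe direction of beam 2 stays at 30◦ in every run, which is consistent with the robustness observed above.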
8.4 Level-2 Digital Beamforming
The goal of the level-2 DBF is to generate narrow beams that maximize the link
budget. The N:M analog multiplexers after the ABFs dynamically select the
level-1 beam(s) fed into the DBF. Since the objective is to produce squint-free
beams, wideband digital delay-and-sum beamforming (which uses TTDs) is required.
Each mth, m ∈ [1,M], output beam from the L sub-array ABFs forms a spatially
undersampled input to the DBF, and thus its array factor will contain grating
lobes. The latter are removed by the ABF array factors, thus allowing the hybrid
system to generate sharp output beams.
Different digital delay-and-sum filter structures have been proposed in the
literature, as discussed in Chapter 2. FIR fractional delay approximation filters
are generally used for this purpose [178]. In this work, the use of Thiran digital
APFs is proposed to approximate the required TTD and digitally beamform the
signals (beams) from each analog sub-array [179]. The Thiran APF is an IIR filter
that can approximate the required TTD with much lower hardware complexity than its
FIR counterparts. Ideally, M such digital filters would be needed, one for each of
the M beams sampled into the digital domain.

Figure 8.7: Wideband digital beamforming using FIR fractional delay approximation
filters.
Consider an L ∈ Z+ input digital TTD beamforming filter. In such filter, the
signals at each filter input is needed to be delayed by Tl amount at the l
th, l ∈ [1, L]
input where,
Tl =
(l − 1)∆d sinψ
c
, (8.3)
and c is the wave speed, ∆d is the physical distance between two adjacent elements, and ψ is the angle of the beam formed. Filter-and-sum beamformers approximate the required true-time delay Tl using a high-order FIR filter. Wideband delay-and-sum beamformers realize integer and fractional delays (relative to the sample period of the digital system) using separate subsystems. Delays that are multiples of the sampling period are obtained via clocked registers. Fractional delays are approximated via numerical methods implemented as digital FIR filters. Consider the system shown in Fig. 8.7, where each element is followed by a first-in first-out (FIFO) memory and an FIR filter. The time delay Tl can be decomposed as
\[ T_l = i_l T_s + \tau_l, \tag{8.4} \]
where Ts is the sampling period, ilTs is an integer multiple of the sample period with il ∈ Z+, and τl is a fractional delay with 0 ≤ τl ≤ Ts.
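The decomposition in (8.3) and (8.4) can be sketched in a few lines. This is a minimal illustration only: the function name `split_delay`, the 4 cm spacing, the 40° beam angle and the 1 GS/s sampling rate are assumptions chosen for the example, and a non-negative beam angle is assumed so that Tl ≥ 0.

```python
import math

def split_delay(l, delta_d, psi_deg, fs, c=3.0e8):
    """Split the true-time delay of input l into integer and fractional parts.

    Implements (8.3) and (8.4); assumes psi_deg >= 0 so that T_l >= 0.
    """
    t_l = (l - 1) * delta_d * math.sin(math.radians(psi_deg)) / c  # (8.3)
    ts = 1.0 / fs                      # sampling period T_s
    i_l = math.floor(t_l / ts)         # whole-sample delay, realized by the FIFO
    tau_l = t_l - i_l * ts             # fractional remainder for the filter
    return t_l, i_l, tau_l

# Hypothetical numbers: 4 inputs, 4 cm spacing, 40 degree beam, 1 GS/s sampling.
delays = [split_delay(l, 0.04, 40.0, 1.0e9) for l in range(1, 5)]
```

Each tuple in `delays` holds (Tl, il, τl); the integer part would select the FIFO tap and the fractional part would set the fractional delay filter.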
The accuracy of the fractional delay approximation (set by the FIR filter coefficients) determines the wideband performance of the beamformer [180]. The frequency response of the ideal fractional delay system with a delay of τl can be expressed as,
\[ H_l\left(e^{j\omega}\right) = e^{-j\omega\tau_l}, \tag{8.5} \]
which has a unity magnitude response and a phase response of −ωτl. The corresponding impulse response can be approximated by,
\[ h_l[n] = \operatorname{sinc}\left(n - \tau_l/T_s\right). \tag{8.6} \]
Windowing methods, such as the Kaiser window and the Chebyshev window, can be employed to obtain the FIR filter coefficients [181, 182]. The Lagrange interpolation method yields a maximally flat approximation of the fractional delay for the FIR implementation [180]. To scan the environment with an electronically steerable RF beam, the filter coefficients need to be changed accordingly. Such applications require tunable fractional delay filters, which may be realized using recomputation of coefficients, lookup tables, or polynomial coefficient approximation (Farrow structure) [180, 183].
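As an illustration of the Lagrange interpolation approach mentioned above, the sketch below computes the taps of a Lagrange fractional delay FIR filter from the standard product formula. The function name `lagrange_fd_coeffs` and the example order and delay are assumptions for illustration; here `delay` is the total delay in samples, not only the fractional part.

```python
def lagrange_fd_coeffs(order, delay):
    """FIR taps h[0..order] of a Lagrange fractional delay filter.

    `delay` is the total delay in samples; accuracy is best near order / 2.
    """
    taps = []
    for n in range(order + 1):
        h = 1.0
        for k in range(order + 1):
            if k != n:
                h *= (delay - k) / (n - k)   # Lagrange basis polynomial at `delay`
        taps.append(h)
    return taps

taps = lagrange_fd_coeffs(3, 1.3)   # third-order filter, 1.3-sample delay
```

The taps sum to one (unity gain at zero frequency), and an integer delay collapses the filter to a single unit tap, which is a quick sanity check for any implementation.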
8.4.1 Thiran APFs
Thiran filters are a special class of low-complexity IIR filters whose numerator and denominator share the same coefficients, but in reverse order. They are the recursive counterpart of the FIR Lagrange interpolation method, which provides maximally flat group delay at zero frequency [184]. Thiran fractional delay filters thus realize APFs with maximally flat group delay compared to other fractional delay implementation methods [180]. A significant reduction in hardware complexity can be achieved without compromising accuracy by replacing FIR fractional delays with Thiran APFs [179]. Their transfer function is,
\[ H_T(z_{ct}) = \frac{z_{ct}^{-\beta}\, Q(1/z_{ct})}{Q(z_{ct})}, \tag{8.7} \]
where $Q(z_{ct}) = \sum_{i=0}^{\beta} \alpha_i z_{ct}^{-i}$, $z_{ct} = e^{j\omega_{ct}}$ is the temporal z-domain variable, and β is the filter order. Thiran APF coefficients can be expressed in closed form as
\[ \alpha_i = (-1)^i \binom{\beta}{i} \prod_{n=0}^{\beta} \frac{D-\beta+n}{D-\beta+i+n}, \qquad i = 0, 1, \ldots, \beta, \tag{8.8} \]
which approximates the required delay D, where $\binom{\beta}{i} = \frac{\beta!}{i!(\beta-i)!}$ are binomial coefficients. Also α0 = 1, and stability requires D > β − 1 [180, 184].
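The closed form in (8.8) can be evaluated directly, as in the minimal sketch below. The helper names `thiran_coeffs` and `dc_group_delay` are assumptions introduced for this example; the second helper uses the small-angle phase of the all-pass in (8.7) to check that the group delay at zero frequency equals D.

```python
from math import comb

def thiran_coeffs(order, delay):
    """Denominator coefficients alpha_0..alpha_order from the closed form (8.8).

    `delay` is the total delay D in samples; D > order - 1 is needed for stability.
    """
    alphas = []
    for i in range(order + 1):
        prod = 1.0
        for n in range(order + 1):
            prod *= (delay - order + n) / (delay - order + i + n)
        alphas.append((-1) ** i * comb(order, i) * prod)
    return alphas

def dc_group_delay(alphas):
    """Group delay (in samples) at zero frequency of the all-pass in (8.7)."""
    order = len(alphas) - 1
    weighted = sum(i * a for i, a in enumerate(alphas))
    return order - 2.0 * weighted / sum(alphas)

alphas = thiran_coeffs(3, 2.3)   # beta = 3, D = (beta - 1) + 0.3 samples
```

The numerator of the filter reuses the same list in reverse order, so only β distinct multiplier values are needed per channel.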
8.4.2 Digital Implementation of the Thiran Filter based
Beamforming
The architecture of the proposed wideband beamformer is shown in Fig. 8.8. The 2D z-domain transfer function of the Thiran beamformer can be expressed as,
\[ H(z_x, z_{ct}) = \sum_{l=0}^{L-1} H_T^{l}(z_{ct})\, z_x^{-(L-l)}\, z_{ct}^{-i_l}, \tag{8.9} \]
where L is the number of spatial input channels, zx is the spatial z-domain variable and $H_T^{l}(z_{ct})$ is the Thiran filter transfer function at the l-th element, which realizes the required fractional delay τl (i.e., the actual delay in the system is (β − 1)Ts + τl, where Ts is the sampling period of the digital system).
Figure 8.8: Proposed wideband delay-sum beamforming architecture using Thiran all-pass fractional delay filters. (Blocks shown: FIFO buffer, fractional delay approximation using Thiran filters with coefficients α0, α1, ..., αβ and z^-1 delays, beamformed output.)
The τl of each element is realized using a β-th order Thiran filter. Note that the Thiran filter introduces an additional integer number of sample delays Dl on top of the fractional delay τl, which leads to a total delay of DlTs + τl. Here, the integer delay satisfies Dl ≥ β − 1 due to the stability condition, and Dl ∈ {β − 1, β}. The Thiran filter provides better performance in terms of the flatness of the group delay when the necessary delay is less than the filter order (i.e., Dl < β) [180]. Thus, Dl is selected as β − 1 in the proposed beamformer, which leads to a delay of (β − 1)Ts + τl from the Thiran filter instead of the required fractional delay τl. Since all branches incur the same β − 1 sample delays, this does not affect the relative delays between branches; the only change is that the final delay at the l-th element becomes (β − 1)Ts + Tl. The required integer sample delay ilTs is obtained as in the standard delay-sum beamformer, by tapping off the correct delay position from the FIFO buffers.
Figure 8.9: (a) 2D frequency response of the Thiran filter based 32-element beamformer with the beam angle set to 40°; (b) magnitude response of the beamformer in (a) along the line-shaped passband against the temporal frequency; (c) array factors of the simulated Thiran filter based beamformer tuned to 40° at different temporal frequency values.
To efficiently calculate the filter coefficients in real time, the relationship,
\[ \alpha_i = \alpha_{i-1}\,\frac{(N-i)\,(N-i-\tau_k)}{(i+1)\,(i+1+\tau_k)} \tag{8.10} \]
can be used instead of the equation given in (8.8). By setting $z_x = e^{j\omega_x}$ and $z_{ct} = e^{j\omega_{ct}}$ in (8.9), the 2D space-time frequency-domain transfer function can be obtained as,
\[ H_T\left(e^{j\omega_x}, e^{j\omega_{ct}}\right) = \sum_{k=0}^{N_x-1} H_T^{k}\left(e^{j\omega_{ct}}\right) e^{-j\omega_x (N_x-k)}\, e^{-j\omega_{ct} i_k}. \tag{8.11} \]
Here, ωx and ωct are the spatial and temporal domain frequencies, respectively. To illustrate the frequency response of a Thiran filter based beamformer, the 2D frequency response $H_T(e^{j\omega_x}, e^{j\omega_{ct}})$ for a Nyquist-spaced array with Nx = 32 and the beam direction set to 40° is shown in Fig. 8.9(a). It can be seen that the passband of the beamformer is line-shaped and oriented at an angle θ to the ωct axis, where tan θ = sinψ and ψ is the beamforming angle. Therefore, the Thiran beamformer is wideband in nature and does not suffer from beam squint. Fig. 8.9(b) shows the magnitude response of the beamformer along the line-shaped passband, and Fig. 8.9(c) shows the array factors of the simulated Thiran filter based beamformer for different temporal frequency values.
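Returning to the real-time coefficient update mentioned alongside (8.8): taking the ratio of consecutive coefficients in (8.8) gives a one-multiplication-per-step recurrence. The sketch below uses that derived ratio, written in terms of the total delay D and the filter order; its indexing is an assumption derived from (8.8) and may differ from the form printed in (8.10). The function name is hypothetical.

```python
def thiran_coeffs_recursive(order, delay):
    """Thiran coefficients via a ratio recurrence derived from (8.8).

    alpha_{i+1} = alpha_i * (order - i)(order - i - D) / ((i + 1)(i + 1 + D)),
    starting from alpha_0 = 1, with D the total delay in samples.
    """
    alphas = [1.0]
    for i in range(order):
        step = ((order - i) * (order - i - delay)) / ((i + 1) * (i + 1 + delay))
        alphas.append(alphas[-1] * step)
    return alphas

alphas = thiran_coeffs_recursive(2, 2.5)   # second-order filter, D = 2.5 samples
```

Because each coefficient reuses the previous one, retuning the beam angle costs a handful of multiplications and one division per coefficient rather than a full product per coefficient.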
The TTD delay Tl in (8.3) for the m-th digital beamformer, which processes the m-th beam from the ABF stage, can be calculated by setting ∆d = N × ∆x and ψ = ψm, where N is the number of antenna elements in the sub-array and ψm is the direction of the m-th beam from the array.
Figure 8.10: (i-a) 2-D magnitude frequency-domain plots of $H_l(e^{j\omega_x}, \Omega_t)$ for l = 2, 4, 6, and 8 generated by the proposed 8-beam wideband beamformer assuming ideal time delays, and (i-b) the corresponding AFs. (i-c) and (i-d): same as (i-a) and (i-b) but using simulated APF data. (ii-a) 2-D magnitude frequency-domain plot of the digital filter transfer function tuned to θ5 = 28.7°; (ii-b) AF of the DBF in (ii-a) for different IF frequencies; (ii-c) AF of the 5th beam of the ABF; (ii-d) composite AF obtained by combining the ABF and DBF.
8.4.3 Analog Digital Hybrid Simulations
A 32-element (P = 32) system with four 8-element ABFs (N = 8, L = 4) is assumed for simulation purposes. The effective DBF element spacing is N × ∆x, and its 2-D magnitude frequency response using a 3rd-order Thiran filter (connected to the 5th ABF beam, which points at θ5 = 28.7°) is shown in Fig. 8.10 (ii-a). Fig. 8.10 (ii-b) shows the DBF array factor for different IF frequencies. Fig. 8.10 (ii-c) shows the array factor of the 5th beam generated by the level-1 sub-array ABF, while Fig. 8.10 (ii-d) shows the resultant hybrid array factor, i.e., the combination of the level-1 and level-2 beamformers.
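The way the sub-array pattern removes the DBF grating lobes can be illustrated with an idealized narrowband model. The sketch below is not the Thiran or APF based simulation reported above: it assumes ideal phase steering at a single frequency, half-wavelength element spacing, and a hypothetical helper `ula_af`. Under those assumptions the product of the ABF and DBF array factors equals the array factor of the full 32-element array.

```python
import numpy as np

def ula_af(num, spacing_wl, psi, psi0):
    """Normalized array factor of a uniform linear array (spacing in wavelengths)."""
    u = 2 * np.pi * spacing_wl * (np.sin(psi) - np.sin(psi0))
    n = np.arange(num)[:, None]
    return np.exp(1j * n * u[None, :]).sum(axis=0) / num

psi = np.radians(np.linspace(-90, 90, 3601))   # observation angles
psi0 = np.radians(28.7)                        # direction of the 5th ABF beam
N, L, dx = 8, 4, 0.5                           # sub-array size, sub-arrays, spacing

af_abf = ula_af(N, dx, psi, psi0)              # level-1 sub-array pattern
af_dbf = ula_af(L, N * dx, psi, psi0)          # level-2 pattern, has grating lobes
af_hybrid = af_abf * af_dbf                    # composite pattern
af_full = ula_af(N * L, dx, psi, psi0)         # reference: full 32-element array
```

Plotting `abs(af_dbf)` shows several full-height lobes, while `abs(af_hybrid)` keeps only the one at 28.7°, because the remaining grating lobes fall on nulls of the sub-array pattern.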
8.5 Conclusion
A hybrid beamforming architecture is proposed for emerging 5G wireless communication systems requiring a variable number of sharp steerable mmW beams. The beam sharpness is proposed to be achieved in a power- and circuit-optimal way using a hybrid combination of mmW analog and digital beamforming via sub-arrays. The analog beamforming is proposed to use the DVM multibeam algorithm for obtaining squint-free RF beams. The analog sub-arrays support from 1 to N fixed TTD wideband mmW beams, depending on the required capacity, which makes the architecture flexible for use in both mobiles and base stations. The analog beamforming is simulated using a Cadence APF model at 28 GHz. The gain and phase responses of the CMOS APF have been used to simulate the DVM SFG for both 4-element and 8-element sub-arrays, and the corresponding beam responses have been reported. The proposed hybrid beamformer contains several analog circuit components that are subject to on-chip PVT variations. The resulting delay and gain shifts in each receiver cause errors in the spatial orientation of the beams. Calibration methods to compensate for such shifts include variable-gain amplifiers for gain, tunable APFs for delay, and reference input signals for training the algorithm. Development of such methods and the associated circuitry is beyond the scope of this work.
Digital beamforming is proposed to be achieved via a low-complexity digital delay-and-sum beamformer based on Thiran IIR fractional delay filters. Thiran filters provide the same performance as FIR fractional delay filters at much lower multiplier complexity. The Thiran filter based beamforming strategy is analyzed. Finally, a digital beamforming example is simulated along with the beam responses from the ABF stage.
CHAPTER 9
GENERATION OF N SIMULTANEOUS BEAMS AT O(N logN)
COMPLEXITY IN UNIFORM CIRCULAR ARRAYS
Most of the research on smart antennas and MIMO arrays has concentrated on rectilinear arrays, either ULAs or URAs. The spectral properties of the plane waves captured by such arrays were discussed in detail in Chapter 2, and different beamforming algorithms and real-time implementations targeting ULAs/URAs were discussed in the previous chapters. In contrast, little attention has been paid to circular array geometries despite their many advantages. Arrays with circular geometry are common in applications such as Wi-Fi access points [185] and radar front-ends [186, 187]. Although literature on single-beam synthesis in UCAs exists, surprisingly little work discusses multibeam generation algorithms for circular arrays that avoid O(pN) hardware complexity, where N is the number of antenna elements in the array and p is the number of beams generated. Therefore, this chapter investigates low-hardware-complexity methods of generating multiple simultaneous beams using UCAs. In most cases, the simultaneous beams should preferably be uniformly spaced in the angular domain, around the circumference of a circular array of elements, and be electronically adjustable in both planes. The primary applications of such uniformly spaced multibeam circular array processing systems are expected to be ceiling-mounted wireless network access points, 5G base stations, and other location-based tracking systems that require a full 360-degree field of view. UCAs have great potential for use in low-cost millimeter-wave nodes, as UCA geometries can provide antenna gain while steering electronically in 3D space with much lower complexity than URAs.
Figure 9.1: Circular array signal model. (Labels shown: axes x, y, z; elements A0, A1, ..., AN−1; element position angle φn; angles φ and θ.)
9.1 Review of Array Factors of Circular Arrays
Circular arrays can be regarded as 1D arrays arranged in circular form. Unlike linear arrays, circular arrays can provide a 2D angular scan in both azimuth (φ) and elevation (θ). The ability to scan horizontally over the full 360° with no distortions near the end-fire directions is a significant advantage of circular arrays over linear arrays [188]. Also, distortions in the array pattern of a circular array due to mutual coupling effects are the same for each element, which makes mutual coupling easier to deal with.
Consider a circular array of N elements. Let the inter-element angular separation be 2π/N and the radius of the circular array be a. The n-th element of the array is at φ = φn, where 0 ≤ n < N. This setup is shown in Fig. 9.1. Let $\mathbf{x} = [x_0(t), x_1(t), \ldots, x_{N-1}(t)]^{\top}$ represent the received (or transmitted) narrowband signal vector of the circular array, either at RF (xn(t) ∈ R) or at baseband (xn(t) ∈ C), where xn(t) is the signal at the n-th element. If w, as given in (9.1), denotes the complex weighting vector applied to the array signal vector x,
\[ \mathbf{w} = [\alpha_0, \alpha_1, \ldots, \alpha_{N-1}]^{\top}, \tag{9.1} \]
and αn ∈ C is the weight at the n-th antenna output, then a beam output y(t) is realized in the far field by implementing the operation $y = \mathbf{w}^{\top}\mathbf{x}$. The array factor A(φ, θ) generated from the array corresponding to the weights in w (assuming omnidirectional antenna elements) is given by (9.2) [44],
\[ A(\phi, \theta) = \sum_{n=1}^{N} e^{j\left[ka \sin\theta \cos(\phi - \phi_n) + \alpha_n\right]}. \tag{9.2} \]
Figure 9.2: (a) 3D beam pattern generated to point the beam at φ = 30° and θ = 30° by setting the array weights as given in (9.3); (b) elevation plane pattern at φ = 30°; (c) azimuthal plane pattern at θ = 30°.
The term k here is the wave number, equal to 2π/λ, where λ is the wavelength at the frequency of operation. By observing (9.2), it can be seen that the maximum sensitivity/radiation direction (φmax, θmax) is achieved when (9.3) is satisfied.
\[ \alpha_n = \pm 2m\pi - ka \sin\theta_{\max} \cos\left(\phi_{\max} - \frac{2\pi n}{N}\right), \qquad m \in \mathbb{Z}. \tag{9.3} \]
One possible method to achieve maximum sensitivity in the (φmax, θmax) direction is to compute αn with m chosen such that αn ∈ [0, 2π) [44]. Fig. 9.2 shows a numerically simulated array factor for a 32-element UCA in the θ and φ planes, where the desired maximum direction is set to φmax = 0° and θmax = 30°.
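The steering rule in (9.3) and the array factor in (9.2) can be checked numerically with a short sketch. The function names are hypothetical, elements are indexed from 0 at φn = 2πn/N, and the electrical radius ka = N/2 assumes half-wavelength arc spacing between elements, which the text does not state for this example.

```python
import cmath
import math

def uca_phases(num_elems, ka, phi_max, theta_max):
    """Phase weights alpha_n from (9.3), wrapped into [0, 2*pi)."""
    return [
        (-ka * math.sin(theta_max)
         * math.cos(phi_max - 2.0 * math.pi * n / num_elems)) % (2.0 * math.pi)
        for n in range(num_elems)
    ]

def uca_array_factor(alphas, ka, phi, theta):
    """Array factor (9.2) with elements at phi_n = 2*pi*n/N."""
    n_el = len(alphas)
    return sum(
        cmath.exp(1j * (ka * math.sin(theta)
                        * math.cos(phi - 2.0 * math.pi * n / n_el) + a))
        for n, a in enumerate(alphas)
    )

N = 32
ka = N / 2.0   # assumed: half-wavelength arc spacing, so 2*pi*a/N = lambda/2
alphas = uca_phases(N, ka, 0.0, math.radians(30.0))
peak = abs(uca_array_factor(alphas, ka, 0.0, math.radians(30.0)))
```

In the steered direction all N terms add in phase, so `peak` equals N; evaluating `uca_array_factor` over a grid of φ and θ would give patterns of the kind shown in Fig. 9.2.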
9.2 Generation of N Beams using a UCA
As described, future wireless access points serving femto/pico cells at mmW and higher frequencies will need sharper steerable beams, and preferably multiple simultaneous beams uniformly spaced in the angular domain around the circumference, while being able to steer the beams electronically in both the elevation and azimuth planes. Following (2.25), producing N beams requires realizing N complex weighting vectors, which involves realizing the linear transform in (9.4),
\[ \mathbf{y} = \mathbf{W}_N \mathbf{x}, \quad \text{where} \tag{9.4} \]
\[ \mathbf{W}_N = [\mathbf{w}_0, \mathbf{w}_1, \ldots, \mathbf{w}_{N-1}]^{\top} \tag{9.5} \]
is an N × N matrix in which each wi can be computed as given in (9.1) and (9.3). The vector y here represents the beam outputs, where $\mathbf{y} = [y_0(t), y_1(t), \ldots, y_{N-1}(t)]^{\top} \in \mathbb{C}^{N\times 1}$.
For obtaining equally spaced symmetric beams in the φ direction (azimuthal plane), each i-th row of the matrix WN becomes a circular shift of the (i−1)-th row, and thus WN takes the form of a circulant matrix [189], as given in (9.6).
\[ \mathbf{W}_N = \begin{bmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{N-2} & \alpha_{N-1} \\ \alpha_{N-1} & \alpha_0 & \cdots & \alpha_{N-3} & \alpha_{N-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \alpha_1 & \alpha_2 & \cdots & \alpha_{N-1} & \alpha_0 \end{bmatrix} \tag{9.6} \]
Direct realization of the WN in (9.6) has O(N²) hardware complexity. However, circulant matrices have the special property that they are diagonalized by the DFT. This follows from the eigenvectors of circulant matrices being the Fourier modes. Although DFTs are also of O(N²) complexity when computed in their raw form, FFT algorithms reduce this complexity to O(N logN). This fact is exploited in this chapter to propose a novel low-complexity circuit architecture for generating N circularly symmetric simultaneous beams that are uniformly spaced in the azimuthal direction using an N-element UCA.
9.2.1 Proposed N-Beam Algorithm
Since the WN of interest is a circulant matrix, the expression in (9.4) can be written as the circular convolution given in (9.7),
\[ \mathbf{y} = \mathbf{w} \star \mathbf{x}, \tag{9.7} \]
where w is the first column of WN. The vectors w, x and y are cyclically extended in each direction. By the circular convolution theorem [189], the DFT transforms the cyclic convolution into component-wise multiplication,
\[ \mathcal{F}_N(\mathbf{w} \star \mathbf{x}) = \mathcal{F}_N(\mathbf{w}) \circ \mathcal{F}_N(\mathbf{x}) = \mathcal{F}_N(\mathbf{y}), \tag{9.8} \]
in which $\mathcal{F}_N(\cdot)$ denotes the N-point DFT operation and ◦ denotes the Hadamard product. Thus, the N-beam output vector y can be calculated as,
\[ \mathbf{y} = \mathcal{F}_N^{-1}\left\{\mathcal{F}_N(\mathbf{w}) \circ \mathcal{F}_N(\mathbf{x})\right\}. \tag{9.9} \]
In other words, this computation yields a diagonalization of WN such that,
\[ \mathbf{W}_N = \mathbf{F}_N^{-1} \mathbf{D} \mathbf{F}_N, \tag{9.10} \]
where FN is the N-point DFT matrix, and D is a diagonal matrix such that $\mathbf{D} = \operatorname{diag}\{\mathcal{F}_N(\mathbf{w})\}$.
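The equivalence between the direct product in (9.4) and the FFT-based form in (9.9) and (9.10) can be verified numerically. The sketch below is illustrative: the 16-element array, the ka = N/2 electrical radius (half-wavelength arc spacing) and the random input snapshot are assumptions, and NumPy's FFT stands in for the hardware FFT core.

```python
import numpy as np

N = 16
ka = N / 2.0                                   # assumed half-wavelength arc spacing
n = np.arange(N)
# Beam-0 weights with phases from (9.3): phi_max = 0, theta_max = 30 degrees.
row = np.exp(-1j * ka * np.sin(np.radians(30.0)) * np.cos(-2 * np.pi * n / N))

# Circulant W_N of (9.6): row i is row 0 circularly shifted right i times.
W = np.array([np.roll(row, i) for i in range(N)])

rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary snapshot

y_direct = W @ x                                   # O(N^2) reference, (9.4)
gamma = np.fft.fft(W[:, 0])                        # diagonal of D, from first column
y_fast = np.fft.ifft(gamma * np.fft.fft(x))        # (9.9), O(N log N)
```

`y_direct` and `y_fast` agree to floating-point precision, and `gamma` is the vector of N constants that the scaling stage between the two transforms would apply.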
9.2.2 Complexity Analysis of the Proposed N-Beam
Algorithm
The DFT, computed as a vector-matrix product with FN using the FFT, and the inverse DFT involving $\mathbf{F}_N^{-1}$, computed via the inverse FFT, each have a computational complexity of O(N logN). Therefore, counting the multiplication with the diagonal matrix D as N multiplications, the entire algorithm has a multiplication complexity of O(N logN). The complexity of the direct computation of WN would have been O(N²). The use of fast algorithms based on FFTs therefore leads to a saving of the order of (N − logN)/N. For N = 8, the brute-force computation leads to a real multiplier count of 192. On the other hand, since an 8-point Cooley-Tukey FFT involves a real multiplier count of 48 [81, p. 76], the total number of real multipliers needed for the proposed algorithm to produce 8 beams is 120. This is a 37.5% reduction in the multiplier circuits needed. For a 16-element UCA, this saving climbs to 60% (counting 128 real multipliers per 16-point FFT [81, p. 76]), and the hardware saving becomes much more significant for larger array sizes (N). Table 9.1 summarizes the multiplier counts required for multibeam synthesis using the proposed algorithm and using direct computation for N = 8, 16, and 32.
Table 9.1: Multiplier counts required for multibeam synthesis using the proposed algorithm and using direct computation.
N  | Multipliers required, direct implementation | Multipliers required, proposed method | Saving
8  | 192  | 120 (48 per 8-point FFT [81, p. 76])   | 37.5%
16 | 768  | 304 (128 per 16-point FFT [81, p. 76]) | 60%
32 | 3072 | 960 (320 per 32-point FFT [81, p. 76]) | 69%
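The N = 8 and N = 16 entries can be reproduced with a small calculation. The counting rule below is an inference from the quoted totals rather than something the text states: three real multipliers per complex multiplication, N² complex multiplications for the direct method, and two FFT cores plus N complex scalings for the proposed method. The N = 32 row is left out because the text does not spell out how its total is assembled.

```python
def multiplier_counts(n, fft_real_mults, real_per_complex=3):
    """Real multiplier counts for direct and FFT-based N-beam synthesis.

    Assumes `real_per_complex` real multipliers per complex multiplication.
    """
    direct = real_per_complex * n * n                      # full N x N matrix
    proposed = 2 * fft_real_mults + real_per_complex * n   # FFT + scaling + IFFT
    saving_pct = 100.0 * (direct - proposed) / direct
    return direct, proposed, saving_pct

counts_8 = multiplier_counts(8, 48)      # 48 real multipliers per 8-point FFT
counts_16 = multiplier_counts(16, 128)   # 128 real multipliers per 16-point FFT
```

Under these assumptions the function returns the direct count, the proposed count and the percentage saving for each array size.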
9.3 Proposed Hardware Realization Architectures
The algorithm in (9.9) can be implemented in different ways. Fig. 9.3 shows overview architectures of the analog and digital realizations of the algorithm.
9.3.1 RF Analog Architecture
The analog implementation shown in Fig. 9.3(a) would require an analog Butler-matrix-type [57] implementation that performs the spatial DFT operation in the analog domain as the first stage. The outputs of the Butler matrix are then phase rotated and gain adjusted according to the complex constants γi, where $\gamma_i = \mathcal{F}_N(\mathbf{w})[i]$. Since the γi vary with i, a variable gain amplifier in series with a variable phase shifter can be used in an actual implementation. Alternatively, a variable group delay APF circuit with programmable gain could be used for such an implementation. Once the DFT outputs are scaled by the γi coefficients, a spatial IDFT operation has to be performed across the scaled outputs to produce the beam outputs. Since the Butler matrix is a passive reciprocal network, a Butler matrix network can be used in its so-called “transmit” (or reciprocal) mode to achieve this.
9.3.2 Digital Architecture
Fig. 9.3(b) shows the digital implementation architecture of the proposed algorithm. A conventional direct-conversion receiver architecture is shown in the figure to obtain N IQ baseband received signals from a UCA. The N sampled spatial IQ signals from the array are then fed to a digital N-point FFT core, and each output bin of the FFT is scaled by the corresponding complex constant γi (0 ≤ i ≤ N − 1) using a set of complex multipliers. Here, $\gamma_i = \mathcal{F}_N(\mathbf{w})[i]$. Each output frame is then subjected to an N-point IFFT operation to produce N simultaneous beam outputs that are equi-spaced in the azimuthal plane.
Figure 9.3: (a) Analog-RF architecture for realizing the proposed multibeam beamforming system. (b) Digital baseband architecture for realizing the proposed algorithm.
Figure 9.4: Digital circuits for realizing the N beams using a UCA.
Let x be the N-point input vector to the DFT and y be the N-point output vector. Since the DFT matrix FN is a unitary matrix
(up to normalization by a factor of N), it obeys the following relationship,
\[ \mathbf{F}_N \mathbf{F}_N^{H} = N \mathbf{I}_N, \tag{9.11} \]
\[ \therefore\ \mathbf{F}_N^{-1} = \frac{\mathbf{F}_N^{H}}{N}. \tag{9.12} \]
Thus the IFFT operation $\mathbf{y} = \mathbf{F}_N^{-1}\mathbf{x}$ can be expressed as $\mathbf{y} = \frac{1}{N}\mathbf{F}_N^{H}\mathbf{x}$. Conjugating this relationship yields $\mathbf{y}^{*} = \frac{1}{N}\mathbf{F}_N^{\top}\mathbf{x}^{*}$, and since $\mathbf{F}_N^{\top} = \mathbf{F}_N$ by the definition of the DFT matrix, $\mathbf{y}^{*} = \frac{\mathbf{F}_N}{N}\mathbf{x}^{*}$. This shows that the FFT core can be used to perform the IFFT: the input vector is replaced by its complex conjugate, the FFT is applied to the modified input with the appropriate normalization by N, and the result is the complex conjugate of the desired IDFT output for the original input. The normalization required here is a trivial operation in digital hardware for N that are powers of two. This implies that the N-point IDFT can be performed using the N-point DFT, and therefore the same digital FFT core that is used for obtaining the DFT can be configured to perform the IDFT computation. The overview of such a digital architecture is shown in Fig. 9.4. The αi coefficients shown in the diagram would be required in an actual hardware implementation to achieve narrowband calibration of receivers.
Figure 9.5: Simulated circular beams resulting from the proposed N-beam algorithm. The simulation assumes N = 16, and the beams are equally spaced at 22.5° in the azimuthal plane (only the 8 even-indexed beams out of 16 are shown).
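The conjugation identity that lets a forward FFT core compute the inverse transform can be checked in software. In the sketch below, `ifft_via_fft` is a hypothetical helper name, NumPy's FFT stands in for the hardware core, and the 16 random complex bins are placeholder data.

```python
import numpy as np

def ifft_via_fft(spectrum):
    """Inverse DFT computed with a forward FFT, using conjugation and 1/N scaling."""
    spectrum = np.asarray(spectrum, dtype=complex)
    # Conjugate the input, run the forward FFT, conjugate the output, scale by 1/N.
    return np.conj(np.fft.fft(np.conj(spectrum))) / spectrum.size

rng = np.random.default_rng(1)
bins = rng.standard_normal(16) + 1j * rng.standard_normal(16)
beams = ifft_via_fft(bins)
```

In fixed-point hardware the two conjugations reduce to sign changes on the imaginary parts, and for N a power of two the 1/N scaling is a bit shift.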
9.4 Simulated N-Beam Patterns
The proposed N-beam algorithm was simulated in Matlab assuming a 16-element UCA with 0.5λ spacing. The coefficient vector w in (9.1) was calculated using θmax = 30°. The 3D beam plots in Fig. 9.5 show the 8 even-indexed beams out of the 16 equi-spaced beams, which are spaced at 22.5° in the azimuthal plane. The digital architecture in Fig. 9.4 was implemented using Xilinx System Generator tools targeting FPGA implementation. The design assumes 16-element UCA inputs. The input word length, corresponding to the ADC resolution, was assumed to be 8 bits. A parallel-input 16-point spatial DFT core was also designed following the radix-2 structure. The twiddle factors of the FFT were chosen to be 8 bits after different trials to achieve a trade-off between hardware resources and precision. A precision of 8 bits was also used to represent the diagonal multiplicands γi. The fixed-point digital core was then simulated by feeding in Matlab-generated plane-wave inputs to compute the resulting array factors. Fig. 9.6 shows the beam patterns generated by the fixed-point core in comparison with the Matlab floating-point simulated beam patterns in the azimuthal plane (θ is fixed at 30°). Here the γi coefficients were selected by setting θmax = 30°.
Figure 9.6: Comparison of array factors corresponding to fixed-point digital core computed beams and Matlab floating-point simulated beams. The x-axis represents φ [°] and the y-axis corresponds to the beamforming gain.
Figure 9.7: Overview of the architecture of the 2.4 GHz digital circular array receiver.
9.5 Real-Time Experimental Verification of the Proposed
Algorithm
For the real-time experimental verification of the proposed algorithm, a 2.4 GHz
16-element receiver array was built. The overview architecture of the hardware that
was designed is shown in Fig. 9.7.
Figure 9.8: (a) The receiver array consisting of a 16-element 2.4 GHz dipole array and the receiver chains. (b) The array front-end and the ROACH-2 FPGA based digital back-end. (Labels shown: dipole UCA, receivers, LO-distribution network, ADCs, ROACH-2.)
9.5.1 16-Element UCA
A 16-element UCA was built using commercially available dipole antennas. The specifications of the dipoles can be found in [190]. The antennas resonate in the range of 2.4 GHz to 2.5 GHz. The element power patterns shown in the data sheet [190, p. 6] indicate that the element pattern is omnidirectional in the azimuthal plane, as expected for a dipole. The radius a of the antenna array was set to 159 mm (the arc separation was 0.51λ, where λ corresponds to 2.45 GHz). The antenna array built is shown in Fig. 9.8.
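The quoted arc separation follows from the radius and element count, as the short calculation below shows. The variable names are chosen for this example; the electrical radius `ka` is the quantity that enters the array factor in (9.2).

```python
import math

C = 299_792_458.0          # speed of light, m/s
radius = 0.159             # array radius a, in metres
num_elems = 16
freq = 2.45e9              # frequency at which lambda is evaluated, Hz

arc = 2.0 * math.pi * radius / num_elems   # arc length between adjacent elements
wavelength = C / freq
arc_in_wavelengths = arc / wavelength      # arc separation in units of lambda
ka = 2.0 * math.pi * radius / wavelength   # electrical radius used in (9.2)
```

Rounded to two decimals, `arc_in_wavelengths` matches the 0.51λ stated above.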
9.5.2 RF Receivers
The receivers were designed and built using commercially available off-the-shelf components. A single-mixer receiver architecture was used, as shown in Fig. 9.7. A centralized LO scheme was used to bring the RF down to IF. Table 9.2 lists the model numbers of the components used in the design and their key parameters. Two LNA stages were used in the design for improved gain performance of the receivers. An RF bandpass filtering stage was used to reject out-of-band signals. The cascaded gain of the receiver was calculated to be 42 dB, and the cascaded noise figure (NF) was 2.2 dB. Note that the receiver chains have not been optimized for noise performance and have been designed only to demonstrate the real-time synthesis of the circular multibeams using the proposed algorithm.
Table 9.2: Components used in the RF receiver chain (all components are from Mini-Circuits [191]).
Component    | Model number  | Gain (dB) | NF (dB) | OIP3 (dBm)
LNA          | ZX60-P105LN+  | 15        | 2.0     | 33
RF BPF       | VBF-2435+     | -1.6      | ≈ 1.6   | NA
Mixer        | ZX05-83LH-S+  | -6        | ≈ 6     | 4
LPF          | SLP-550+      | -0.1      | ≈ 0.1   | NA
IF amplifier | ZFL-1000LN+   | 20        | 3       | 14
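A cascaded gain and noise figure of this kind is conventionally obtained from the Friis formula. The sketch below applies it to the Table 9.2 values; the stage order (LNA, RF BPF, LNA, mixer, LPF, IF amplifier) is an assumption, since the exact ordering of the two LNA stages relative to the filter is not stated, and the function name `cascade` is hypothetical. The result can be compared against the reported 42 dB and 2.2 dB.

```python
import math

def cascade(stages):
    """Cascaded gain and noise figure (both in dB) via the Friis formula.

    `stages` lists (gain_dB, nf_dB) pairs in signal-flow order.
    """
    gain_lin = 1.0
    noise_factor = 1.0
    for gain_db, nf_db in stages:
        stage_f = 10.0 ** (nf_db / 10.0)
        noise_factor += (stage_f - 1.0) / gain_lin   # excess noise referred to input
        gain_lin *= 10.0 ** (gain_db / 10.0)
    return 10.0 * math.log10(gain_lin), 10.0 * math.log10(noise_factor)

# Assumed order: LNA, RF BPF, LNA, mixer, LPF, IF amplifier (values from Table 9.2).
chain = [(15, 2.0), (-1.6, 1.6), (15, 2.0), (-6, 6.0), (-0.1, 0.1), (20, 3.0)]
gain_db, nf_db = cascade(chain)
```

Reordering the entries in `chain` shows how strongly the noise figure depends on what precedes the first LNA, while the total gain is unaffected.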
9.5.3 Digital Back-End
The ROACH-2 FPGA platform that was introduced and used for the work in Chapters 3 and 4 has been used to implement the digital back-end for this system. One ADC16x250-8 card [132] (which was also used for the work in Chapters 3 and 4) has been used to sample the 16 IF signals from the array into the FPGA. The overview architecture of the digital back-end is illustrated in Fig. 9.7. An FIR filter-based Hilbert transform is employed in each sampled channel to perform the IQ decomposition. Next, as shown in Fig. 9.7, the sampled low-IF signals are subjected to a narrowband gain and phase calibration to equalize the gain and phase mismatches of the receiver chains. The coefficients βi ∈ C, i ∈ [0, 15], are precomputed by measuring the gain and phase deviation of each channel while sending a reference signal through the receiver chains.
The rest of the digital design contains the multibeam computation circuit. The circuits have been designed to accept samples having 8 bits of precision. The precision of the twiddle factors and the γi coefficients has been set to 8 bits. At each beam output, a digital power calculation circuit similar to that of Fig. 3.10 was employed to aid in array factor measurement in the real-time experimental setup.
Figure 9.9: The outdoor measurement setup for experimental verification of the proposed circular multibeam algorithm.
9.5.4 Measurement Setup
The setup was used to measure the actual digital RF receive-mode beams. The γi coefficients in the digital design were set to achieve maximum sensitivity at θ = 70°. Then, the setup was taken outside to an open space to measure the 16 receive-mode multibeams generated by the algorithm. A patch antenna was used to send a 2.49 GHz carrier wave. The LO frequency of the receivers was set to 2.47 GHz. The digital circuits were clocked at 200 MHz. Fig. 9.9 shows the measurement conditions used for the real-time beam measurement experiment. As shown in Fig. 9.9, the transmitter was placed approximately in the highest-sensitivity direction of the receive-mode beams in the elevation plane. The separation between the transmitter and the receiver was set to approximately 4 m. The real-time beam measurement was conducted by rotating the receiver array through the entire azimuthal plane from φ = 0° to φ = 360°. The received energy (for a fixed time window) at each beam for each emulated direction of arrival was recorded to generate the measured array factors. Fig. 9.10 shows the 16 measured beam patterns in the polar domain. It can be seen that the beams are spaced at approximately 22.5°, confirming 16 equi-spaced multibeams in the azimuthal plane. Figs. 9.11 and 9.12 show a beam-by-beam comparison of the measured beam patterns with the corresponding numerically simulated theoretical beam patterns. The measured plots indicate that the maximum (upward) deviation in the highest sidelobe level is below 1 dB. The maximum deviation of all array factor measurements with respect to the expected beam patterns was below 3 dB.
Figure 9.10: All 16 measured beams in a single polar plot (adjacent beams spaced at 22.5° = 360°/16; radial grid at 0 dB and −5 dB).
Figure 9.11: Comparison of measured and simulated beam patterns corresponding to beam 0 through beam 7.
Figure 9.12: Comparison of measured and simulated beam patterns corresponding to beam 8 through beam 15.
9.6 Mitigating Mutual Coupling in UCAs for Multibeam
Generation
MC between the antenna elements in an antenna array is unavoidable. The near-field coupling that takes place between neighboring elements in the array shows up as unintended distortions in the signal vector captured by the array [134]. The coupling between two antenna elements depends on their inter-element distance, orientation, and element patterns. The coupling between antenna elements reduces with the distance between them, and therefore MC is significant only among the few nearest neighbors in the array.
Different approaches for modeling MC in ULAs can be found in abundance in the literature [134, 192, 193]. According to [192], the effect of MC in an N-element ULA is generally expressed as a linear transform termed the coupling matrix $\mathbf{K}_c \in \mathbb{C}^{N\times N}$. Such a transform models the output of a particular antenna element as a linear combination of its own output and those of the others. The same concept is applicable to UCAs as well: if x denotes the ideal sampled antenna output from the array, then due to mutual coupling what is actually observed is a different vector $\bar{\mathbf{x}} \in \mathbb{C}^{N\times 1}$, where
\[ \bar{\mathbf{x}} = \mathbf{K}_c \mathbf{x}. \tag{9.13} \]
The coupling matrix Kc in this model captures the MC between the elements in the UCA. To undo the MC and remove the unintended distortions, the above transformation should be reversed; ideally, the computation $\mathbf{x} = \mathbf{K}_c^{-1}\bar{\mathbf{x}}$ should be performed. With Kc known a priori, this computation has a complexity of order N².
According to [44, 134, 194], the MC of a $P$-element sub-array of an $N$-element
ULA, where $\{P : P = 2p + 1,\ p \in \mathbb{Z}^{+}\}$, can be characterized by the linear
transform $\mathbf{K}_c^{\dagger} \in \mathbb{C}^{P\times P}$ using the measured S-parameters of the
$P$-element sub-array. Following [134, Sec. II-A], $\mathbf{K}_c^{\dagger}$ of a $P$-element
sub-array can be computed using
$$\mathbf{K}_c^{\dagger} = \mathbf{Z}_A\left(\mathbf{Z}_M + \mathbf{Z}_A\right)^{-1}, \tag{9.14}$$
where $\mathbf{Z}_A = Z_A\mathbf{I}_P$, $\mathbf{Z}_M = (\mathbf{I}_P - \mathbf{S})^{-1}(\mathbf{I}_P + \mathbf{S})\mathbf{Z}_0$, and
$\mathbf{Z}_0 = Z_0\mathbf{I}_P$. Here, $\mathbf{S}$ is the measured S-parameter matrix and $\mathbf{I}_P$
denotes the $P\times P$ identity matrix. $Z_0$ is the characteristic impedance of the
transmission line connecting the antenna to the LNA, and $Z_A$ is the LNA input
impedance. This $\mathbf{K}_c^{\dagger}$ matrix can then be used to model the coupling matrix
$\mathbf{K}_c$ of the $N$-element array as a $P$-diagonal Toeplitz matrix [134]. The same
approach can be applied to a circular array by obtaining the $\mathbf{K}_c^{\dagger}$ matrix of
$P$ adjacent elements (a sub-array) of a UCA. Once $\mathbf{K}_c^{\dagger}$ is calculated
using (9.14), the coupling matrix $\mathbf{K}_c$ for the UCA can be formulated
(assuming identical antenna elements) by setting its $i$th row to $T^{i-1}\mathbf{r}$, where
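$$\mathbf{r} = \begin{bmatrix} a & b_1 & b_2 & \cdots & b_{\frac{P+1}{2}} & 0 & \cdots & 0 & b_{\frac{P+1}{2}} & \cdots & b_2 & b_1 \end{bmatrix}_{1\times N}, \tag{9.15}$$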
and $T^{i-1}\mathbf{r}$ denotes a right-handed circular shift of $\mathbf{r}$ applied $i-1$ times.
The diagonal coefficients $a \in \mathbb{C}$ of $\mathbf{K}_c$ correspond to self-coupling, and the
off-diagonal coefficients $b_k \in \mathbb{C}$, $k = 1, 2, \ldots, \frac{P+1}{2}$, correspond to
mutual coupling from the $P$ neighboring elements.
It is noted that $\mathbf{K}_c$ for a UCA has the special structure of a circulant matrix.
Equation (9.16) shows the structure of the coupling matrix for the $P = 2$ scenario.
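$$\mathbf{K}_c = \begin{bmatrix}
a & b_1 & 0 & 0 & \cdots & 0 & b_1 \\
b_1 & a & b_1 & 0 & \cdots & 0 & 0 \\
0 & b_1 & a & b_1 & \cdots & 0 & 0 \\
0 & 0 & b_1 & a & \cdots & 0 & 0 \\
0 & 0 & 0 & b_1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
b_1 & 0 & 0 & 0 & \cdots & b_1 & a
\end{bmatrix}_{N\times N} \tag{9.16}$$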
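The two constructions above can be sketched numerically. The following is a minimal illustration only, assuming NumPy; the function names, the 50-ohm default impedances, and the values of `a` and `b1` are hypothetical placeholders rather than measured data, and the step that extracts $a$ and $b_k$ from $\mathbf{K}_c^{\dagger}$ is not covered.

```python
import numpy as np

def subarray_coupling(S, z0=50.0, za=50.0):
    """Evaluate (9.14), K = Z_A (Z_M + Z_A)^{-1}, for a P-element sub-array.

    z0 and za are assumed scalar impedances (hypothetical 50-ohm defaults).
    """
    S = np.asarray(S, dtype=complex)
    P = S.shape[0]
    I = np.eye(P)
    Z_M = np.linalg.solve(I - S, I + S) * z0   # (I - S)^{-1} (I + S) Z_0
    Z_A = za * I
    return Z_A @ np.linalg.inv(Z_M + Z_A)

def circulant_from_row(r):
    """Stack right circular shifts of r so that row i (1-based) is T^{i-1} r."""
    r = np.asarray(r)
    return np.stack([np.roll(r, i) for i in range(r.size)])

# Hypothetical coefficients for the single-neighbor structure shown in (9.16).
N = 8
a, b1 = 1.0 + 0.0j, 0.2 - 0.1j
r = np.zeros(N, dtype=complex)
r[0], r[1], r[-1] = a, b1, b1
Kc = circulant_from_row(r)
```

With an all-zero S-parameter matrix (perfectly matched, uncoupled ports) and equal impedances, `subarray_coupling` reduces to half the identity matrix, which is a convenient sanity check for the formula.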
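Since $\mathbf{K}_c$ is circulant, following the discussion in Section 9.2, it can be
expressed as
$$\mathbf{K}_c = \mathbf{F}_N^{-1}\mathbf{D}_{MC}\mathbf{F}_N, \tag{9.17}$$
where $\mathbf{D}_{MC} \in \mathbb{C}^{N\times N}$ is the diagonal matrix
$\mathbf{D}_{MC} = \mathrm{diag}\{\mathbf{F}_N\mathbf{v}\}$ and the column vector $\mathbf{v}$ contains the first
column of $\mathbf{K}_c$. Therefore, the mutually uncoupled output vector of the array
can be synthesized as
$$\mathbf{x} = \mathbf{F}_N^{-1}\mathbf{D}_{MC}^{-1}\mathbf{F}_N\bar{\mathbf{x}}. \tag{9.18}$$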
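As a hedged numerical check of (9.17) and (9.18), the sketch below removes a simulated coupling distortion with one forward FFT, one element-wise division, and one inverse FFT. It assumes NumPy's unnormalized forward FFT plays the role of $\mathbf{F}_N$; the array size, the coupling coefficients, and the random test vector are hypothetical.

```python
import numpy as np

N = 16
rng = np.random.default_rng(0)

# Hypothetical coupling coefficients; v is the first column of Kc.
a, b1 = 1.0 + 0.0j, 0.15 + 0.05j
v = np.zeros(N, dtype=complex)
v[0], v[1], v[-1] = a, b1, b1

# Circulant matrix whose j-th column is v circularly shifted by j.
Kc = np.stack([np.roll(v, j) for j in range(N)], axis=1)

# Ideal array output and its mutually coupled observation, as in (9.13).
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x_bar = Kc @ x

# (9.18): x = F^{-1} D_MC^{-1} F x_bar, with D_MC = diag(F v).
d_mc = np.fft.fft(v)
x_hat = np.fft.ifft(np.fft.fft(x_bar) / d_mc)

# Reference: direct solve with the full coupling matrix.
x_direct = np.linalg.solve(Kc, x_bar)
```

Both `x_hat` and `x_direct` recover `x`; the FFT route never forms or inverts the full matrix, which is where the complexity saving comes from. The division requires every entry of `d_mc` to be nonzero, which holds for the coefficients chosen here.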
Interestingly, the multibeams of the algorithm in Section 9.2 can be generated
while simultaneously removing the distortions caused by mutual coupling in the
array, by merging the two operations, without increasing the computational
complexity. The computation in (9.18) has the same complexity as the $N$-beam
algorithm in Section 9.2, so the original computational complexity of $O(N^2)$
is brought down to $O(N\log N)$.
The circular multibeam network $\mathbf{y} = \mathbf{W}_N\mathbf{x}$ in Section 9.2, where $\mathbf{W}_N$
is given in (9.6), assumed an ideal, mutual-coupling-free output vector $\mathbf{x}$ from
the UCA. Equation (9.19) re-expresses the multibeam network using the factorization
in (9.10):
$$\mathbf{y} = \mathbf{F}_N^{-1}\mathbf{D}\mathbf{F}_N\mathbf{x}. \tag{9.19}$$
Using (9.18), the expression in (9.19) can be rewritten in terms of the
mutually coupled array outputs $\bar{\mathbf{x}}$ as
$$\mathbf{y} = \mathbf{F}_N^{-1}\mathbf{D}\mathbf{F}_N\mathbf{F}_N^{-1}\mathbf{D}_{MC}^{-1}\mathbf{F}_N\bar{\mathbf{x}}. \tag{9.20}$$
Since $\mathbf{F}_N\mathbf{F}_N^{-1} = \mathbf{I}_N$, this expression for the circular multibeam network
simplifies to
$$\mathbf{y} = \mathbf{F}_N^{-1}\bar{\mathbf{D}}\mathbf{F}_N\bar{\mathbf{x}}, \tag{9.21}$$
where $\bar{\mathbf{D}} = \mathbf{D}\mathbf{D}_{MC}^{-1}$ is a diagonal matrix. Therefore, the mutual coupling
effect in UCA multibeam generation can be eliminated without any hardware
overhead in digital implementations by changing the weights $\gamma_i$ in
Fig. 9.4 appropriately.
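The merging step in (9.19) to (9.21) can be checked numerically with a short sketch. It assumes NumPy's FFT as $\mathbf{F}_N$, and it uses a random vector `d` as a stand-in for the diagonal of $\mathbf{D}$, since the actual beam weights from Section 9.2 are not reproduced here; the coupling coefficients are likewise hypothetical.

```python
import numpy as np

N = 16
rng = np.random.default_rng(1)

# Placeholder for the diagonal of D (the real weights come from Section 9.2).
d = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Hypothetical first column of the circulant coupling matrix Kc.
v = np.zeros(N, dtype=complex)
v[0], v[1], v[-1] = 1.0, 0.15 + 0.05j, 0.15 + 0.05j
d_mc = np.fft.fft(v)          # diagonal of D_MC, as in (9.17)

# Merged weights: D_bar = D D_MC^{-1}, still one weight per frequency bin.
d_bar = d / d_mc

# Ideal output x and its coupled observation x_bar = Kc x, computed via (9.17).
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x_bar = np.fft.ifft(d_mc * np.fft.fft(x))

y_ideal = np.fft.ifft(d * np.fft.fft(x))            # (9.19) on the ideal input
y_merged = np.fft.ifft(d_bar * np.fft.fft(x_bar))   # (9.21) on the coupled input
```

`y_merged` matches `y_ideal` even though it only ever sees the coupled data, and it uses the same two transforms and $N$ diagonal weights as the uncompensated network.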
9.7 Conclusion
This chapter proposes a novel algorithm that uses UCAs to produce multiple
simultaneous beams that are uniformly spaced in the angular domain and
electronically adjustable in both the azimuthal and elevation planes. The proposed
approach achieves N simultaneous beams at O(N log N) hardware complexity using an
N-element array. It has primary applications in ceiling-mounted wireless network
access points, 5G base stations, and other location-based tracking systems that
require a 360-degree field of view. A 2.4 GHz 16-element digital UCA receiver has
been built using commercial off-the-shelf components to experimentally verify the
circular multibeams produced by the proposed method. The measured beam patterns
closely match the theoretically simulated beam patterns. The maximum upward
deviation in the highest sidelobe level of the measured patterns is below 1 dB. The
maximum deviation of all array factor measurements with respect to the expected
beam patterns was below 3 dB.
In addition, a new method is proposed to remove the mutual coupling effect in
receive mode at a hardware complexity of O(N log N). It is further shown how
mutual coupling removal can be achieved simultaneously with the multibeam
realization without any added complexity.
The beams generated through the weight realization approach in Section 9.1
give higher side-lobe levels. This can be alleviated by applying appropriate
windowing techniques to the UCAs. Future work can be directed towards combining
side-lobe reduction techniques with the proposed N-beam algorithm. Hardware
verification of the proposed N-beam algorithm and the mutual coupling removal
method can also be conducted as future work.
CHAPTER 10
CONCLUSIONS AND FUTURE WORK
Large-scale multibeam arrays will be a pressing requirement in future
communication systems at mmW and higher frequencies to overcome channel
impairments. This dissertation focuses on algorithms, circuits, and
implementation-level verification of approaches that reduce hardware complexity in
multibeam arrays. Analog, digital, and hybrid approaches that reduce the hardware
complexity and power consumption of large multibeam algorithms have been proposed
for ULA and URA processing. A method to efficiently generate N simultaneous
beams using an N-element circular array configuration has also been
investigated.
Fully digital beamforming is still not widely adopted because of its high hardware
complexity and the associated computational cost. This dissertation proposes
low-complexity N-beam digital beamforming algorithms, based on the spatial DFT
beamforming approach, that generate similar beams without any multiplications
in the digital circuits. Such an approach leads to low-SWaPC digital VLSI
implementations. Chapter 3 presents the low-complexity generation of 8
and 16 digital RF beams using a ULA, together with an analysis of their performance.
The proposed 8- and 16-beam low-complexity beamforming algorithms, which generate
independent simultaneous beams, have been digitally implemented and experimentally
verified using a real-time 16-element phased array at 2.4 GHz with the baseband
DSP performed on FPGAs. The beams realized using the proposed algorithms
have been compared against beams realized using fixed-point exact-DFT digital
implementations. Chapter 4 proposes a 32-point ADFT algorithm for generating
32 simultaneous beams. The algorithm yields a multibeam processor with 46% less
chip area and 55% less dynamic power consumption. The 32-beam algorithm has
been verified using a 32-element receiver array built at 5.8 GHz, with
the baseband DSP performed on FPGAs. The digitally measured beams perform the
same as the corresponding measured beams generated using fixed-point
exact FFT circuits. Both the 2.4 GHz and the 5.8 GHz setups confirm that
hardware implementation imperfections, such as front-end mismatches, phase
incoherence, and local oscillator phase noise, keep the system far from the ideal
behavior, which is impossible to achieve. Thus, there is sufficient room to
develop approximate algorithms that avoid high-precision digital circuits and
are optimized to reduce power and area costs to suit the performance of the RF
front-end. Further, the complexity bounds associated with obtaining multiple
simultaneous beams with URAs, and the use of the proposed ADFTs to achieve 2D
beamforming in URAs without digital multipliers in an ultra-low-SWaP manner, have
been presented. The 32-beam linear array beam measurements at 5.8 GHz have been
used to synthesize the corresponding 2D beams of an equivalent 2D array made of
32 similar 32-element subarrays.
The research described in this dissertation demonstrates, for the first time, fully
digital multibeam beamforming across a full 800 MHz of bandwidth using a
custom-built 28 GHz 4-element receiver array. The digital back-end
uses a Xilinx RFSoC to digitize the full bandwidth and to perform digital
multibeam beamforming. The real-time measured beam responses have been given
for different frequencies across the full baseband processing bandwidth.
The DFT approximations proposed for digital multibeam beamforming have also been
used to design analog circuits that generate multibeams. The sparse factorizations
of the ADFTs, which contain only small Gaussian integers, have been used to realize
beamforming circuits with analog CMOS current-mirror-based implementations.
Instead of realizing the small-integer-coefficient ADFT matrices directly, the
proposed approach realizes an N-beam beamforming network in analog using a number
of current mirrors on the order of N. A schematic
implementation of the 16-point ADFT algorithm, which generates 16 high-bandwidth
analog beams, was carried out in Cadence using 65 nm TSMC BSIM4 models. The
simulated beam patterns of the current-mode circuit show that the analog CMOS
schematic can handle bandwidths up to 1.5 GHz without significant deviation
from the expected beam patterns. The schematic-level analysis shows worst-case
and average sidelobe levels of −10.17 dB and −12.2 dB, respectively, at a bandwidth
of 1 GHz, and of −9.08 dB and −11.32 dB at a bandwidth of 1.5 GHz. The proposed
multibeam architectures have the potential to reduce circuit area and power
requirements while meeting the bandwidth requirements of emerging 5G baseband
systems.
A low-complexity DVM algorithm with sparse factors has been proposed for
wideband squint-free beamforming. The arithmetic complexity of the proposed
algorithm shows that it is much more efficient than direct computation of the
DVM-vector product. The proposed algorithm is used to realize a squint-free
wideband multibeam beamforming architecture using mixed-signal CMOS integrated
circuits. The architecture is useful for emerging 5G wireless communication systems
requiring a variable number of sharp steerable beams. The proposed algorithm has
been verified in simulations using CMOS APF circuit responses in the S-band and
Ka-band. An analysis of the SFGs of the proposed algorithm shows that the 4-beam
case uses only 24 APF blocks, whereas a similar 4-beam network using a direct TTD
phased array would require 60 such blocks; thus, the proposed approach achieves a
60% reduction in hardware complexity. The 8-beam network requires only 224 APF
blocks, whereas a direct 8-beam network would require 1008 blocks, which is an
overall hardware reduction of about 77%.
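The two reduction figures quoted above follow directly from the APF block counts (24 versus 60, and 224 versus 1008). A minimal arithmetic check, with a hypothetical helper name:

```python
def reduction_percent(proposed_blocks, direct_blocks):
    """Percentage of blocks saved relative to the direct implementation."""
    return 100.0 * (direct_blocks - proposed_blocks) / direct_blocks

# Block counts as quoted in the text.
first_case = reduction_percent(24, 60)
second_case = reduction_percent(224, 1008)

print(f"{first_case:.1f}% and {second_case:.1f}%")
```

The first comparison gives exactly 60%, and the second gives a value between 77% and 78%, consistent with the quoted figure of roughly 77%.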
An analog-digital hybrid beamforming architecture targeting mmW implementations
via sub-arrays is also proposed. The analog sub-arrays support from 1 to N
fixed TTD wideband mmW beams, depending on the required capacity, which makes
the architecture flexible enough for use in both mobile units and base stations. A
novel low-complexity wideband digital beamforming method is proposed for the
level-2 digital beamforming stage. Simulations verify that array factor performance
similar to that of an order-22 FIR implementation can be achieved using a
second-order Thiran filter-based beamformer, which saves approximately 63.6% of
the multipliers (counting 11 multipliers for the 22nd-order FIR system owing to the
symmetry of its impulse response). Thus, the overall hybrid beamformer leads to
ultra-low-complexity implementations supporting wide bandwidth and squint-free
operation. The proposed analog beamformer contains several analog circuit
components that are subject to on-chip PVT variations. The resulting delay and gain
shifts in each receiver cause errors in the spatial orientation of the beams.
Calibration methods to compensate for such shifts include variable-gain amplifiers
for gain, tunable APFs for delay, and reference input signals for training the
algorithm. Development of such models, methods, and associated circuitry can be
studied as future work.
A new method has been proposed that uses UCAs to produce multiple simultaneous
beams that are uniformly spaced in the angular domain and electronically adjustable
in both the azimuthal and elevation planes. The proposed approach
can generate N simultaneous beams at O(N log N) hardware complexity using an
N-element array, compared to O(N^2) complexity for the direct synthesis approach.
The proposed approach has promising applications in ceiling-mounted wireless
network access points, 5G base stations, and other location-based tracking systems
that require a full 360-degree field of view. A 2.4 GHz 16-element receive-mode
digital UCA setup has been built to experimentally verify the proposed algorithm.
Real-time beam measurements agree well with the expected theoretical beam
patterns, with only 1 dB of upward deviation in the sidelobe levels. In addition, a
new method has been proposed to eliminate the mutual coupling in receive mode at
a hardware complexity of O(N log N). It is further shown how mutual coupling
removal can be achieved simultaneously with the multibeam realization without
any added complexity. The beams generated through the weight realization
approach described in Section 9.1 give higher side-lobe levels. This can be
alleviated by applying appropriate windowing techniques to the UCAs. Future work
can be directed towards combining side-lobe reduction techniques with the proposed
N-beam algorithm. Hardware verification of the proposed N-beam algorithm and
the mutual coupling removal method can also be conducted as future work.
BIBLIOGRAPHY
[1] Microwave Journal, 2017 (accessed on Sept. 2019). [Online]. Available:
https://www.microwavejournal.com/articles/27830-ibm-and-ericsson-announce-5g-mmwave-phase-array-antenna-module
[2] National Radio Astronomy Observatory, “Very Large Array,” (accessed on Oct.
2019). [Online]. Available: http://www.vla.nrao.edu/
[3] Xilinx, “Zynq UltraScale+ RFSoC RF data converter 2.0,” Apr. 2018.
[Online]. Available:
https://www.xilinx.com/support/documentation/ip_documentation/usp_rf_data_converter/v2_0/pg269-rf-data-converter.pdf
[4] B. Sadhu, Y. Tousi, J. Hallin, S. Sahl, S. Reynolds, O. Renström, K. Sjögren,
O. Haapalahti, N. Mazor, B. Bokinge, G. Weibull, H. Bengtsson, A. Carlinger,
E. Westesson, J.-E. Thillberg, L. Rexberg, M. Yeck, X. Gu, D. Friedman, and
A. Valdes-Garcia, “A 28GHz 32-element phased array transceiver IC with
concurrent dual polarized beams and 1.4 degree beam steering resolution for
5G communication,” in IEEE International Solid-State Circuits Conference,
2017, pp. 128–129.
[5] E. Brookner, “Phased arrays and radars–past, present and future,” Microwave
Journal, Jan. 2006.
[6] J. Wilson and N. Patwari, “Radio tomographic imaging with wireless net-
works,” IEEE Transactions on Mobile Computing, vol. 9, no. 5, pp. 621–632,
May 2010.
[7] J. Pety and N. Rodríguez-Fernández, “Revisiting the theory of
interferometric wide-field synthesis,” A&A, vol. 517, p. A12, 2010. [Online].
Available: https://doi.org/10.1051/0004-6361/200912873
[8] B. D. Van Veen and K. M. Buckley, “Beamforming: a versatile approach to
spatial filtering,” IEEE ASSP Magazine, vol. 5, no. 2, pp. 4–24, April 1988.
[9] A. T. Deller, S. J. Vigeland, D. L. Kaplan, W. M. Goss, W. F. Brisken,
S. Chatterjee, J. M. Cordes, G. H. Janssen, T. J. W. Lazio, L. Petrov,
B. W. Stappers, and A. Lyne, “Microarcsecond VLBI pulsar astrometry with
PSRπ. I. Two binary millisecond pulsars with white dwarf companions,” The
Astrophysical Journal, vol. 828, no. 1, p. 8, Aug 2016. [Online]. Available:
https://doi.org/10.3847/0004-637x/828/1/8
[10] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers
of base station antennas,” IEEE Transactions on Wireless Communications,
vol. 9, no. 11, pp. 3590–3600, November 2010.
[11] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive
MIMO for next generation wireless systems,” IEEE Communications Mag-
azine, vol. 52, no. 2, pp. 186–195, February 2014.
[12] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N.
Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile
communications for 5G cellular: It will work!” IEEE Access, vol. 1, pp. 335–
349, 2013.
[13] M. Shafi, A. F. Molisch, P. J. Smith, T. Haustein, P. Zhu, P. De Silva,
F. Tufvesson, A. Benjebbour, and G. Wunder, “5G: A tutorial overview of
standards, trials, challenges, deployment, and practice,” IEEE Journal on
Selected Areas in Communications, vol. 35, no. 6, pp. 1201–1221, June 2017.
[14] S. Sun, T. S. Rappaport, R. W. Heath, A. Nix, and S. Rangan, “MIMO for
millimeter-wave wireless communications: beamforming, spatial multiplexing,
or both?” IEEE Communications Magazine, vol. 52, no. 12, pp. 110–121,
December 2014.
[15] T. S. Rappaport, Y. Xing, O. Kanhere, S. Ju, A. Madanayake, S. Mandal,
A. Alkhateeb, and G. C. Trichopoulos, “Wireless communications and
applications above 100 GHz: Opportunities and challenges for 6G and beyond,”
IEEE Access, vol. 7, pp. 78729–78757, 2019.
[16] DARPA, “DARPA seeks to improve military communications with
digital phased-arrays at millimeter wave,” Jan. 2018. [Online]. Available:
https://www.darpa.mil/news-events/2018-01-24
[17] T. Rappaport, R. Heath, R. Daniels, and J. Murdock, Millimeter Wave
Wireless Communications, ser. Prentice Hall Communications Engineering
and Emerging Technologies Series. Prentice Hall, 2015. [Online]. Available:
https://books.google.com/books?id=_Tt_BAAAQBAJ
[18] K. Haneda, J. Zhang, L. Tan, G. Liu, Y. Zheng, H. Asplund, J. Li, Y. Wang,
D. Steer, C. Li, T. Balercia, S. Lee, Y. Kim, A. Ghosh, T. Thomas,
T. Nakamura, Y. Kakishima, T. Imai, H. Papadopoulos, T. S. Rappaport,
G. R. MacCartney, M. K. Samimi, S. Sun, O. Koymen, S. Hur, J. Park,
C. Zhang, E. Mellios, A. F. Molisch, S. S. Ghassamzadeh, and A. Ghosh, “5G
3GPP-like channel models for outdoor urban microcellular and macrocellular
environments,” 2016 IEEE 83rd Vehicular Technology Conference (VTC
Spring), May 2016. [Online]. Available: http://arxiv.org/abs/1602.07533.
[19] K. Haneda, L. Tian, H. Asplund, J. Li, Y. Wang, D. Steer, C. Li, T. Balercia,
S. Lee, Y. Kim, A. Ghosh, T. Thomas, T. Nakamura, Y. Kakishima, T. Imai,
H. Papadopoulos, T. S. Rappaport, G. R. MacCartney, M. K. Samimi, S. Sun,
O. Koymen, S. Hur, J. Park, J. Zhang, E. Mellios, A. F. Molisch,
S. S. Ghassamzadeh, and A. Ghosh, “Indoor 5G 3GPP-like channel models for office
and shopping mall environments,” in 2016 IEEE International Conference on
Communications Workshops (ICC), May 2016, pp. 694–699.
[20] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five
disruptive technology directions for 5G,” IEEE Communications Magazine,
vol. 52, no. 2, pp. 74–80, February 2014.
[21] S. Wyne, K. Haneda, S. Ranvier, F. Tufvesson, and A. F. Molisch, “Beamform-
ing effects on measured mm-wave channel characteristics,” IEEE Transactions
on Wireless Communications, vol. 10, no. 11, pp. 3553–3559, November 2011.
[22] R. Mudumbai, D. R. Brown III, U. Madhow, and H. V. Poor, “Distributed
transmit beamforming: challenges and recent progress,” IEEE Communications
Magazine, vol. 47, no. 2, pp. 102–110, February 2009.
[23] G. R. MacCartney Jr, S. Deng, S. Sun, and T. S. Rappaport, “Millimeter-wave
human blockage at 73 GHz with a simple double knife-edge diffraction model
and extension for directional antennas,” arXiv preprint arXiv:1607.00226,
2016.
[24] H. T. Friis, “A note on a simple transmission formula,” Proceedings of the
IRE, vol. 34, no. 5, pp. 254–256, May 1946.
[25] W. Hong, Z. H. Jiang, C. Yu, J. Zhou, P. Chen, Z. Yu, H. Zhang, B. Yang,
X. Pang, M. Jiang, Y. Cheng, M. K. T. Al-Nuaimi, Y. Zhang, J. Chen, and
S. He, “Multibeam antenna technologies for 5G wireless communications,”
IEEE Transactions on Antennas and Propagation, vol. 65, no. 12, pp. 6231–
6249, Dec 2017.
[26] “3GPP the mobile broadband standard,” July 2018 (accessed on October
2018). [Online]. Available: http://www.3gpp.org/release-15
[27] E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The Next Generation
Wireless Access Technology. Elsevier Science, 2018. [Online]. Available:
https://books.google.com/books?id=C5poDwAAQBAJ
[28] A. Adhikary, J. Nam, J. Ahn, and G. Caire, “Joint spatial division and
multiplexing: the large-scale array regime,” IEEE Transactions on Information
Theory, vol. 59, no. 10, pp. 6441–6463, Oct 2013.
[29] L. Schulwitz and A. Mortazawi, “A compact dual-polarized multibeam
phased-array architecture for millimeter-wave radar,” IEEE Transactions on
Microwave Theory and Techniques, vol. 53, no. 11, pp. 3588–3594, Nov 2005.
[30] J. Yan, H. Liu, B. Jiu, B. Chen, Z. Liu, and Z. Bao, “Simultaneous multibeam
resource allocation scheme for multiple target tracking,” IEEE Transactions
on Signal Processing, vol. 63, no. 12, pp. 3110–3122, June 2015.
[31] A. van Ardenne, J. D. Bregman, W. A. van Cappellen, G. W. Kant, and
J. G. Bij de Vaate, “Extending the field of view with phased array techniques:
Results of European SKA research,” Proceedings of the IEEE, vol. 97, no. 8,
pp. 1531–1542, Aug 2009.
[32] M. Tegmark and M. Zaldarriaga, “Omniscopes: Large area telescope arrays
with only N log N computational cost,” Phys. Rev. D, vol. 82, p. 103501,
Nov 2010. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevD.82.103501
[33] W. L. Stutzman and G. A. Thiele, Antenna Theory and Design, ser. Antenna
Theory and Design. Wiley, 2012, ISBN 9780470576649.
[34] T. K. Gunaratne, “Beamforming of temporally broadband bandpass plane
waves using 2D FIR trapezoidal filters,” Ph.D. dissertation, University of Cal-
gary, Sep. 2011.
[35] D. Dudgeon and R. Mersereau, Multidimensional digital signal processing,
ser. Prentice-Hall signal processing series. Prentice-Hall, 1984. [Online].
Available: https://books.google.com/books?id=-4woAQAAMAAJ
[36] T. Li, F. Zhang, F. Zhang, Y. Yao, and L. Jiang, “Wideband and high-gain
uniform circular array with calibration element for smart antenna application,”
IEEE Antennas and Wireless Propagation Letters, vol. 15, pp. 230–233, 2016.
[37] L. C. Kretly, A. Cerqueira S, and A. Tavora AS, “A hexagonal adaptive an-
tenna array concept for wireless communication applications,” in The 13th
IEEE International Symposium on Personal, Indoor and Mobile Radio Com-
munications, vol. 1, Sep. 2002, pp. 247–249 vol.1.
[38] L. ahin, E. zyurt, K. Y. Kapusuz, . Demir, Y. en, and U. Ouz, “Non-uniformly
spaced planar phased array antenna for Ku-band mobile direct broadcasting
satellite reception systems,” in 2017 8th International Conference on Recent
Advances in Space Technologies (RAST), June 2017, pp. 203–207.
[39] A. R. Taylor, “The Square Kilometre Array,” in A Giant Step: from Milli- to
Micro-arcsecond Astrometry, ser. IAU Symposium, W. J. Jin, I. Platais, and
M. A. C. Perryman, Eds., vol. 248, Jul 2008, pp. 164–169.
[40] W. B. Abbas and M. Zorzi, “Towards an appropriate receiver beamforming
scheme for millimeter wave communication: A power consumption based com-
parison,” in European Wireless 2016; 22th European Wireless Conference, May
2016, pp. 1–6.
[41] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for
large-scale antenna arrays,” IEEE Journal of Selected Topics in Signal Pro-
cessing, vol. 10, no. 3, pp. 501–513, April 2016.
[42] J. Lota, S. Sun, T. S. Rappaport, and A. Demosthenous, “5G uniform linear
arrays with beamforming and spatial multiplexing at 28 GHz, 37 GHz, 64
GHz and 71 GHz for outdoor urban communication: A two-level approach,”
IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2017.
[43] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An
overview of signal processing techniques for millimeter wave MIMO systems,”
IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–
453, April 2016.
[44] C. Balanis, Antenna Theory: Analysis and Design. Wiley, 2012. [Online].
Available: https://books.google.com/books?id=v1PSZ48DnuEC
[45] H. Iwakura, “Realization of tapped delay lines using switched-capacitor LDI
ladders and application to FIR filter design,” IEEE Transactions on Circuits
and Systems II: Analog and Digital Signal Processing, vol. 40, no. 12, pp.
794–797, Dec 1993.
[46] N. Lin, W. Liu, and R. Langley, “Performance analysis of a broadband beam-
forming structure without tapped delay-lines,” in 2007 15th International
Conference on Digital Signal Processing, July 2007, pp. 583–586.
[47] J. E. Zekry, G. N. Daoud, H. A. Ghali, and H. F. Ragai, “Design and
simulation of digitally tunable high-Q on-chip inductor,” in 2007 International
Conference on Microelectronics, Dec 2007, pp. 239–242.
[48] O. L. Frost, “An algorithm for linearly constrained adaptive array processing,”
Proceedings of the IEEE, vol. 60, no. 8, pp. 926–935, Aug 1972.
[49] K. Buckley and L. Griffiths, “An adaptive generalized sidelobe canceller with
derivative constraints,” IEEE Transactions on Antennas and Propagation,
vol. 34, no. 3, pp. 311–319, March 1986.
[50] M. E. Lockwood, D. L. Jones, C. R. Lansing, W. D. O’Brien, B. C. Wheeler,
and A. S. Feng, “Effect of multiple nonstationary sources on MVDR
beamformers,” in The Thirty-Seventh Asilomar Conference on Signals, Systems
and Computers, 2003, vol. 1, Nov 2003, pp. 730–734 Vol.1.
[51] H. Duan, B. P. Ng, and C. M. See, “A new broadband beamformer
using IIR filters,” IEEE Signal Processing Letters, vol. 12, no. 11, pp.
776–779, Nov 2005.
[52] L. Bruton and N. Bartley, “Highly selective three-dimensional recursive beam
filters using intersecting resonant planes,” IEEE Transactions on Circuits and
Systems, vol. 30, no. 3, pp. 190–193, March 1983.
[53] A. Madanayake, S. V. Hum, and L. T. Bruton, “UWB beamforming using
digital 2D IIR frequency-planar filters,” in 2008 IEEE Antennas and Propagation
Society International Symposium, July 2008, pp. 1–4.
[54] R. A. Horn, Topics in Matrix Analysis. New York, NY, USA: Cambridge
University Press, 1986.
[55] C. Wijenayake, A. Madanayake, and L. T. Bruton, “Systolic-array architecture
for 2D IIR wideband dual-beam space-time plane-wave filters,” in 2010 53rd
IEEE International Midwest Symposium on Circuits and Systems, Aug 2010,
pp. 229–232.
[56] ——, “Systolic-array architecture for 2D IIR wideband dual-beam space-time
plane-wave filters,” in 2010 53rd IEEE International Midwest Symposium on
Circuits and Systems, Aug 2010, pp. 229–232.
[57] H. Moody, “The systematic design of the Butler matrix,” IEEE Transactions
on Antennas and Propagation, vol. 12, no. 6, pp. 786–788, Nov 1964.
[58] F. Huang, W. Chen, and M. Rao, “Switched-beam antenna array based on
Butler matrix for 5G wireless communication,” in 2016 IEEE International
Workshop on Electromagnetics: Applications and Student Innovation Compe-
tition (iWEM), May 2016, pp. 1–3.
[59] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to
Algorithms, 2nd ed., 2001.
[60] W. Rotman and R. Turner, “Wide-angle microwave lens for line source appli-
cations,” IEEE Transactions on Antennas and Propagation, vol. 11, no. 6, pp.
623–632, Nov. 1963.
[61] N. J. G. Fonseca, “Printed S-band 4 × 4 Nolen matrix for multiple beam
antenna applications,” IEEE Transactions on Antennas and Propagation,
vol. 57, no. 6, pp. 1673–1678, Jun. 2009.
[62] S. Sun, T. S. Rappaport, M. Shafi, and H. Tataria, “Analytical framework of
hybrid beamforming in multi-cell millimeter-wave systems,” IEEE Transac-
tions on Wireless Communications, pp. 1–1, 2018.
[63] S. Sun, T. S. Rappaport, M. Shafi, P. Tang, J. Zhang, and P. J. Smith,
“Propagation models and performance evaluation for 5G millimeter-wave bands,”
IEEE Transactions on Vehicular Technology, vol. 67, no. 9, pp. 8422–8439,
Sep. 2018.
[64] C. Fulton, M. Yeary, D. Thompson, J. Lake, and A. Mitchell, “Digital phased
arrays: Challenges and opportunities,” Proceedings of the IEEE, vol. 104,
no. 3, pp. 487–503, March 2016.
[65] T. S. Rappaport, Y. Xing, O. Kanhere, S. Ju, A. Madanayake, S. Mandal,
A. Alkhateeb, and G. C. Trichopoulos, “Wireless communications and appli-
cations above 100 GHz: Opportunities and challenges for 6G and beyond,”
IEEE Access (accepted), 2019.
[66] D. R. Bull and D. H. Horrocks, “Primitive operator digital filters,” IEE Pro-
ceedings G - Circuits, Devices and Systems, vol. 138, no. 3, pp. 401–412, June
1991.
[67] T. S. Rappaport, Y. Xing, G. R. MacCartney, A. F. Molisch, E. Mellios, and
J. Zhang, “Overview of millimeter wave communications for fifth-generation
(5G) wireless networks with a focus on propagation models,” IEEE Transactions
on Antennas and Propagation, vol. 65, no. 12, pp. 6213–6230, Dec 2017.
[68] D. Choudhury, “5G wireless and millimeter wave technology evolution: An
overview,” in 2015 IEEE MTT-S International Microwave Symposium, May
2015, pp. 1–4.
[69] S. Sun, T. S. Rappaport, and M. Shafi, “Hybrid beamforming for 5G
millimeter-wave multi-cell networks,” IEEE Conference on Computer
Communications Workshops (INFOCOM WKSHPS), pp. 589–596, April 2018.
[70] K. Kibaroglu, M. Sayginer, and G. M. Rebeiz, “An ultra low-cost 32-element
28 GHz phased-array transceiver with 41 dBm EIRP and 1.0–1.6 Gbps 16-QAM
link at 300 meters,” in 2017 IEEE Radio Frequency Integrated Circuits
Symposium (RFIC), June 2017, pp. 73–76.
[71] M. K. Khattak, C. Lee, D. Han, and S. Kahng, “Flat Rotman lens for 5G
beamforming antenna,” in 2016 IEEE 5th Asia-Pacific Conference on Anten-
nas and Propagation (APCAP), July 2016, pp. 205–206.
[72] K. Kibaroglu, M. Sayginer, and G. M. Rebeiz, “A 28 GHz transceiver chip
for 5G beamforming data links in SiGe BiCMOS,” in 2017 IEEE Bipo-
lar/BiCMOS Circuits and Technology Meeting (BCTM), Oct 2017, pp. 74–77.
[73] R. Chen, H. Xu, C. Li, L. Zhu, and J. Li, “Hybrid beamforming for broad-
band millimeter wave massive MIMO systems,” in 2018 IEEE 87th Vehicular
Technology Conference (VTC Spring), June 2018, pp. 1–5.
[74] M. Sayginer and G. M. Rebeiz, “An eight-element 2-16-GHz programmable
phased array receiver with one, two, or four simultaneous beams in SiGe BiC-
MOS,” IEEE Transactions on Microwave Theory and Techniques, vol. 64,
no. 12, pp. 4585–4597, Dec 2016.
[75] J. Lota, S. Sun, T. S. Rappaport, and A. Demosthenous, “5G uniform linear
arrays with beamforming and spatial multiplexing at 28, 37, 64, and 71 GHz
for outdoor urban communication: A two-level approach,” IEEE Transactions
on Vehicular Technology, vol. 66, no. 11, pp. 9972–9985, Nov 2017.
[76] J. Jeong, N. Collins, and M. P. Flynn, “A 260 MHz IF sampling bit-stream
processing digital beamformer with an integrated array of continuous-time
band-pass ΔΣ modulators,” IEEE Journal of Solid-State Circuits, vol. 51,
no. 5, pp. 1168–1176, May 2016.
[77] S. Jang, J. Jeong, R. Lu, and M. P. Flynn, “A 16-element 4-beam 1 GHz
IF 100 MHz bandwidth interleaved bit stream digital beamformer in 40 nm
CMOS,” IEEE Journal of Solid-State Circuits, vol. 53, no. 5, pp. 1302–1312,
May 2018.
[78] T. Okuyama, S. Suyama, J. Mashino, S. Yoshioka, Y. Okumura, K. Yamazaki,
D. Nose, and Y. Maruta, “Experimental evaluation of digital beamforming
for 5G multi-site massive MIMO,” in 2017 20th International Symposium on
Wireless Personal Multimedia Communications (WPMC), Dec 2017, pp. 476–
480.
[79] P. Xingdong, H. Wei, Y. Tianyang, and L. Linsheng, “Design and implemen-
tation of an active multibeam antenna system with 64 RF channels and 256
antenna elements for massive MIMO application in 5G wireless communica-
tions,” China Communications, vol. 11, no. 11, pp. 16–23, Nov 2014.
[80] R. Miura, T. Tanaka, I. Chiba, A. Horie, and Y. Karasawa, “Beamforming
experiment with a DBF multibeam antenna in a mobile satellite environment,”
IEEE Transactions on Antennas and Propagation, vol. 45, no. 4, pp. 707–714,
Apr 1997.
[81] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Cambridge
University Press, 2010.
[82] Y. A. Atesal, B. Cetinoneri, K. M. Ho, and G. M. Rebeiz, “A two-channel
8-20-GHz SiGe BiCMOS receiver with selectable IFs for multibeam phased-
array digital beamforming applications,” IEEE Transactions on Microwave
Theory and Techniques, vol. 59, no. 3, pp. 716–726, March 2011.
[83] W. M. Gentleman and G. Sande, “Fast Fourier transforms: For fun and profit,”
in Proceedings of the November 7-10, 1966, Fall Joint Computer Conference,
ser. AFIPS ’66 (Fall). New York, NY, USA: ACM, 1966, pp. 563–578.
[84] S. C. Chan and P. M. Yiu, “An efficient multiplierless approximation of the fast
Fourier transform using sum-of-powers-of-two (SOPOT) coefficients,” IEEE
Signal Processing Letters, vol. 9, no. 10, pp. 322–325, Oct 2002.
[85] A. G. Dempster and M. D. Macleod, “Constant integer multiplication using
minimum adders,” IEE Proceedings - Circuits, Devices and Systems,
vol. 141, no. 5, pp. 407–413, 1994.
[86] J. R. Cavallaro, M. P. Keleher, R. H. Price, and G. S. Thomas, “VLSI
implementation of a CORDIC SVD processor,” in Proceedings, Eighth
University/Government/Industry Microelectronics Symposium, Jun 1989,
pp. 256–260.
[87] Y. Luo, Y. Wang, H. Sun, Y. Zha, Z. Wang, and H. Pan, “CORDIC-based
architecture for computing nth root and its implementation,” IEEE Transactions
on Circuits and Systems I: Regular Papers, pp. 1–13, 2018.
[88] Z. Chaozhu and Y. Huizhi, “Design and implementation of discrete Fourier
transformation modulated filter banks based on poly-phase filter structure
with near perfect reconstruction,” in 2015 12th IEEE International Conference
on Electronic Measurement Instruments (ICEMI), vol. 02, July 2015, pp. 872–
877.
[89] B. Peng, W. Liu, and D. P. Mandic, “Design of oversampled generalised
discrete Fourier transform filter banks for application to subband-based blind
source separation,” IET Signal Processing, vol. 7, no. 9, pp. 843–853, Dec
2013.
[90] U. Vundela, T. S. Kumar, and R. Suresh, “Computation of fractional Fourier
transform using filter bank approach and its application,” in International
Conference on Computer Communication and Informatics, Jan 2013, pp. 1–5.
[91] C. Z. Wu and K. L. Teo, “Design of discrete Fourier transform modulated filter
bank with sharp transition band,” IET Signal Processing, vol. 5, no. 4, pp.
433–440, July 2011.
[92] V. S. Dimitrov, T. V. Cooklev, and B. Donevsky, “Number theoretic
transforms over the golden section quadratic field,” IEEE Transactions on Signal
Processing, vol. 43, no. 8, pp. 1790–1797, Aug. 1995.
[93] J. H. Ye and M. D. Shieh, “High-performance NTT architecture for large integer
multiplication,” in 2018 International Symposium on VLSI Design, Automa-
tion and Test (VLSI-DAT), April 2018, pp. 1–4.
[94] X. Feng and S. Li, “Design of an area-efficient million-bit integer multiplier
using double modulus NTT,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 25, no. 9, pp. 2658–2662, Sept 2017.
[95] B. Zhao, B. Liu, C. Wu, W. Yu, J. Su, I. You, and F. Palmieri, “A novel
NTT-based authentication scheme for 10-GHz quantum key distribution systems,”
IEEE Transactions on Industrial Electronics, vol. 63, no. 8, pp. 5101–5108,
Aug 2016.
[96] D. F. G. Coelho, R. J. Cintra, N. Rajapaksha, G. J. Mendis, A. Madanayake,
and V. S. Dimitrov, “DFT computation using Gauss-Eisenstein basis: FFT
algorithms and VLSI architectures,” IEEE Transactions on Computers, vol. 66,
no. 8, pp. 1442–1448, Aug 2017.
[97] A. Madanayake, R. J. Cintra, D. Onen, V. S. Dimitrov, N. T. Rajapaksha,
L. T. Bruton, and A. Edirisuriya, “A row parallel 8 × 8 2D DCT architec-
ture using algebraic integer based exact arithmetic,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 915–929, Jun
2012.
[98] Y. Shen and H. Oh, “Pipelined implementation of AI-based Loeffler DCT,”
IEICE Electronics Express, vol. 10, no. 12, pp. 1–7, May 2013.
[99] Q. Yue, C. Ma, and X. Wang, “Canonical signed digit encoding based
optimal design for FIR filters,” in Proceedings of 2011 International Conference on
Electronic Mechanical Engineering and Information Technology, vol. 2, Aug
2011, pp. 729–732.
[100] B. Elkarami and M. Ahmadi, “An efficient design of 2-D FIR digital filters by
using singular value decomposition and genetic algorithm with canonical signed
digit (CSD) coefficients,” in 2011 IEEE 54th International Midwest Symposium
on Circuits and Systems (MWSCAS), Aug 2011, pp. 1–4.
[101] D. Misra, S. Dhabal, R. Chakrabarti, and P. Venkateswaran, “Canonical
signed digit representation of quadrature mirror filter using genetic algorithm,”
in 2012 International Conference on Communications, Devices and Intelligent
Systems (CODIS), Dec 2012, pp. 65–68.
[102] L. Liang, M. Ahmadi, M. Sid-Ahmed, and K. Wallus, “Design of canonical
signed digit IIR filters using genetic algorithm,” in The Thirty-Seventh Asilomar
Conference on Signals, Systems and Computers, 2003, vol. 2, Nov 2003, pp.
2043–2047.
[103] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed.
Prentice Hall, 2009.
[104] D. Suarez, “Approximate transforms for discrete Fourier applications,” Master’s
thesis, Universidade Federal de Pernambuco, 2015.
[105] D. Suarez, R. J. Cintra, F. M. Bayer, A. Sengupta, S. Kulasekera, and
A. Madanayake, “Multi-beam RF aperture using multiplierless FFT approxi-
mation,” Electronics Letters, vol. 50, no. 24, pp. 1788–1790, 2014.
[106] V. Britanak, P. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms.
Academic Press, 2007.
[107] R. J. Cintra, F. M. Bayer, and C. J. Tablada, “Low-complexity 8-point DCT
approximations based on integer functions,” Signal Processing, vol. 99, pp.
201–214, Jun 2014.
[108] C. J. Tablada, F. M. Bayer, and R. J. Cintra, “A class of DCT approximations
based on the Feig-Winograd algorithm,” Signal Processing, 2015.
[109] R. J. Cintra and F. M. Bayer, “A DCT approximation for image compression,”
IEEE Signal Processing Letters, vol. 18, no. 10, pp. 579–582, Oct. 2011.
[110] D. S. Watkins, Fundamentals of Matrix Computations, ser. Pure and Applied
Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, 2004.
[111] B. N. Flury and W. Gautschi, “An algorithm for simultaneous orthogonal
transformation of several positive definite symmetric matrices to nearly di-
agonal form,” SIAM Journal on Scientific and Statistical Computing, vol. 7,
no. 1, pp. 169–184, Jan. 1986.
[112] D. Suarez, R. J. Cintra, F. M. Bayer, A. Sengupta, S. Kulasekera, and
A. Madanayake, “Multi-beam RF aperture using multiplierless FFT
approximation,” Electronics Letters, vol. 50, no. 24, pp. 1788–1790, 2014.
[113] G. A. F. Seber, A Matrix Handbook for Statisticians, ser. Wiley Series in
Probability and Statistics. Hoboken, NJ: John Wiley & Sons, Inc., 2008.
[114] T. L. T. da Silveira, F. M. Bayer, R. J. Cintra, S. Kulasekera, A. Madanayake,
and A. J. Kozakevicius, “An orthogonal 16-point approximate DCT for im-
age and video compression,” Multidimensional Systems and Signal Processing,
vol. 27, no. 1, pp. 87–104, Jan 2016.
[115] T. L. T. da Silveira, R. S. Oliveira, F. M. Bayer, R. J. Cintra, and
A. Madanayake, “Multiplierless 16-point DCT approximation for low-
complexity image and video coding,” Signal, Image and Video Processing,
vol. 11, no. 2, pp. 227–233, Feb 2017.
[116] S. K. Pulipati, “Electronically-scanned wideband digital aperture antenna ar-
rays using multi-dimensional space-time circuit-network resonance.” Master’s
thesis, University of Akron, 2017.
[117] CASPER, “Reconfigurable open architecture computing hardware (ROACH-2)
FPGA platform.” Oct. 2013 (accessed February, 2018), Available:
https://casper.berkeley.edu/wiki/ROACH-2_Revision_2.
[118] “Calibrating ADC16x250-8 ADCs,” University of California, Berkeley,
Department of Astronomy, 2014 (accessed March, 2016), Available:
http://w.astro.berkeley.edu/~davidm/gems/.
[119] “Standa 8smcx-usb controllers documentation,” Accessed on May 2017.
[Online]. Available: https://en.xisupport.com/projects/enxisupport/wiki/
8SMC4-USB
[120] “Xilinx xtreme DSP kit 4,” Xilinx Inc., Accessed on May 2017.
[Online]. Available: https://www.xilinx.com/products/boards-and-kits/
do-di-dsp-dk4-uni-g.html
[121] W. Liu and S. Weiss, Wideband Beamforming: Concepts and Techniques, ser.
Wireless Communications and Mobile Computing. Wiley, 2010.
[122] B. Wang, F. Gao, S. Jin, H. Lin, G. Y. Li, S. Sun, and T. S. Rappaport,
“Spatial-wideband effect in massive MIMO with application in mmWave
systems,” IEEE Communications Magazine, pp. 1–8, 2018.
[123] M. Ehrgott, Multicriteria Optimization, 2nd ed. Springer, 2005.
[124] V. A. Coutinho, V. Ariyarathna, D. F. G. Coelho, R. J. Cintra, and
A. Madanayake, “An 8-beam 2.4 GHz digital array receiver based on a fast
multiplierless spatial DFT approximation,” in Proceedings of International
Microwave Symposium, 2018.
[125] V. Ariyarathna, D. F. G. Coelho, S. Pulipati, F. M. Bayer, V. S. Dimitrov,
R. J. Cintra, and A. Madanayake, “Multibeam digital array receiver using a
16-point multiplierless DFT approximation,” IEEE Transactions on Antennas
& Propagation (accepted), 2019.
[126] D. Burton, Elementary Number Theory, 7th ed. McGraw-Hill Education,
2010.
[127] P. Duhamel and H. Hollmann, “Split radix FFT algorithm,” Electronics Let-
ters, vol. 20, no. 1, pp. 14–16, 1984.
[128] S. Winograd, Arithmetic Complexity of Computations. CBMS-NSF Regional
Conference Series in Applied Mathematics, 1980.
[129] “45nm free process design kit,” 2009, Available:
https://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
[130] S. Madishetty, “Design of multi-beam hybrid digital beamforming receivers,”
Master’s thesis, University of Akron, 2018. [Online]. Available:
https://etd.ohiolink.edu/pg_10?::NO:10:P10_ETD_SUBID:175267
[131] R. A. Sainati, CAD of microstrip antennas for wireless applications. Artech
House, Inc., 1996.
[132] CASPER, “ADC16x250-8 coax rev daughter ADC cards,” Available:
https://casper.berkeley.edu/wiki/ADC16x250-8_coax_rev_2.
[133] D. M. Pozar, Microwave engineering. John Wiley & Sons, 2009.
[134] J. Kota, A. Madanayake, L. Belostotski, C. Wijenayake, and L. T. Bruton,
“A 2-D signal processing model to predict the effect of mutual coupling on
array factor,” IEEE Antennas and Wireless Propagation Letters, vol. 12, pp.
1264–1267, 2013.
[135] IEEE.tv, “Ted tours the 2018 Brooklyn 5G summit expo,” Apr. 2018 (accessed
September 2018), Available: https://ieeetv.ieee.org/ieeetv-specials/ted-tours-
the-brooklyn-5g-summit-expo-floor-3.
[136] “FCC establishes procedures for first 5G spectrum auctions,” Federal Com-
munications Commission: Public Notice (FCC 18-109), Aug. 2018, Available:
https://docs.fcc.gov/public/attachments/DOC-353228A1.pdf.
[137] S. Pulipati, V. Ariyarathna, U. D. Silva, N. Akram, E. Alwan, and
A. Madanayake, “Design of 28 GHz 64-QAM digital receiver,” in The In-
ternational Workshop on Antenna Technology (iWAT). IEEE, 2019, pp.
1252–1255.
[138] S. Pulipati, V. Ariyarathna, U. D. Silva, N. Akram, E. Alwan, A. Madanayake,
S. Mandal, and T. S. Rappaport, “A direct-conversion digital beamforming
array receiver with 800 MHz channel bandwidth at 28 GHz using Xilinx
RF SoC,” in International Conference on Microwaves, Communications,
Antennas and Electronic Systems. IEEE, 2019.
[139] Xilinx®, “ZCU1275 characterization board user guide (v1.0),” Nov.
2018. [Online]. Available: https://www.xilinx.com/support/documentation/
boards_and_kits/zcu1275/ug1285-zcu1275-char-bd.pdf
[140] ——, “Zynq UltraScale+ RFSoC data sheet: DC and AC switching
characteristics,” Available: https://tinyurl.com/yc728o6x.
[141] Mini-Circuits, “TCM2-33WX+ RF transformer,” (accessed on Sept. 2019).
[Online]. Available: https://www.minicircuits.com/pdfs/TCM2-33WX+.pdf
[142] H. Krishnaswamy and L. Zhang, “Analog and RF interference mitigation for
integrated MIMO receiver arrays,” Proceedings of the IEEE, vol. 104, no. 3,
pp. 561–575, March 2016.
[143] S. M. McDonnell, V. J. Patel, L. Duncan, B. Dupaix, and W. Khalil,
“Compensation and calibration techniques for current-steering DACs,” IEEE Circuits
and Systems Magazine, vol. 17, no. 2, pp. 4–26, May 2017.
[144] R. Sarpeshkar, “Analog versus digital: extrapolating from electronics to neu-
robiology,” Neural Comp., vol. 10, no. 7, pp. 1601–1638, 1998.
[145] U. R. Corporation, “An analog FFT beamformer for acoustic applications,”
for Office of Naval Research, March 1978.
[146] M. Lehne and S. Raman, “An analog/mixed-signal FFT processor for wide-
band OFDM systems,” in Sarnoff Symposium, 2006 IEEE, March 2006, pp.
1–4.
[147] ——, “A prototype analog/mixed-signal fast Fourier transform processor IC
for OFDM receivers,” in Radio and Wireless Symposium, 2008 IEEE, Jan
2008, pp. 803–806.
[148] E. Afshari, H. S. Bhat, and A. Hajimiri, “Ultrafast analog Fourier transform
using 2-D LC lattice,” IEEE Transactions on Circuits and Systems I, vol. 55,
no. 8, pp. 2332–2343, 2008.
[149] N. Sadeghi, V. C. Gaudet, and C. Schlegel, “Analog DFT processors for
OFDM receivers: Circuit mismatch and system performance analysis,” IEEE
Transactions on Circuits and Systems I, vol. 56, no. 9, pp. 2123–2131, Sept
2009.
[150] A. Farahmand and M. R. Zahabi, “An energy efficient, high speed analog FFT
processor for MB-OFDM UWB receivers,” in Intl. Congress on Technology,
Comm. and Knowledge (ICTCK), Nov 2014, pp. 1–6.
[151] V. Ariyarathna, S. Kulasekera, A. Madanayake, L. Belostotski, K. S. Lee,
R. Cintra, D. Suarez, and F. Bayer, “Multi-beam 4 GHz microwave apertures
using current-mode DFT approximation on 65 nm CMOS,” in IEEE MTT-S
International Microwave Symposium (IMS), 2015.
[152] “ADC performance survey 1997-2017 (ISSCC & VLSI symposium),”
https://web.stanford.edu/~murmann/adcsurvey.html.
[153] K. D. Choo, J. Bell, and M. P. Flynn, “Area-efficient 1GS/s 6b SAR ADC
with charge-injection-cell-based DAC,” in IEEE International Solid-State Cir-
cuits Conference, ISSCC, vol. 59. United States: Institute of Electrical and
Electronics Engineers Inc., 2 2016, pp. 460–461.
[154] S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter-wave cellular wireless
networks: Potentials and challenges,” Proceedings of the IEEE, vol. 102, no. 3,
pp. 366–385, March 2014.
[155] F. Casini, R. V. Gatti, L. Marcaccioli, and R. Sorrentino, “A novel design
method for Blass matrix beam-forming networks,” in European Radar Con-
ference, Oct 2007, pp. 232–235.
[156] D. Nubler, “Design of a 32 element Rotman lens at 220 GHz with 20 GHz
bandwidth,” in 2015 German Microwave Conference, March 2015, pp. 280–
283.
[157] W. R. Li, C. Y. Chu, K. H. Lin, and S. F. Chang, “Switched-beam antenna
based on modified Butler matrix with low sidelobe level,” Electronics Letters,
vol. 40, no. 5, pp. 290–292, March 2004.
[158] B. Horwath and R. Abhari, “A 60 GHz 2x2 planar phased array with SIW
modified Butler matrix feed,” in IEEE International Symposium on Antennas
and Propagation (APSURSI), June 2016, pp. 1147–1148.
[159] H. Oruç and G. M. Phillips, “Explicit factorization of the Vandermonde ma-
trix,” Linear Algebra Appl., vol. 315, pp. 113–123, 2000.
[160] H. Oruç and H. K. Akmaz, “Symmetric functions and the Vandermonde ma-
trix,” J. Comput. Appl. Math., vol. 172, no. 1, pp. 49–64, Nov. 2004.
[161] J. F. Canny, E. Kaltofen, and L. Yagati, “Solving systems of nonlinear polyno-
mial equations faster,” in Proceedings of the ACM-SIGSAM 1989 International
Symposium on Symbolic and Algebraic Computation, ser. ISSAC ’89. New
York, NY, USA: ACM, 1989, pp. 121–128.
[162] I. Gohberg and V. Olshevsky, “Complexity of multiplication with vectors for
structured matrices,” Linear Algebra Appl., vol. 202, pp. 163–192, 1994.
[163] S. M. Perera, V. Ariyarathna, N. Udayanga, A. Madanayake, G. Wu, L. Be-
lostotski, Y. Wang, S. Mandal, R. J. Cintra, and T. S. Rappaport, “Wideband
N-beam arrays using low-complexity algorithms and mixed-signal integrated
circuits,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 2,
pp. 368–382, May 2018.
[164] V. Y. Pan, “How bad are Vandermonde matrices?” SIAM Journal on Matrix
Analysis and Applications, vol. 37, no. 2, pp. 676–694, 2016.
[165] E. E. Tyrtyshnikov, “How bad are Hankel matrices?” Numerische Mathe-
matik, vol. 67, no. 2, pp. 261–269, Mar 1994.
[166] W. Gautschi and G. Inglese, “Lower bounds for the condition number of Van-
dermonde matrices,” Numerische Mathematik, vol. 52, no. 3, pp. 241–250,
May 1987.
[167] C. Wijenayake, A. Madanayake, Y. Xu, L. Belostotski, and L. T. Bruton,
“A steerable DC-1 GHz all-pass filter-sum RF space-time 2-D beam filter in
65 nm CMOS,” in IEEE International Symposium on Circuits and Systems
(ISCAS2013), May 2013, pp. 1276–1279.
[168] P. Ahmadi, B. Maundy, A. S. Elwakil, L. Belostotski, and A. Madanayake,
“A new second-order all-pass filter in 130-nm CMOS,” IEEE Transactions on
Circuits and Systems II: Express Briefs, vol. 63, no. 3, pp. 249–253, March
2016.
[169] Q. Meng and R. Harjani, “An easily extendable FFT based four-channel, four-
beam receiver with progressive partial spatial filtering in 65nm,” in ESSCIRC
Conference 2016: 42nd European Solid-State Circuits Conference, Sept 2016,
pp. 359–362.
[170] “Multiplication algorithm,” Accessed on Sept. 2019. [Online]. Available:
https://en.wikipedia.org/wiki/Multiplication_algorithm
[171] P. Ahmadi, M. H. Taghavi, L. Belostotski, and A. Madanayake, “A 0.13-µm
CMOS current-mode all-pass filter for multi-GHz operation,” IEEE Transac-
tions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 12, pp.
2813–2818, Dec 2015.
[172] S. W. Ellingson and W. Cazemier, “Efficient multibeam synthesis with inter-
ference nulling for large arrays,” IEEE Transactions on Antennas and Propa-
gation, vol. 51, no. 3, pp. 503–511, 2003.
[173] C. Fulton, M. Yeary, D. Thompson, J. Lake, and A. Mitchell, “Digital phased
arrays: Challenges and opportunities,” Proceedings of the IEEE, vol. 104,
no. 3, pp. 487–503, March 2016.
[174] S. E. Bankov, I. Y. Bredihin, A. G. Davydov, and M. D. Safin, “Digital beam-
forming super wideband multi-beam hybrid antenna system,” in Microwave
Telecommunication Technology (CriMiCo), 2014 24th International Crimean
Conference, Sept 2014, pp. 449–450.
[175] S. Han, C.-L. I, Z. Xu, and C. Rowell, “Large-scale antenna systems with
hybrid analog and digital beamforming for millimeter wave 5G,” IEEE Com-
munications Magazine, vol. 53, no. 1, pp. 186–194, January 2015.
[176] A. Fahim, “mmWave LO distribution insights,” in 2017 IEEE Custom Inte-
grated Circuits Conference (CICC), April 2017, pp. 1–72.
[177] C. Marcu, “LO generation and distribution for 60 GHz phased array
transceivers,” Ph.D. dissertation, University of California, Berkeley, 2011.
[178] M. Yeary, “Workshop A: Impact-integrated multi-use phased-array common
tile,” Military Radar Summit, Washington DC, February 2016.
[179] A. Madanayake, N. Udayanga, and V. Ariyarathna, “Wideband delay-sum
digital aperture using Thiran all-pass fractional delay filters,” in 2016 IEEE
Radar Conference (RadarConf), May 2016, pp. 1–5.
[180] T. Laakso, V. Valimaki, M. Karjalainen, and U. Laine, “Splitting the unit
delay [FIR/all pass filters design],” IEEE Signal Processing Magazine, vol. 13,
no. 1, pp. 30–60, Jan 1996.
[181] F. Harris, “On the use of windows for harmonic analysis with the discrete
Fourier transform,” Proceedings of the IEEE, vol. 66, no. 1, pp. 51–83, Jan
1978.
[182] L. Karam and J. McClellan, “Complex Chebyshev approximation for FIR filter
design,” IEEE Transactions on Circuits and Systems II: Analog and Digital
Signal Processing, vol. 42, no. 3, pp. 207–216, Mar 1995.
[183] P. Kootsookos and R. Williamson, “FIR approximation of fractional sample
delay systems,” IEEE Transactions on Circuits and Systems II: Analog and
Digital Signal Processing, vol. 43, no. 3, pp. 269–271, Mar 1996.
[184] J.-P. Thiran, “Recursive digital filters with maximally flat group delay,” IEEE
Transactions on Circuit Theory, vol. 18, no. 6, pp. 659–664, Nov 1971.
[185] W. Zhou, N. H. Noordin, N. Haridas, A. O. El-Rayis, A. T. Erdogan, and
T. Arslan, “A WiFi/4G compact feeding network for an 8-element circular
antenna array,” in 2011 Loughborough Antennas Propagation Conference, Nov
2011, pp. 1–4.
[186] B. Sen, G. Cansiz, and H. Boran, “L band multi-channel transmit/receive
module for circular phased array radar,” in IEEE Radar Conference (Radar-
Con), May 2015, pp. 0001–0004.
[187] S. Wang, Y. Cao, H. Su, and Y. Wang, “Coherent sources estimation for MIMO
radar with uniform circular array,” in IET International Radar Conference
2015, Oct 2015, pp. 1–5.
[188] P. Ioannides and C. A. Balanis, “Uniform circular and rectangular arrays for
adaptive beamforming applications,” IEEE Antennas and Wireless Propaga-
tion Letters, vol. 4, pp. 351–354, 2005.
[189] R. Gray, Toeplitz and Circulant Matrices: A Review, ser. Foundations
and Trends in Technology. Now Publishers, 2006. [Online]. Available:
https://books.google.com/books?id=PrOi92L5dAUC
[190] “2.4 GHz dipole antenna data sheet,” Linx Technologies, accessed Oct. 2019.
[Online]. Available: https://linxtechnologies.com/wp/wp-content/uploads/
ant-2.4-lcw-ccc-data-sheet.pdf
[191] “Mini-circuits.” [Online]. Available: https://www.minicircuits.com/
[192] H. Steyskal and J. S. Herd, “Mutual coupling compensation in small array
antennas,” IEEE Transactions on Antennas and Propagation, vol. 38, no. 12,
pp. 1971–1975, Dec 1990.
[193] E. M. Friel and K. M. Pasala, “Effects of mutual coupling on the performance
of STAP antenna arrays,” IEEE Transactions on Aerospace and Electronic
Systems, vol. 36, no. 2, pp. 518–527, April 2000.
[194] K. F. Warnick and M. A. Jensen, “Optimal noise matching for mutually cou-
pled arrays,” IEEE Transactions on Antennas and Propagation, vol. 55, no. 6,
pp. 1726–1731, June 2007.
VITA
PABODA VIDUNETH A. BERUWAWELA PATHIRANAGE
April 30, 1988 Born, Matara, Sri Lanka
2013 B.Sc., Electronics and Telecommunication Eng.
University of Moratuwa
Moratuwa, Sri Lanka
2016 M.Sc., Electrical Engineering
University of Akron
Akron, Ohio
PUBLICATIONS AND PRESENTATIONS
[J6] Arjuna Madanayake, Viduneth Ariyarathna et al., “Towards a Low-SWaP
1024-Beam Digital Array: A 32-Beam Sub-System at 5.8 GHz,” IEEE Trans-
actions on Antennas and Propagation (early access 2019).
[J5] Sirani M. Perera, Viduneth Ariyarathna et al., “Wideband N-Beam Arrays us-
ing Low-Complexity Algorithms and Mixed-Signal Integrated Circuits,” IEEE
Journal of Selected Topics in Signal Processing, vol. 12, no. 2, pp. 368-382, 2018.
[J4] Viduneth Ariyarathna et al., “Analog Approximate-FFT 8/16-Beam
Algorithms, Architectures and CMOS Circuits for 5G Beamforming MIMO
Transceivers,” IEEE Journal on Emerging and Selected Topics in Circuits
and Systems, vol. 8, no. 3, pp. 466-479, 2018.
[J3] Viduneth Ariyarathna et al., “Multibeam Digital Array Receiver Using a
16-Point Multiplierless DFT Approximation,” IEEE Transactions on Antennas
and Propagation, vol. 67, no. 2, pp. 925-933, 2018.
[J2] Sravan Pulipati, Viduneth Ariyarathna et al., “A 16-Element 2.4-GHz
Multi-Beam Array Receiver using 2-D Spatially-Bandpass Digital Filters,” IEEE
Transactions on Aerospace and Electronic Systems, 2019.
[J1] V. Ariyarathna et al., “Mixed microwave-digital and multi-rate approach for
wideband beamforming applications using 2-D IIR beam filters and nested
uniform linear arrays,” Multidimensional Systems and Signal Processing, vol.
29, no. 2, pp. 703-718, 2016.
[C13] Sravan Pulipati, Viduneth Ariyarathna et al., “Real-Time FPGA-Based
Multi-Beam Directional Sensing of 2.4 GHz ISM RF Sources,” in IEEE Moratuwa
Engineering Research Conference (MERCon), 2019 (accepted).
[C12] B. Gu, J. Liang, Y. Wang, D. Ariando, V. Ariyarathna, A. Madanayake, and
S. Mandal, “32-Element Array Receiver for 2-D Spatio-Temporal ∆−Σ Noise-
Shaping,” in IEEE National Aerospace & Electronics Conference (NAECON),
2019 (accepted).
[C11] Sravan Pulipati, Viduneth Ariyarathna et al., “Design of 28 GHz 64-QAM
Digital Receiver,” in IEEE International Workshop on Antenna Technology
(iWAT), 2019.
[C10] Viduneth Ariyarathna et al., “Real-Time 2-D FIR Trapezoidal Digital Filters
for 2.4 GHz Aperture Receiver Applications,” in IEEE Moratuwa Engineering
Research Conference (MERCon), 2018.
[C9] Haixiang Zhao, Soumyajit Mandal, Viduneth Ariyarathna, Arjuna
Madanayake, and Renato J. Cintra, “An Offset-Canceling Approximate-DFT
Beamforming Architecture for Wireless Transceivers,” in IEEE International
Symposium on Circuits and Systems, 2018.
[C8] Sravan Pulipati, Viduneth Ariyarathna, and Arjuna Madanayake, “A 16-
Element 2.4-GHz Digital Array Receiver using 2-D IIR Spatially-Bandpass
Plane-Wave Filter,” in IEEE MTT-S International Microwave Symposium,
2018.
[C7] Vitor Coutinho, Viduneth Ariyarathna et al., “An 8-Beam 2.4 GHz Digital
Array Receiver Based on a Fast Multiplierless Spatial DFT Approximation,”
in IEEE MTT-S International Microwave Symposium, 2018.
[C6] Arjuna Madanayake, Viduneth Ariyarathna, and Sravan Pulipati, “Design and
Prototype Implementation of an 8-beam 2.4 GHz Array Receiver for Digital
Beamforming,” in IEEE National Aerospace & Electronics Conference (NAE-
CON), 2017.
[C5] Viduneth Ariyarathna et al., “Design methodology of an analog 9-beam
squint-free wideband IF multi-beamformer for mmW applications,” in IEEE
Moratuwa Engineering Research Conference (MERCon), 2017.
[C4] Vassil S. Dimitrov, Viduneth Ariyarathna et al., “A Parallel Method for the
Computation of Matrix Exponential based on Truncated Neumann Series,” in
IEEE Symposium on Computer Arithmetic (ARITH-24), 2017.
[C3] Arjuna Madanayake, Viduneth Ariyarathna et al., “Design of a low-complexity
wideband analog true-time-delay 5-beam array in 65nm CMOS,”
in IEEE Midwest Symposium on Circuits and Systems (MWSCAS), 2017.
[C2] A. Madanayake, N. Udayanga, and V. Ariyarathna, “Wideband delay-sum
digital aperture using Thiran all-pass fractional delay filters,” in IEEE Radar
Conference (RadarConf), May 2016.
[C1] Viduneth Ariyarathna et al., “Analog 65/130 nm CMOS 5 GHz sub-arrays
with ROACH-2 FPGA beamformers for hybrid aperture-array receivers,” in
GOMACTech, 2016.
