True-Time-Delay Arrays for Fast Beam Training in Wideband
  Millimeter-Wave Systems by Boljanovic, Veljko et al.
True-Time-Delay Arrays for Fast Beam Training
in Wideband Millimeter-Wave Systems
Veljko Boljanovic, Student Member, IEEE, Han Yan, Student Member, IEEE,
Chung-Ching Lin, Student Member, IEEE, Soumen Mohapatra, Student Member, IEEE,
Deukhyoun Heo, Senior Member, IEEE, Subhanshu Gupta, Senior Member, IEEE,
and Danijela Cabric, Senior Member, IEEE
Abstract—The best beam steering directions are estimated
through beam training, which is one of the most important
and challenging tasks in millimeter-wave and sub-terahertz
communications. Novel array architectures and signal processing
techniques are required to avoid prohibitive beam training over-
head associated with large antenna arrays and narrow beams.
In this work, we leverage recent developments in true-time-delay
(TTD) arrays with large delay-bandwidth products to accelerate
beam training using frequency-dependent probing beams. We
propose and study two TTD architecture candidates, including
analog and hybrid analog-digital arrays, that can facilitate beam
training with only one wideband pilot. We also propose a suitable
algorithm that requires a single pilot to achieve high-accuracy
estimation of angle of arrival. The proposed array architectures
are compared in terms of beam training requirements and
performance, robustness to practical hardware impairments,
and power consumption. The findings suggest that the analog
and hybrid TTD arrays achieve a sub-degree beam alignment
precision with 66% and 25% lower power consumption than
a fully digital array, respectively. Our results yield important
design trade-offs among the basic system parameters, power
consumption, and accuracy of angle of arrival estimation in fast
TTD beam training.
Index Terms—True-time-delay array, array architecture, beam
training, millimeter-wave communication, wideband systems
I. INTRODUCTION
ABUNDANT spectrum at millimeter-wave (mmW) frequencies is seen as the key resource for providing high
data rates in the fifth generation of cellular systems [1].
However, the use of mmW communication bands comes at
the cost of less favorable propagation conditions [2]. Both
the base station (BS) and user equipment (UE) are required
to use large antenna arrays to achieve high beamforming
(BF) gain and compensate for severe propagation loss. Beam
pointing directions are estimated through beam training, a
procedure that identifies the angle of arrival (AoA) and angle
of departure (AoD) of the dominant propagation path in the
wireless channel. Apart from aligning the beams for data
communication, knowledge of the AoA and AoD is of utmost
importance for other applications in practical mmW systems,
including interference nulling and localization [3].
This work was supported in part by NSF under grants 1718742, 1705026,
and 1944688. This work was also supported in part by the ComSenTer and
CONIX Research Centers, two of six centers in JUMP, a Semiconductor
Research Corporation (SRC) program sponsored by DARPA.
Veljko Boljanovic, Han Yan, and Danijela Cabric are with the Department
of Electrical and Computer Engineering, University of California,
Los Angeles, Los Angeles, CA 90095 USA (e-mail: vboljanovic@ucla.edu,
yhaddint@ucla.edu, danijela@ee.ucla.edu).
Chung-Ching Lin, Soumen Mohapatra, Deukhyoun Heo, and Subhanshu
Gupta are with the School of Electrical Engineering and Computer
Science, Washington State University, Pullman, WA 99164 USA (e-mail:
chung-ching.lin@wsu.edu, soumen.mohapatra@wsu.edu, dheo@wsu.edu,
sgupta@eecs.wsu.edu).
The existing mmW systems utilize analog array architecture
with a single transceiver radio frequency (RF)-chain at both
the BS and UE due to its power efficiency. Such arrays
are referred to as phased arrays since they use adjustable
phase shifters to allow coherent signal steering/combining
in a desired direction. The existing beam training schemes
with phased arrays include various types of extensive beam
sweeping, where beams with different pointing directions are
synthesised to probe the channel sequentially in order to find
the AoD and AoA [4]–[7]. The required number of probing
beams linearly scales with the number of antenna elements in
the array, which directly translates into beam training overhead
and latency. Hence, conventionally used beam sweeping faces
a scalability challenge in higher mmW frequency bands, where
more antenna elements will be used to achieve the required
BF gain.
Previous work that addresses the beam training problem
can be divided into two categories. The first category intends
to reduce the required number of probing beams. Specifi-
cally, the number scales logarithmically with the array size
when advanced signal processing techniques that exploit the
sparsity of mmW channel are used [8]–[10]. Further, various
side-information, e.g., location information and out-of-band
measurements [11], can also be used to reduce the required
number of probing beams. The second category aims to
enhance the simultaneous channel probing capability by using
advanced hardware design [12]–[19]. These approaches are
more robust when the channel sparsity and side information
are not available. Fully digital array architectures, with a
dedicated RF-chain per antenna element, offer the highest
flexibility and capability of channel probing. From the signal
processing perspective, signals from all antenna branches can
be steered/combined to simultaneously probe all angular direc-
tions for fast AoD/AoA estimation [12]–[14]. Fully-connected
or sub-band based hybrid arrays are another way to enhance
simultaneous probing of the channel [15]–[17]. They can probe
multiple directions simultaneously and the flexibility increases
linearly with the number of RF-chains that control phase
shifter based analog front-end [12]. The probing capability of
hybrid arrays can be further enhanced by associating probing
beams with different frequencies using spatio-spectral BF [15].
A leaky wave antenna (LWA) can scan all angular directions
simultaneously by using different frequency resources since
the pointing directions of the beams are frequency-dependent
[18], [19]. However, the existing LWA technique requires
access to THz spectrum for adequate frequency dispersive
beam steering.
arXiv:2007.08713v1 [eess.SP] 17 Jul 2020
TTD arrays are another appealing, yet insufficiently in-
vestigated alternative for fast mmW beam training. Due to
time delaying of the signal in each antenna branch, TTD
arrays have frequency-dependent probing beams, which can be
exploited to enhance the channel probing capability. Further,
the frequency-dependent beams can be fully controlled by
adjusting the delay introduced in TTD circuits [20]. Early
implementations relied on delay lines in all antenna branches
[21], but this approach suffered from low scalability in terms
of required area and power efficiency when the array size be-
comes large. Further, limited delay range at RF is insufficient
to achieve frequency dispersive beam training as proposed in
this work. Recent advances in TTD arrays with baseband
delay elements and large delay range-to-resolution ratios [22],
[23] have improved the scalability and thus enabled the realization
of fast beam training schemes with large arrays.
In this paper, we extend our previous work [24] and present
the design of baseband TTD array architectures for mmW
beam training. To the best of our knowledge, this is the
first work that comprehensively studies the system and circuit
aspects of TTD based mmW beam training with dispersive
channel probing. The key contributions are summarized as
follows:
• We propose two TTD architecture candidates with base-
band delay elements for fast beam training: 1) analog
TTD architecture, where signal delaying is done in analog
baseband domain; 2) hybrid analog-digital TTD architec-
ture, where signal delaying is done both in analog and
digital domains.
• We propose a power measurement based beam training
scheme for TTD arrays that requires only one training
pilot. In particular, we design frequency-dependent prob-
ing beams robust to frequency-selective mmW channels
and a digital signal processing (DSP) algorithm for high-
accuracy angle estimation. We numerically evaluate the
performance of the proposed algorithm in a practical
multipath fading channel.
• We study the required TTD hardware specifications of
both array architectures and we explain how hardware
constraints affect the beam training performance. We also
quantitatively study the beam training performance under
practical hardware impairments of TTD array circuits,
including phase errors, delay compensation errors, and
analog-to-digital converter (ADC) quantization errors. We
include the performance of a fully digital array as the
benchmark.
• We model and estimate the power consumption of the
proposed TTD array architectures in the beam train-
ing framework. We investigate how power consumption
scales with the key system parameters, including the
bandwidth and array size, which provides an insight
into the beam training design in future mmW/sub-THz
systems. Power consumption of the fully digital array is
included as the benchmark.
[Figure 1: block diagram. Antenna branches with fixed analog phases and delays feed an analog combiner, followed by an ADC, CP removal, DFT, and DSP (ML estimate of signal powers, AoA estimation in eq. (13)).]
Fig. 1. Architecture of analog TTD array with uniform delay spacing ∆τ
and phase spacing ∆φ between antennas. The design of combiners and DSP
algorithm is explained in Section III-B.
The rest of the paper is organized as follows. In Sec-
tion II, we introduce the two TTD architectures and benchmark
fully digital array. Section III introduces a wideband system
model and it describes the beam training codebook and DSP
algorithm design. In Section IV, we explain the baseband
implementation of TTD elements and compare the considered
architectures in terms of the beam training performance under
practical hardware impairments. Power consumption of all
three considered architectures is modeled and evaluated in
Section V. Section VI concludes the paper.
II. TTD ARRAY ARCHITECTURES FOR BEAM TRAINING
The realization and performance of TTD beam training
schemes heavily depend on the underlying TTD hardware.
The design of a fast, high-performance beam training scheme
imposes a challenging delay range requirement on TTD cir-
cuits, which raises the question of a beam-training-efficient
TTD array architecture. In this work, the efficiency depends
on the number of pilots used in beam training, angle estimation
accuracy, and array power consumption. To address this ques-
tion, we propose and extensively compare two uniform linear
array architectures with baseband TTD elements, including
analog and hybrid analog-digital arrays. We include a fully
digital array architecture in the comparison as the benchmark
due to its known flexibility and high beam training perfor-
mance. All three considered array architectures are described
in the remainder of this section.
An analog uniform linear TTD array with a single RF-
chain and NR antennas is presented in Fig. 1. The n-th
antenna branch has an analog phase shifter with the phase
tap φA,n = (n − 1)∆φ and an analog baseband TTD element
with the delay tap τA,n = (n − 1)∆τ , where ∆φ and ∆τ
represent the phase and delay spacing between neighbor-
ing branches, respectively. Due to the hardware errors in
practical phase shifters and TTD elements, the phase and
delay taps can be distorted. In all antenna branches, we
model the time-invariant distorted taps as independent
Gaussian random variables $\tilde{\phi}_{A,n} \sim \mathcal{N}(\phi_{A,n}, \sigma_P^2)$ and
$\tilde{\tau}_{A,n} \sim \mathcal{N}(\tau_{A,n}, \sigma_T^2)$, respectively, i.e., the nominal taps
perturbed by zero-mean Gaussian errors. For a specific delay spacing
∆τ , the TTD frequency-dependent antenna weight vector (AWV)
results in a fixed beam training codebook, where different
frequency components of the signal are hard-coded in different
angular directions. The frequency-flat phase shifters increase
the flexibility by enabling codebook rotations and different
frequency-to-angle mapping. The maximum delay in the NR-
th antenna branch is τA,NR = (NR − 1)∆τ , which becomes
an implementation bottleneck for large antenna arrays. The
state-of-the-art TTD delay range is on the order of 15 ns [22],
which can be insufficient for wideband beam training with a
moderate number of antenna elements NR, e.g., NR = 32, as
we previously discussed in [24].
[Figure 2: block diagram. Sub-arrays with fixed analog phases and delays feed analog combiners, each followed by an ADC; fixed digital delays and a digital combiner follow, then CP removal, DFT, and DSP (ML estimate of signal powers, AoA estimation in eq. (13)).]
Fig. 2. Architecture of hybrid analog-digital TTD array with uniform delay spacing ∆τ and phase spacing ∆φ between antennas. The design of combiners
and DSP algorithm is explained in Section III-B.
[Figure 3: block diagram. One ADC per antenna, followed by fixed digital delays, digital phases, and a digital combiner, then CP removal, DFT, and DSP (ML estimate of signal powers, AoA estimation in eq. (13)).]
Fig. 3. Architecture of the benchmark fully digital array that emulates a TTD architecture by introducing digital delays. The design of combiners and DSP
algorithm is explained in Section III-B.
To alleviate the delay range requirement and improve the
scalability of analog TTD arrays, we introduce a hybrid
analog-digital architecture with NH sub-arrays, each controlled
by one distinct RF-chain, as illustrated in Fig. 2. The hybrid
array uses a combination of analog and digital signal delaying,
where first all the sub-arrays of Nr antennas introduce the
same delays τA,n′ = (n′−1)∆τ, n′ = 1, ..., Nr, in the analog
domain. The relative delay difference among antennas is com-
pensated in the digital domain by introducing the fixed digital
taps τD,h = (h − 1)Nr∆τ, h = 1, ..., NH, i.e., digital delays
of fsτD,h samples, where fs is the sampling frequency. As in the analog
TTD array, the distorted phase taps $\tilde{\phi}_{A,n}$, n = 1, ..., NR, and
delay taps $\tilde{\tau}_{A,n}$, n = 1, ..., NR, are modeled as independent
Gaussian random variables.
A fully digital array, used as the benchmark, is illustrated
in Fig. 3. The digital array can emulate a TTD array through
DSP by using the fixed digital taps τD,n = (n − 1)∆τ, n =
1, ..., NR, i.e., digital delays of fsτD,n samples in the corresponding
antenna branches. The ability to control the digital phases
φD,n, n = 1, ..., NR, in DSP, allows the signal frequency
components to be independently steered/combined in any
angular direction, which provides high flexibility in the beam
training design. The digital array does not have the phase
shifters and TTD elements before the ADCs, thus we assume
it is insensitive to hardware errors. However, each antenna
element has a dedicated RF-chain, which significantly affects
the array power efficiency, as discussed later in Section V.
In the next section, we explain how ∆τ and ∆φ are set up in
all three architectures to obtain a beam training codebook robust
to frequency-selective channels. We also introduce a DSP
algorithm that exploits this codebook. Based on the designed
∆τ , Section IV discusses the requirements in TTD hardware
implementation and impact of hardware impairments on the
beam training performance. Accounting for the designed ∆τ
and proposed baseband TTD implementation, we compare
the three architectures in terms of power consumption in
Section V.
III. TTD BEAM TRAINING ALGORITHM DESIGN
In this section, we describe a DSP algorithm which achieves
a high angle estimation accuracy using only one pilot symbol
in a clustered frequency-selective multipath channel. An ex-
ample of such a channel with one strong and one weak cluster,
as seen by the receiver, is provided in Fig. 4(a). Conventional
phased arrays cannot estimate the AoA of the dominant cluster
with one training pilot, and thus they require beam sweeping,
as illustrated in Fig. 4(b). On the other hand, TTD arrays are
capable of estimating the AoA fast, but the corresponding DSP
algorithm must include the design of a suitable TTD beam
training codebook.
[Figure 4: four frequency-angle maps, each with axes AoA [deg] and Frequency [Hz]. Panel titles: (a) True frequency-angular response of the channel (Cluster #1, Cluster #2); (b) One channel observation with phased array; (c) One channel observation with TTD array (R=1) [20]; (d) One channel observation with TTD array (R=2). Annotations: "AoA cannot be estimated with one pilot", "AoA can be estimated with one pilot", "Not robust to fading", "Deep fades".]
Fig. 4. Beam training in clustered frequency-selective multipath channel: (a) An example of frequency-selective channel with two multipath clusters. Frequency-
selectivity comes from intra- and inter-cluster delay spreads. (b) Channel observation of a phased array when only one pilot is used. Beam sweeping is necessary
to cover all angles in the range (−π/2, π/2). (c) Channel observation of a TTD array when only one pilot is used. Frequency components (subcarriers) are
mapped into different angles to simultaneously probe the range (−π/2, π/2). The angle estimation may fail in frequency-selective channels. (d) Enhanced
TTD codebook with frequency diversity order R = 2.
A. System Model
We consider downlink beam training between the BS and
UE, where the cyclic prefix (CP) based orthogonal frequency-
division multiplexing (OFDM) waveform is used as a training
pilot. The carrier frequency, bandwidth, and number of sub-
carriers are denoted as fc, BW, and Mtot, respectively. The
power-normalized training pilot uses M subcarriers from the
predefined set $\mathcal{M}$, all loaded with the same binary phase shift
keying modulated symbol. Both the BS and UE have half-
wavelength spaced uniform linear arrays with NT and NR
antennas, respectively. We assume that the AoD at the BS has
already been estimated so that the BS uses a fixed frequency-
flat beam defined by a precoder vector $\mathbf{v} \in \mathbb{C}^{N_T}$. The UE is
equipped with a TTD array and it performs beam training to
obtain the AoA estimate $\hat{\theta}^{(R)}$. Thus, the received signal Y[m] at the m-th
subcarrier is
$$Y[m] = \mathbf{w}^H[m]\mathbf{H}[k]\mathbf{v} + \mathbf{w}^H[m]\mathbf{n}[m], \quad m \in \mathcal{M}, \tag{1}$$
where $\mathbf{H}[k] \in \mathbb{C}^{N_R \times N_T}$ is the channel matrix of the k-th
out of Kc sub-bands in a frequency-selective channel and
$\mathbf{n}[m] \sim \mathcal{CN}(\mathbf{0}, \sigma_N^2 \mathbf{I}_{N_R})$ is white Gaussian noise. Each sub-band
contains multiple adjacent subcarriers, and the relationship
between the sub-band index k and subcarrier index m is given
as $k = \lceil m K_c / M_{tot} \rceil$, where $\lceil x \rceil$ rounds x up to the nearest
integer. We assume that all subcarriers within the same
frequency sub-band k experience the same channel H[k]. The
TTD combiner $\mathbf{w}[m] \in \mathbb{C}^{N_R}$ of the m-th subcarrier can be
decomposed as a Hadamard product $\mathbf{w}[m] = \mathbf{w}_A[m] \odot \mathbf{w}_D[m]$,
where the analog combiner $\mathbf{w}_A[m] \in \mathbb{C}^{N_R}$ and digital com-
biner $\mathbf{w}_D[m] \in \mathbb{C}^{N_R}$ depend on the underlying TTD array
architecture. In an analog TTD array, $\mathbf{w}_D[m] = \mathbf{1}_{N_R}$, the all-ones vector, since
both the phases φA,n, ∀n, and delays τA,n, ∀n, are introduced
in the analog domain. Similarly, $\mathbf{w}_A[m] = \mathbf{1}_{N_R}$ with a fully
digital array, as it is insensitive to hardware impairments and
the phases φD,n, ∀n, and delays τD,n, ∀n, are applied in the
digital domain. In general, the n-th elements of wA[m] and
wD[m] are
$$[\mathbf{w}_A[m]]_n = \exp\left[-j\left(2\pi(f_m - f_c)\tilde{\tau}_{A,n} + \tilde{\phi}_{A,n}\right)\right] \tag{2}$$
$$[\mathbf{w}_D[m]]_n = \exp\left[-j\left(2\pi(f_m - f_c)\tau_{D,n} + \phi_{D,n}\right)\right] \tag{3}$$
where $f_m = f_c - \mathrm{BW}/2 + (m-1)\mathrm{BW}/(M_{tot} - 1)$.
The expressions (2) and (3) indicate that the beam pointing
direction depends on the subcarrier frequency, phases, and
delays. With a proper configuration of the phase and delay
taps in the analog and/or digital domain, it is possible to set
up a codebook of combiners that covers all angular directions,
as we discuss in the next subsection.
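As a minimal sketch of how (2) maps subcarriers to combiners, the snippet below evaluates the error-free analog combiner with uniform taps (n − 1)∆τ and (n − 1)∆φ. The function names and the numerical values (carrier frequency, number of subcarriers, ∆φ = 0) are illustrative assumptions, not settings taken from the paper. Under these assumptions and with ∆τ = R/BW, subcarriers spaced by (Mtot − 1)/R indices end up with the same combiner, which is what allows several subcarriers to probe one direction.

```python
import cmath
import math


def subcarrier_frequency(m, f_c, bw, m_tot):
    """Frequency f_m of the m-th subcarrier (1-indexed), as defined below eq. (3)."""
    return f_c - bw / 2 + (m - 1) * bw / (m_tot - 1)


def ttd_combiner(m, n_r, f_c, bw, m_tot, delta_tau, delta_phi):
    """Error-free combiner of eq. (2) with uniform taps (n-1)*delta_tau and (n-1)*delta_phi."""
    f_m = subcarrier_frequency(m, f_c, bw, m_tot)
    per_element_phase = 2 * math.pi * (f_m - f_c) * delta_tau + delta_phi
    return [cmath.exp(-1j * (n - 1) * per_element_phase) for n in range(1, n_r + 1)]


def max_deviation(w1, w2):
    """Largest element-wise distance between two combiners."""
    return max(abs(a - b) for a, b in zip(w1, w2))


# Illustrative (assumed) parameters, not taken from the paper.
F_C, BW, M_TOT, N_R, R = 60e9, 2e9, 65, 16, 4
DELTA_TAU = R / BW              # delay spacing with diversity order R
STEP = (M_TOT - 1) // R         # subcarrier spacing that repeats the combiner

w_a = ttd_combiner(3, N_R, F_C, BW, M_TOT, DELTA_TAU, 0.0)
w_b = ttd_combiner(3 + STEP, N_R, F_C, BW, M_TOT, DELTA_TAU, 0.0)
w_c = ttd_combiner(4, N_R, F_C, BW, M_TOT, DELTA_TAU, 0.0)

print("same direction, deviation:", max_deviation(w_a, w_b))
print("neighboring subcarrier, deviation:", max_deviation(w_a, w_c))
```

The first deviation is at the level of floating-point rounding, while the neighboring subcarrier yields a clearly different combiner, i.e., a different pointing direction.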
B. DSP Algorithm for Beam Training
In this subsection, we first present the design of a robust
codebook and then describe a DSP algorithm for TTD arrays
[24] that achieves a high resolution in AoA estimation.
In [20], we have demonstrated that D spatial directions
in the angular range (−π/2, π/2) can be sounded with a
single OFDM symbol by mapping one subcarrier per direc-
tion, as illustrated in Fig. 4(c). We have shown that this
can be achieved with an analog TTD array by setting the
delay spacing to be ∆τ = 1/BW. The resulting codebook
is, however, sensitive to frequency-selective channels since
certain subcarriers can experience deep fades and thus fail to
detect the incoming signal. The codebook can be enhanced by
increasing its frequency diversity order R, i.e., by mapping R
distinct subcarriers in each probed direction [24]. The benefit
of the enhanced codebook is illustrated in Fig. 4(d) for R = 2,
where two subcarriers detect the dominant cluster. To increase
the diversity, we define D distinct sets Md, 1 ≤ d ≤ D, of
R subcarriers, where each set is associated with a different
direction d, 1 ≤ d ≤ D. Mathematically, the R subcarriers
from the set Md have the same combiner fd, i.e., w[m] =
fd, ∀m ∈ Md, where the n-th element of fd is defined as
$$[\mathbf{f}_d]_n = \exp\left[-j2\pi(n-1)(d-1-D/2)/D\right], \quad d \le D. \tag{4}$$
The subcarriers in Md, ∀d, however, should experience
different channels, and thus we choose them uniformly across
the bandwidth with a step size greater than or equal to the
coherence bandwidth (channel sub-band size). This codebook
can be created for an analog TTD array by setting the n-th
phase and delay taps as follows
$$\phi_{A,n} = (n-1)\left[\mathrm{sgn}(\psi)\pi - \psi\right], \tag{5}$$
$$\tau_{A,n} = (n-1)R/\mathrm{BW}, \tag{6}$$
where $\psi = \mathrm{mod}\left(2\pi R(f_c - \mathrm{BW}/2)/\mathrm{BW} + \pi, 2\pi\right) - \pi$, and sgn(·)
and mod(·) are the sign and modulo operators, respectively.
[Figure 5: codebook heat map; color scale: Normalized BF gain [dB].]
Fig. 5. An example of robust TTD codebook for NR = 16, D = 16,
and R = 4. All D = 16 directions are probed simultaneously. Direction
d, 1 ≤ d ≤ D, is associated with set of subcarriers Md and combiner fd.
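The combiners in (4) can be checked numerically against the receive spatial response used later in the dictionary of (12). The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the closed-form pointing angle asin(2(d − 1)/D − 1) is derived here from (4) and the half-wavelength array response rather than quoted from the paper. It reproduces the mapping of the first direction to −π/2.

```python
import cmath
import math


def codebook_vector(d, n_r, num_dirs):
    """Combiner f_d of eq. (4); d is 1-indexed."""
    return [cmath.exp(-2j * math.pi * (n - 1) * (d - 1 - num_dirs / 2) / num_dirs)
            for n in range(1, n_r + 1)]


def array_response(xi, n_r):
    """Receive spatial response a_R(xi) of a half-wavelength uniform linear array."""
    return [cmath.exp(-1j * (n - 1) * math.pi * math.sin(xi)) / math.sqrt(n_r)
            for n in range(1, n_r + 1)]


def bf_gain(f, a):
    """Beamforming gain |f^H a|^2."""
    return abs(sum(fn.conjugate() * an for fn, an in zip(f, a))) ** 2


def pointing_angle(d, num_dirs):
    """Angle at which f_d is matched to a_R (derived from eq. (4), an assumption of this sketch)."""
    return math.asin(2 * (d - 1) / num_dirs - 1)


N_R, D = 16, 16  # same sizes as the example in Fig. 5
for d in (1, 5, 9, 13):
    xi = pointing_angle(d, D)
    gain = bf_gain(codebook_vector(d, N_R, D), array_response(xi, N_R))
    print(f"d = {d:2d}: angle = {math.degrees(xi):7.2f} deg, gain = {gain:.3f}")
```

With D = NR, neighboring combiners are orthogonal at each other's pointing angles, so each of the D directions is probed by a distinct beam.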
The phase taps in (5) ensure that the first set of subcarriers
M1 is mapped into the first probed angle (−π/2). An example
of the resulting codebook with NR = 16, D = 16, and R = 4
is provided in Fig. 5. Note that the same enhanced codebook
can be created for the hybrid TTD or fully digital array without
the need to implement a fractional ADC sampling since ∆τ
is an integer multiple of the Nyquist sampling period, i.e., ∆τ =
R/BW. Analog and digital delay taps of the hybrid array
introduced in Section II can be expressed with respect to the
indices of all antenna elements in the array n = 1, ..., NR,
as $\tau_{A,n} = (n - 1 - \lfloor (n-1)/N_r \rfloor N_r)\Delta\tau$ and $\tau_{D,n} = \lfloor (n-1)/N_r \rfloor N_r \Delta\tau$,
respectively. The operator $\lfloor x \rfloor$ rounds x down to the
nearest integer. Thus, the hybrid TTD array can create
the enhanced codebook by setting the n-th taps of its analog
and digital combiners in the following way
$$\phi_{A,n} = (n-1)\left[\mathrm{sgn}(\psi)\pi - \psi\right], \tag{7}$$
$$\tau_{A,n} = \left(n - 1 - \lfloor (n-1)/N_r \rfloor N_r\right)R/\mathrm{BW}, \tag{8}$$
$$\tau_{D,n} = \lfloor (n-1)/N_r \rfloor N_r R/\mathrm{BW}, \tag{9}$$
where ψ is defined as earlier. The result in (9) suggests
that the h-th sub-array needs to introduce a digital delay of
2(h − 1)NrR time samples, assuming the Nyquist sampling
frequency fs = 2BW. The considered hybrid array in Fig. 2
does not apply the phase changes in the digital domain. The
digital array can create the enhanced codebook by using the
following digital taps
$$\phi_{D,n} = (n-1)\Delta\phi, \quad \Delta\phi \in \mathbb{R}, \tag{10}$$
$$\tau_{D,n} = (n-1)R/\mathrm{BW}. \tag{11}$$
The phase tap in (10) implies that the digital array can leverage
the DSP to introduce any number of phase spacings ∆φ. With
fs = 2BW, the n-th antenna branch will introduce the digital
delay of 2(n − 1)R time samples according to (11).
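The split of the total delay (n − 1)R/BW between the analog and digital domains in (8) and (9) can be sketched as follows. The function names are hypothetical, delays are expressed in units of 1/BW to keep the arithmetic in integers, and the oversampling factor of 2 reflects the assumed sampling frequency fs = 2BW.

```python
def hybrid_delay_split(n, n_sub, r):
    """Split the delay of antenna n (1-indexed) following eqs. (8)-(9).

    Returns (analog, digital) delays in units of 1/BW, where n_sub is the
    number of antennas per sub-array (N_r) and r is the diversity order R.
    """
    sub_array_index = (n - 1) // n_sub            # equals h - 1
    analog = (n - 1 - sub_array_index * n_sub) * r
    digital = sub_array_index * n_sub * r
    return analog, digital


def digital_delay_samples(n, n_sub, r, oversampling=2):
    """Digital delay in samples, f_s * tau_D,n, assuming f_s = oversampling * BW."""
    _, digital = hybrid_delay_split(n, n_sub, r)
    return oversampling * digital


# Example sizes (assumed for illustration): 16 antennas in four 4-element sub-arrays, R = 2.
N_TOTAL, N_SUB, R = 16, 4, 2
for n in (1, 4, 5, 16):
    analog, digital = hybrid_delay_split(n, N_SUB, R)
    print(f"n = {n:2d}: analog = {analog}/BW, digital = {digital}/BW, "
          f"samples = {digital_delay_samples(n, N_SUB, R)}")
```

The analog part never exceeds (Nr − 1)R/BW, which is the reason the hybrid array relaxes the delay range required from the analog TTD elements.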
The phase and delay taps required for the design of a robust
codebook are summarized in Table I for all three arrays.
TABLE I
PHASE AND DELAY TAP SETTINGS FOR ROBUST CODEBOOK DESIGN
Array arch.   w[m]            φA,n   τA,n   φD,n   τD,n
Analog TTD    wA[m]           (5)    (6)    N/A    N/A
Hybrid TTD    wA[m] ⊙ wD[m]   (7)    (8)    N/A    (9)
Digital       wD[m]           N/A    N/A    (10)   (11)
We note that the analog and hybrid TTD architectures have
the same limited flexibility of receive combining in beam
training. Namely, once their corresponding analog combiners
wA[m], m ∈ M, and digital combiners wD[m], m ∈ M
are set up, they cannot be further changed or manipulated in
DSP. In both architectures, this happens because the signals
from different antenna branches are completely or partially
combined before passing through ADCs. Thus, the inability to
rotate the combiners limits the number of sounded directions
to D in both arrays. The diversity order R is also limited,
but not necessarily the same in both arrays, as discussed later
in the paper. On the other hand, the digital array can exploit
digitized signals in all antenna branches and combine them
from many different directions in DSP by changing the phases
φD,n, ∀n. Different phases φD,n introduce angular shifts of
the entire codebook, and enable scanning more angles and/or
higher diversity.
We use the designed beam training codebook to develop
a non-coherent power-based DSP angle estimation algorithm.
Non-coherent algorithms are preferred in mmW beam training
as they avoid complex joint synchronization and beam training
receiver processing.
Since the subcarriers from Md, ∀d, experience different
channels, we can consider the received signal in all D probed
directions as random. In a clustered multipath channel, the vec-
tor of expected powers in D directions $\mathbf{p} = [p_1, p_2, ..., p_D]^T$
can be expressed as
$$\mathbf{p} = \mathbf{B}\mathbf{g} + N_R\sigma_N^2\mathbf{1}, \tag{12}$$
where $\mathbf{B} \in \mathbb{R}^{D \times Q}$ is a known dictionary obtained by general-
izing the UE BF gains in Q angles ξq, q = 1, ..., Q, for all D
combiners. The (d, q)-th element of B is defined as
$[\mathbf{B}]_{d,q} = |\mathbf{f}_d^H \mathbf{a}_R(\xi_q)|^2$, where $\mathbf{a}_R(\xi_q)$ is the receive spatial response with
elements $[\mathbf{a}_R(\xi_q)]_n = N_R^{-1/2}\exp(-j(n-1)\pi\sin(\xi_q))$, n =
1, ..., NR. The vector $\mathbf{g} \in \mathbb{R}^Q$ has only one non-zero element.
For a detailed derivation of (12), please refer to Appendix A.
During beam training, the estimates of pd, ∀d, are obtained
by averaging the powers of all subcarriers from the
corresponding set Md, ∀d [24]. In fact, it can be shown that
the sample mean is the maximum likelihood (ML) estimator of
pd, ∀d. The vector of power estimates is denoted as $\hat{\mathbf{p}}$, which
approximates p in (12). Based on the power measurement
model in (12), AoA estimation can be solved based on the
ML criterion using simple linear algebra operations. The estimate
of the AoA $\theta^{(R)}$ is obtained by finding the index of the column
in B which has the highest correlation with $\hat{\mathbf{p}}$, which is
mathematically expressed as
$$\hat{\theta}^{(R)} = \xi_{q^\star}, \quad \text{where} \quad q^\star = \arg\max_q \frac{\hat{\mathbf{p}}^T[\mathbf{B}]_{:,q}}{\|[\mathbf{B}]_{:,q}\|}. \tag{13}$$
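The estimator in (13) can be illustrated with a small noise-free example. The following is a sketch under stated assumptions rather than the authors' implementation: the helper names, the mid-point angular grid (chosen so that the two endfire angles, whose dictionary columns coincide, are excluded), and the synthetic received samples are all hypothetical.

```python
import cmath
import math


def array_response(xi, n_r):
    """Receive spatial response a_R(xi) of a half-wavelength uniform linear array."""
    return [cmath.exp(-1j * (n - 1) * math.pi * math.sin(xi)) / math.sqrt(n_r)
            for n in range(1, n_r + 1)]


def codebook_vector(d, n_r, num_dirs):
    """Combiner f_d of eq. (4); d is 1-indexed."""
    return [cmath.exp(-2j * math.pi * (n - 1) * (d - 1 - num_dirs / 2) / num_dirs)
            for n in range(1, n_r + 1)]


def build_dictionary(n_r, num_dirs, grid):
    """Dictionary B with entries |f_d^H a_R(xi_q)|^2, stored as num_dirs rows."""
    responses = [array_response(xi, n_r) for xi in grid]
    rows = []
    for d in range(1, num_dirs + 1):
        f_d = codebook_vector(d, n_r, num_dirs)
        rows.append([abs(sum(f.conjugate() * a for f, a in zip(f_d, a_q))) ** 2
                     for a_q in responses])
    return rows


def estimate_powers(samples_per_direction):
    """Sample mean of |Y[m]|^2 over the subcarriers of each set M_d."""
    return [sum(abs(y) ** 2 for y in ys) / len(ys) for ys in samples_per_direction]


def estimate_aoa(p_hat, dictionary, grid):
    """Eq. (13): grid angle whose normalized dictionary column correlates best with p_hat."""
    best_q, best_score = 0, -math.inf
    for q in range(len(grid)):
        column = [row[q] for row in dictionary]
        norm = math.sqrt(sum(c * c for c in column))
        score = sum(p * c for p, c in zip(p_hat, column)) / norm
        if score > best_score:
            best_q, best_score = q, score
    return grid[best_q]


# Assumed toy setup: 16 antennas, 16 directions, 1-degree grid of mid-points.
N_R, D, Q = 16, 16, 180
GRID = [-math.pi / 2 + (q + 0.5) * math.pi / Q for q in range(Q)]
B = build_dictionary(N_R, D, GRID)

TRUE_Q = 120  # index of the (hypothetical) true AoA on the grid
# Synthetic noise-free samples, two subcarriers per direction (R = 2).
samples = [[math.sqrt(B[d][TRUE_Q]), -math.sqrt(B[d][TRUE_Q])] for d in range(D)]
p_hat = estimate_powers(samples)
aoa_hat = estimate_aoa(p_hat, B, GRID)
print("true AoA [deg]:", math.degrees(GRID[TRUE_Q]))
print("estimated AoA [deg]:", math.degrees(aoa_hat))
```

In this noise-free case the estimate coincides with the true grid angle; with noise or fading the power estimates deviate from the dictionary column, and the quality of the match depends on R and on the grid size Q.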
The proposed algorithm can achieve high AoA estimation
accuracy by increasing Q, i.e., the number of the columns
in the dictionary matrix B. Although this increases the DSP
complexity, the proposed beam training scheme can still be
performed with a single OFDM symbol. For the rest of
this paper, we use root mean square error (RMSE) of AoA
estimation and power consumption as main metrics for the
comparison of the proposed TTD architectures. The AoA
RMSE closely describes the beam training performance and
it can be directly converted to an alternative metric in other
applications, including the spectral efficiency in mmW data
communication and position error in localization.
[Figure 6: circuit schematics. (a) Switched-capacitor array (SCA) with baseband inputs BB1 to BBNR, time-interleaved delay elements, summer, and ADC. (b) Clock generation with an external clock (CLK), quadrature phase generator, S-bit phase interpolators (PI), and time interleavers, together with an example timing diagram.]
Fig. 6. (a) Multiply-and-accumulate in discrete-time for TTD BF [22], [25]
(inset: switched-capacitor adder without the opamp) and (b) time-interleaved
clock generation unit (inset: example timing diagram).
IV. ARCHITECTURE PERFORMANCE ANALYSIS
In this section, we introduce and compare the baseband
implementation of analog TTD elements in analog and hybrid
TTD architectures. Then we study the impact of limited TTD
delay range in both architectures on beam training perfor-
mance and we explain the interplay between the number of an-
tenna elements NR, bandwidth BW, and diversity order R. We
also numerically evaluate the impact of hardware impairments
and ADC quantization error on the AoA estimation accuracy.
A. Baseband Implementation of Analog TTD Front-End
While TTD array operation is conceptually simple, its physical
implementation is non-trivial when targeting large delay range.
In general, implementing delays with large range-to-resolution
ratios is difficult without severe penalties in linearity, noise,
power and area besides increased design complexity. In an
array with baseband TTD elements, instead of delaying the
down-converted and phase-shifted signals from the antennas
before sampling and digitization, the signals are sampled at different
time instants through the switched-capacitor array (SCA)
circuit, which results in the same digitized value. Thus, the
complexity of delaying signals is shifted to the clock path where
precise and calibrated delays can be applied in the advanced
semiconductor technology nodes. More importantly, a large
delay range-to-resolution ratio can be realized easily. The
SCA based implementation requires multiple time-interleaved
and delay-compensated phases for formation of the beam
as shown in Fig. 6(a) and discussed in detail in [22]. In
the sampling phase, the input signal from each channel is
first sampled (with delayed clocks) on a sampling capacitor
(CS). After the last sampling phase, the stored charges on
each capacitor corresponding to each channel (and each time-
interleaved phase) are summed to form the beam.
TABLE II
ANALOG TTD ARRAY COMPLEXITY WITH INCREASED DIVERSITY R
R   ∆τ       τA,NR (analog)   NI (analog)   τA,Nr (hybrid)   NI (hybrid)
1   0.5 ns   7.5 ns           31            1.5 ns           7
2   1 ns     15 ns            61            3 ns             13
4   2 ns     30 ns            121           6 ns             25
Assumed parameters are NR = 16, fCLK = 4 GHz, BW = 2 GHz. Hybrid
TTD array has four 4-element sub-arrays (Nr = 4).
The proposed beam-training algorithm requires wider delay
ranges with delay offsets that are integer multiples of ∆τ . This
significantly relaxes the design requirement of the SCA and
the clock path for TTD-based beam-training. Larger delay-
bandwidth products can thus be realized using passive SCA
whose performance will not be limited by the opamp feedback
factor or time-based circuits as demonstrated in our recent
work in [23]. Ongoing research is also investigating the use of
high-linearity and high-speed ring amplifiers [26] in the SCA.
Fig. 6(b) shows the clock generation circuit. The proposed
beam training only requires a time-interleaver applied to the
input clock. The output of the time-interleaver is applied
to the NI interleaved multiply-and-accumulate (MAC) units in the
SCA and enables the SCA to span the required delay
range while meeting the Nyquist BW. The same circuit can
be extended for data communication with the only addition
being a multi-bit phase interpolator (PI) as described in [25].
In Fig. 6(b), the external single-phase clock (CLK) is first
fed to a quadrature phase generator circuit. The quadrature
outputs (I-, I+, Q-, Q+) of each phase generator are further
fed to the S-bit PI. The quadrature output is then applied to a
multiplexer (MUX) which helps in spanning the angular range
(−π/2, π/2). An example timing diagram is also shown in
Fig. 6(b) with NR = 4 and NI = 7 for R = 1 in a hybrid
array. A total of 36 phases are shown at the time-interleaver
output with a 12.5% pulse width.
We further analyze the number of interleaving levels that
are required in analog and hybrid TTD arrays. Considering
NI as the interleaving factor in the analog TTD array (Fig. 1),
the maximum achievable delay compensation TC-max is
TC-max = (NI − 1)Ts = (NI − 1)/fs (14)
where Ts and fs are the reference clock period and sampling
frequency, respectively. To cover the entire angular range in
beam training, TC-max should be equal to τA,NR . Substituting
(6) in this equality and solving for Ts yields
Ts = (NR − 1)R/ ((NI − 1)BW) (15)
Considering a heterodyne receiver architecture and perfect
sampled signal reconstruction satisfying the Nyquist condition
(i.e., Ts ≤ 1/(2BW)), NI can be derived to be
NI ≥ 1 + 2R(NR − 1). (16)
Equation (16) can be further applied for hybrid arrays substi-
tuting NR with NR/NH.
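As a quick numerical check of (15) and (16), the following minimal Python sketch evaluates the smallest interleaving factor and the corresponding reference clock period. The function names are illustrative and not part of the original design flow; the example values (NR = 16, four antennas per hybrid sub-array, R = 1, BW = 2 GHz) are taken from the case study discussed in this section.

```python
def min_interleaving_factor(n_ant: int, diversity_order: int) -> int:
    """Smallest N_I allowed by (16): N_I >= 1 + 2 R (N_R - 1)."""
    return 1 + 2 * diversity_order * (n_ant - 1)


def reference_clock_period(n_ant: int, diversity_order: int,
                           n_interleave: int, bandwidth_hz: float) -> float:
    """T_s from (15): T_s = (N_R - 1) R / ((N_I - 1) BW)."""
    return (n_ant - 1) * diversity_order / ((n_interleave - 1) * bandwidth_hz)


# Analog array: all 16 elements share one delay range.
n_i_analog = min_interleaving_factor(16, 1)
# Hybrid array: (16) is applied with N_R replaced by N_R / N_H = 16 / 4.
n_i_hybrid = min_interleaving_factor(16 // 4, 1)
# Clock period implied by (15) when N_I sits exactly at the bound in (16).
t_s = reference_clock_period(16, 1, n_i_analog, 2e9)

print(n_i_analog, n_i_hybrid, t_s)
```

With these inputs the sketch gives 31 and 7 interleaving levels for the analog and hybrid arrays, and a clock period of 0.25 ns, i.e., the 4 GHz sampling frequency used in the case study.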
[Figure 7 plot: AoA RMSE [degree] versus BW [GHz] for Analog-16, Analog-32, Hybrid-16, Hybrid-32, Digital-16, and Digital-32; curve segments are annotated with the diversity order R, up to Rmax.]
Fig. 7. Beam training performance comparison of the three considered
architectures and the interplay of R, NR, and BW.
Table II shows an example case study of the required
number of interleaving stages in the analog/hybrid TTD array
as a function of diversity order and the delay range. This
table uses (16) with a specific case of 2 GHz bandwidth,
4 GHz sampling frequency, and 16 antenna elements for both
the analog and hybrid array presented in Fig. 1 and Fig. 2
respectively.
B. Impact of Limited TTD Delay Range on Beam Training
In this subsection, we assume that the analog and hybrid
architectures have TTD elements with the same state-of-the-
art maximum delay compensation of TC-max = 15 ns, or
equivalently the same interleaving factor NI.
To realize the proposed beam training algorithm, τA,NR ≤
TC-max needs to be satisfied for the analog, and τA,Nr ≤ TC-max
for the hybrid TTD array. Based on these conditions, it is
straightforward to show that the achievable diversity order R
is limited as
$$1 \le R \le \frac{T_{C\text{-max}}\,BW}{N_R - 1} \quad \text{and} \quad 1 \le R \le \frac{T_{C\text{-max}}\,BW}{N_r - 1}, \tag{17}$$
for the analog and hybrid array, respectively. Note that with
R < 1, the beam training algorithm cannot be realized with
a single OFDM symbol. The expressions in (17) describe the
dependency of R on the basic system parameters NR, Nr,
and BW. In the remainder of this subsection, we numerically
evaluate the interplay among them.
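The bound in (17) is easy to evaluate numerically. The sketch below is an illustrative helper (its name and the small round-off guard are assumptions, not part of the original work) that returns the largest power-of-two diversity order allowed by (17), which is the restriction used in the numerical evaluation of this subsection, or `None` when R < 1 and single-shot beam training cannot be realized.

```python
def max_diversity_order(t_c_max_s: float, bandwidth_hz: float, n_elements: int):
    """Largest power-of-two R with 1 <= R <= T_C-max * BW / (n_elements - 1).

    n_elements is N_R for the analog array and N_r (antennas per sub-array)
    for the hybrid array. Returns None if the bound is below 1.
    """
    bound = t_c_max_s * bandwidth_hz / (n_elements - 1)
    bound *= 1 + 1e-9  # guard against floating-point round-off at exact boundaries
    if bound < 1:
        return None
    r = 1
    while 2 * r <= bound:
        r *= 2
    return r


r_analog = max_diversity_order(15e-9, 2e9, 16)     # analog array, N_R = 16
r_hybrid = max_diversity_order(15e-9, 2e9, 4)      # hybrid array, N_r = 4
r_narrow = max_diversity_order(15e-9, 0.5e9, 16)   # analog array, BW = 0.5 GHz

print(r_analog, r_hybrid, r_narrow)
```

For TC-max = 15 ns and BW = 2 GHz, this gives R = 2 for the 16-element analog array and R = 8 for a hybrid array with Nr = 4, while the analog array at BW = 0.5 GHz falls below R = 1.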
We study the beam training performance of different archi-
tectures in terms of AoA estimation accuracy, assuming that
R is constrained to be the largest power of 2 that satisfies (17). We consider a
system with carrier frequency fc = 60 GHz, bandwidth values
in the range 0.5 GHz ≤ BW ≤ 4.5 GHz, and Mtot = 4096
subcarriers for any bandwidth. The transmitter array size is
NT = 128, while the receive array size can take values
NR = {16, 32}. There are Nr = 4 antennas in each sub-array
in hybrid TTD architecture, regardless of the total number of
antennas. The number of probed directions in beam training is
assumed to be D = 2NR and the dictionary size is Q = 1024.
The channel consists of L = 3 clusters, where one is 10 dB
stronger than the other two. Fading is simulated by 20 rays
within each cluster with up to 10 ns spread. There is no intra-
cluster angular spread. The pre-beamforming signal-to-noise ratio
(SNR) is defined as $\mathrm{SNR} \triangleq \sum_{l=1}^{L} \sigma_l^2 / \sigma_N^2$, and it is assumed to
be SNR = −20 dB.
[Figure 8 plot: AoA RMSE [degree] versus the standard deviation of phase error σP [degree] (lower x-axis) and the standard deviation of delay error σT [ps] (upper x-axis); curves: Analog, Hybrid, and Digital without error, Analog and Hybrid with phase error, Analog and Hybrid with delay error.]
Fig. 8. Beam training performance comparison of the three considered
architectures under the distorted delay taps $\tilde{\tau}_n \sim \mathcal{N}(\tau_n, \sigma_T^2)$, $\forall n$, and phase
taps $\tilde{\phi}_n \sim \mathcal{N}(\phi_n, \sigma_P^2)$, $\forall n$. The curves with the delay error (dashed with
stars) and phase error (dashed with diamonds) are associated with the upper
and lower x-axis, respectively.
In Fig. 7, we present the results for the beam training
performance and the interplay of the considered parameters.
In both cases NR = 16 and NR = 32, the analog TTD
array architecture has the highest RMSE of AoA estimation
due to low achievable diversity order R. As discussed earlier,
analog arrays have large delay range requirements, and thus
better estimation accuracy (equivalently, higher R) requires
larger BW. Similarly, increasing the array size NR can have
a positive effect on the performance. However, if BW is
not large enough and there is no diversity (R = 1), larger
arrays do not improve the estimation accuracy in frequency-
selective channels. The analog arrays do not have the results
for the values of BW for which the proposed single-shot beam
training cannot be realized (R < 1). In hybrid TTD arrays,
higher diversity orders can be utilized since Nr < NR, which
leads to better estimation accuracy compared to analog arrays.
An increase in the number of antenna elements does not change
the achievable R in hybrid arrays since we assume that Nr = 4
remains constant. It does, however, improve the estimation
accuracy of hybrid arrays, which approaches the sub-degree
performance of fully digital arrays. Since R can be maximized
through DSP in digital arrays, their performance is indepen-
dent of BW. The floor of the AoA RMSE is determined by
the dictionary size Q = 1024. Based on the results in
Fig. 7, one can predict the diversity order R and beam training
performance for any considered array architecture, given the
system parameters BW, NR, and TC-max.
C. Impact of TTD Hardware Impairments on Beam Training
Next, we study the impact of practical TTD hardware
impairments and ADC quantization errors on beam training
in all considered architectures. Here we keep AoA RMSE as
the performance metric and use the same system parameters
as in the previous subsection. We consider a specific case with
NR = 16 and BW = 2 GHz.
[Figure 9 plot: AoA RMSE [degree] versus ADC bit resolution (1 to 8 bits); curves: Analog, Hybrid, and Digital, each without error and with quantization error.]
Fig. 9. Beam training performance comparison of the three considered
architectures under different ADC resolutions.
In Fig. 8, we study the beam training performance under the
phase and delay errors. Unlike the analog and hybrid TTD arrays,
the fully digital array is not sensitive to these hardware impairments,
and we include its performance with the maximum R as
the benchmark. With the considered system parameters, the analog
TTD array has the diversity order R = 2, which limits its
angle estimation accuracy and robustness to hardware errors.
We can see that the beam training algorithm can tolerate phase
errors with the standard deviation of up to σP = 15◦ and delay
errors with the standard deviation of up to σT = 75 ps. The hybrid
TTD array achieves a lower estimation error and greater
robustness to delay and phase errors than the analog TTD array
since it leverages the diversity order R = 8 in beam training.
It can tolerate large phase errors and delay errors with the
standard deviation larger than σT = 200 ps. It is worth noting
that the delay errors in hybrid arrays are independent of the
reduced delay taps in the corresponding TTD elements.
In Fig. 9, we present how finite ADC resolution affects the
beam training performance with different array architectures.
For fair comparison, we assume that the automatic gain control
(AGC) outputs a unit-variance signal in all architectures. We
can observe that the AoA estimation accuracy of the analog
TTD array with a single RF-chain is marginally affected
by low ADC resolution. On the other hand, low resolution
ADCs have a noticeable impact on beam training with the
hybrid TTD and fully digital arrays, as combined quantization
errors from different RF-chains deteriorate the estimation
accuracy. We note, however, that the resulting estimation error
is still within the sub-degree range and lower than that of
the analog array. Our results indicate that practical mmW and
sub-THz transceivers may require ADCs with only a few bits
of resolution for effective beam training. For example, with
only 3-bit resolution, the performance loss is negligible in
any array. Low-resolution ADCs have a positive impact on the
overall power efficiency of the considered TTD architectures,
as discussed in the next section.
V. POWER ANALYSIS OF TTD ARCHITECTURES
This section presents power analysis of the analog and
hybrid TTD arrays comparing it with a digital array for the
proposed mmW beam training algorithm in Section III-B.
We will estimate the power consumption of the baseband
components in the signal chain in Fig. 1, Fig. 2, and Fig. 3
for the analog, hybrid, and digital arrays assuming the mmW
front-end consumes the same power in all the three array
architectures. The only exception to this assumption in the
front-ends of the three array architectures is the phase shifter.
In the analog/hybrid TTD array, the phase shifter precedes the
downconverting mixer, whereas for the digital array it can be
implemented after the ADC. For the sake of comparison in this
work, we consider the digital phase-shifter power equivalent
to that of the RF/LO phase shifter, which is assumed to
have a completely passive implementation [27]. The estimation
methodology for the remaining components of the hybrid and
the digital arrays follows that of the analog TTD array as
described in the next subsections. For each component, we
also provide an example based on Table II.
TABLE III
STATE-OF-THE-ART LOW-RESOLUTION GHZ ADCS.
Parameters               | [28]  | [29]  | [30]
Sampling rate fs (MHz)   | 2500  | 2000  | 5000
ENOB (bit)               | 6     | 7.93  | 4.06
Power (µW)               | 7500  | 21000 | 7800
FoM (fJ/c-s)             | 74.7  | 119   | 94.6
Technology (nm)          | 65    | 65    | 65
A. Power Consumption of Analog/Hybrid TTD Array
Referring to Fig. 1 (Fig. 2), this subsection estimates the
power consumption of the ADC, AGC, SCA, and time-interleaving
blocks, which differentiate the power consumption
of the analog (hybrid) array.
1) Analog-to-Digital Converter (ADC): We estimate the
ADC power consumption using the figure of merit (FoM) derived
from recent works [28]–[30] on low-resolution high-speed
flash ADCs (a different ADC configuration can be selected when
considering efficiency). Using the FoMs of the state-of-the-art
flash ADCs in Table III, we take the average FoM of
96.1 fJ/c-s for our estimation. For a 3-bit ENOB, fs = 4 GHz, and
an FoM of 96.1 fJ/c-s, the estimated power is thus 3.07 mW.
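The ADC estimate can be reproduced with a short Python sketch. It assumes the usual Walden-style definition FoM = P / (2^ENOB · fs); this definition is not written out in the text, but it is consistent with the 3.07 mW figure quoted above. The function name is illustrative.

```python
def adc_power_w(fom_j_per_step: float, enob_bits: float, fs_hz: float) -> float:
    """ADC power from a Walden-style FoM: P = FoM * 2^ENOB * fs (assumed definition)."""
    return fom_j_per_step * (2 ** enob_bits) * fs_hz


# FoM values (fJ per conversion step) of the three flash ADCs in Table III.
fom_values_fj = [74.7, 119.0, 94.6]
avg_fom_fj = sum(fom_values_fj) / len(fom_values_fj)

# 3-bit ENOB at fs = 4 GHz, as in the example above.
p_adc_w = adc_power_w(avg_fom_fj * 1e-15, 3, 4e9)

print(round(avg_fom_fj, 1), round(p_adc_w * 1e3, 2))
```

The average FoM evaluates to 96.1 fJ/c-s and the per-ADC power to roughly 3.07 mW; the hybrid and digital arrays scale this by the number of ADCs listed in Table IV.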
In addition to the ADC power consumption, we also esti-
mate the deserializer power that is needed to interface the high-
speed ADCs with the backend DSP. Though insignificant for
analog and hybrid arrays, it will be an important contributor
for digital arrays. We consider a DSP operating at 1 GHz
and estimate the deserializer power consumption accordingly. From [31],
excluding the power of the clock generator, the scaled deserializer
power for one unit (PDESo) is found to be 0.512 mW
(= 3.2 × 4/25), which yields 1.5 mW and 6 mW of power
consumption in the analog and hybrid arrays, respectively.
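The deserializer figures follow from simple scaling. The sketch below assumes that the deserializer power is the unit power PDESo multiplied by the number of ADCs and the number of bits per ADC, which is our reading of the quoted numbers and of Table IV; the helper name is illustrative. It uses NH = 4 sub-arrays for the hybrid array and NR = 16 ADCs for the digital array.

```python
# Unit deserializer power in mW, scaled from [31] as stated in the text.
P_DES_UNIT_MW = 3.2 * 4 / 25  # = 0.512 mW


def deserializer_power_mw(n_adcs: int, bits_per_adc: int,
                          unit_mw: float = P_DES_UNIT_MW) -> float:
    """Deserializer power assuming one unit per ADC output bit (an assumption)."""
    return unit_mw * n_adcs * bits_per_adc


p_des_analog = deserializer_power_mw(1, 3)    # single RF chain
p_des_hybrid = deserializer_power_mw(4, 3)    # N_H = 4 sub-arrays
p_des_digital = deserializer_power_mw(16, 3)  # N_R = 16 ADCs

print(p_des_analog, p_des_hybrid, p_des_digital)
```

Under this assumption the analog, hybrid, and digital arrays require about 1.5 mW, 6.1 mW, and 24.6 mW, respectively, in line with the values quoted in this section.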
2) Switched-Capacitor Array (SCA): The SCA power con-
sumption is dominated mostly by the feedback operational
transconductance amplifier (OTA). We estimate the OTA
power consumption for an analog array similar to the method
in [22]. The DC gain (A0) and the unity-gain bandwidth (ωu)
requirements of the OTA used in the SCA are found to be:
ωu = 2 ln(2)NR(x+ 1)fs
where x is the ADC resolution and fs is the ADC sampling
frequency.
The normalized unity-gain bandwidth (ωu0) per unit sam-
pling frequency can be written as: ωu0 = 2 ln(2)(x + 1).
For a 3-bit ADC (referring to Fig. 9), the normalized unity-gain
frequency is ωu0 = 2 ln(2)(3 + 1) = 5.54 Hz. Neglecting
parasitics, second order effects, and considering a two-stage
internally compensated OTA, the transconductance of this OTA
can be designed to be linearly dependent to the DC current.
As a result, the DC gain of the OTA is independent of its DC
current and power consumption POTA. At the same time, ωu is
a linear function of the OTA transconductance, and thus varies
proportionally to POTA. Given these assumptions, the minimum
requirement on the OTA ωu results in linear dependency of
POTA to the product of the number of antennas and sampling
frequency, as shown below [22]:
POTA ≈ POTAoNRfs
where POTAo is the power consumption of an OTA designed
for a single-element array with unit sampling frequency (1Hz).
Solving for a 60° phase margin (PM) requirement puts Cc
close to 0.22 pF, yielding gmn = 0.22 pF × 5.54 = 1.2188 pS.
Assuming gm/ID = 15, the unit current can be obtained as
IDn = $8.1253 \times 10^{-14}$ A. For a 60° PM, the gmn for the
second stage is around 10 times that of the first stage, and we further
assume the same gm/ID ratio. The total current is thus (2 +
10)IDn = $9.7504 \times 10^{-13}$ A. Assuming a 1 V supply, POTAo
can be estimated as $9.7504 \times 10^{-13}$ W. For the 16-antenna array
and fs = 4 GHz (Table II), the estimated power consumption is
thus 62.403 mW. Note that the hybrid TTD array relaxes the
OTA power consumption per sub-array, where POTA is scaled
by NR/NH. The same power consumption estimate, however,
applies to a digital array without any relaxation.
The power consumption of the AGC, PAGC, can be estimated
in the same way. Assuming that each AGC consumes the same power as
a unit OTA operating at fs (POTAo fs, see Table IV), the estimated total
PAGC is 3.9 mW for the analog array and 15.6 mW for the hybrid array, following the
design specifications in Table II.
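The OTA and AGC estimates above can be retraced step by step. The following sketch is only an illustration of the arithmetic: the function and argument names are assumptions, and it keeps full precision for 2 ln(2)(x + 1) instead of the rounded value 5.54, so the results differ from the quoted numbers in the third significant digit.

```python
import math


def unit_ota_power_w(adc_bits: int, c_c_f: float = 0.22e-12,
                     gm_over_id: float = 15.0,
                     current_multiplier: float = 2 + 10,
                     vdd_v: float = 1.0) -> float:
    """P_OTAo: OTA power for a single-element array at a 1 Hz sampling frequency."""
    omega_u0 = 2 * math.log(2) * (adc_bits + 1)  # normalized unity-gain bandwidth
    gm_n = c_c_f * omega_u0                      # first-stage transconductance
    i_dn = gm_n / gm_over_id                     # unit drain current
    return current_multiplier * i_dn * vdd_v     # total current times supply


FS_HZ, N_R, N_H = 4e9, 16, 4
p_otao = unit_ota_power_w(3)

p_sca_analog = p_otao * N_R * FS_HZ           # P_OTA ~ P_OTAo * N_R * fs
p_sca_hybrid_sub = p_otao * N_R / N_H * FS_HZ  # per sub-array, scaled by N_R/N_H
p_agc_analog = p_otao * FS_HZ                 # one AGC
p_agc_hybrid = p_otao * N_H * FS_HZ           # N_H AGCs

print(p_otao, p_sca_analog, p_agc_analog, p_agc_hybrid)
```

The sketch yields POTAo ≈ 9.76 × 10^−13 W, about 62.5 mW for the SCA of the 16-element analog array, and about 3.9 mW and 15.6 mW for the AGC in the analog and hybrid arrays.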
3) Time-Interleaver: The power consumption of the time
interleaver can be estimated as [32]:
$$P_{\mathrm{TINW}} = \frac{f_s}{N_I} \times N_I \times \left(\frac{C_{sw}}{N_I} + C_{int}\right) \times V_{DD}^2$$
where Csw is the switch capacitance and Cint is the interconnect
parasitic capacitance. For a sampling frequency of 4 GHz,
a 1 V supply, Csw = 2.5 pF, Cint = 0.6 pF [32], 31 levels of
time interleaving in the analog array, and 7 levels of interleaving
in the hybrid array, the estimated power consumption of the time
interleaver is 2.7 mW and 3.8 mW for the analog and
hybrid arrays, respectively.
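A minimal sketch of the time-interleaver estimate, using the expression and parameter values given above (the function name is illustrative):

```python
def time_interleaver_power_w(fs_hz: float, n_interleave: int,
                             c_sw_f: float, c_int_f: float, vdd_v: float) -> float:
    """P_TINW = (fs / N_I) * N_I * (C_sw / N_I + C_int) * VDD^2."""
    return (fs_hz / n_interleave) * n_interleave \
        * (c_sw_f / n_interleave + c_int_f) * vdd_v ** 2


# fs = 4 GHz, VDD = 1 V, C_sw = 2.5 pF, C_int = 0.6 pF.
p_ti_analog = time_interleaver_power_w(4e9, 31, 2.5e-12, 0.6e-12, 1.0)
p_ti_hybrid = time_interleaver_power_w(4e9, 7, 2.5e-12, 0.6e-12, 1.0)

print(round(p_ti_analog * 1e3, 2), round(p_ti_hybrid * 1e3, 2))
```

The result is about 2.72 mW for 31 interleaving levels and 3.83 mW for 7 levels. The switch capacitance term shrinks with NI, which is why the analog array, despite its larger interleaving factor, has the lower estimate.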
B. Power Consumption of Digital Array
The estimated power consumption for digital TTD array
can be derived following a similar approach to the analog
arrays with the important consideration that the proposed beam
training algorithm will require only integer delays at the ADC
sampling frequency. For operation in communication mode,
fractional-rate samplers will be needed as detailed in [33]. In
addition to the same number of ADCs, AGCs and filters as in
an analog TTD array, the digital array consumes higher power
at the ADC-DSP interface, primarily due to the need for
deserializing the high-speed ADC output. For example, with 16
elements and 3 bits per ADC, the estimated power consumption
of the deserializer will be 24.6 mW.
TABLE IV
POWER ESTIMATION METHODOLOGY FOR TTD ARRAYS.
# Components  | Analog       | Hybrid            | Digital
ADC           | 1            | NH                | NR
SCA/AGC       | 1            | NH                | NR
PSCA          | POTAo NR fs  | POTAo (NR/NH) fs  | POTAo NR fs
PADC (x-bit)  | FoM-based estimation (all architectures)
PAGC          | POTAo fs     | POTAo NH fs       | POTAo NR fs
PDeSer        | PDESo fs x   | PDESo NH x        | PDESo NR x
[Figure 10 plot: stacked bars of power consumption [mW] broken down into ADC, SCA, AGC, time-interleaver, and deserializer for A16, H16, D16, A32, H32, and D32 at BW = 2 GHz and BW = 4 GHz, with an "Unrealizable" annotation.]
Fig. 10. Comparison of analog (A), hybrid (H), and digital (D) architectures
in terms of power consumption for NR = {16, 32} and BW = {2, 4} GHz.
C. Comparison of Estimated Array Power Consumption
Table IV summarizes the required number of components
and power consumption in the analog, hybrid, and digital
arrays based on the architectures in Fig. 1, Fig. 2, and Fig. 3,
respectively. Fig. 10 illustrates the introduced power estimation
methodology with a breakdown of individual components for
the analog and hybrid TTD arrays and also the benchmark
digital array. The estimated power consumption for each
component block is described in the previous subsections
for each array architecture. The analog array provides higher
energy efficiency than the hybrid TTD and
digital arrays. However, increasing the bandwidth or
the number of elements requires OTAs with larger unity-gain bandwidths,
which increases design complexity for higher diversity
orders. The need for higher unity-gain bandwidths is further
constrained by the increasing number of feedback paths to the virtual
ground of the OTA. Hybrid arrays are thus the most suitable
choice to meet feasible delay-bandwidth products. Future work
will investigate design of analog (hybrid) arrays with higher
number of antenna elements per sub-array using passive SCA
that leverages the reasonably lower resolutions (≈ 3-bit)
required by the proposed beam training algorithm.
VI. CONCLUSIONS AND FUTURE WORK
This work introduced and analyzed two TTD architectures
with large delay-bandwidth product baseband delay elements
as potential candidates for mmW beam training. We demon-
strated that a high AoA estimation accuracy can be achieved
with both proposed TTD architectures using a power mea-
surement based beam training scheme, which requires only
one wideband training pilot. The dependency of the codebook
design and beam training performance on system parameters,
including the bandwidth, number of antenna elements, and
maximum TTD delay compensation, was analyzed and nu-
merically evaluated in a practical multipath fading channel.
Detailed analysis of the angle estimation accuracy, robustness
to hardware impairments, and power consumption, revealed
the trade-offs between the proposed TTD architectures when
benchmarked against the digital array. The analog TTD array
consumes 66% less power than the digital array, but it achieves
a higher angle estimation error. The hybrid TTD array has a
comparable beam training performance and 25% lower power
consumption than the digital array. The results on how power
consumption scales with the key system parameters, including
the bandwidth and array size, provided an insight into the
beam training design for future mmW and sub-THz systems.
Future work will include array implementations supporting
larger delay-bandwidth products for arrays with higher number
of antenna elements, as well as channel estimation and iden-
tification of multiple AoAs in interference-limited networks.
APPENDIX A
DERIVATION OF EXPECTED POWERS IN D DIRECTIONS
We consider a frequency-selective multipath channel with
$L$ clusters and corresponding gains modeled as $G_l \sim \mathcal{CN}(0, \sigma_l^2)$.
The channel gains are assumed to be independent
over different clusters and frequency sub-bands. The frequency
domain channel model can be approximated as
$$\mathbf{H} = \mathbf{A}_R \boldsymbol{\Lambda} \mathbf{A}_T^H, \tag{18}$$
where $\mathbf{A}_R \in \mathbb{C}^{N_R \times Q}$ and $\mathbf{A}_T \in \mathbb{C}^{N_T \times Q}$ contain $Q$
array responses $\mathbf{a}_R(\xi_q)$ and $\mathbf{a}_T(\xi_q)$ that correspond to $Q$
uniformly spaced angles $\xi_q$, $q = 1, ..., Q$, in the range
$(-\pi/2, \pi/2)$. The elements of array responses are defined as
$[\mathbf{a}_R(\theta)]_n = N_R^{-1/2} \exp(-j(n-1)\pi \sin(\theta))$, $n = 1, ..., N_R$,
and $[\mathbf{a}_T(\theta)]_n = N_T^{-1/2} \exp(-j(n-1)\pi \sin(\theta))$, $n = 1, ..., N_T$.
The square matrix $\boldsymbol{\Lambda} \in \mathbb{C}^{Q \times Q}$ has only $L$ non-zero
elements that correspond to the gains $G_l$, $\forall l$. Commonly,
$Q \gg L$ and the approximation error in (18) can be neglected.
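With the codebook design described in Section III-B, the
received signal in any sounded direction $d$ can be considered as
a zero-mean complex Gaussian random variable and expressed
as
$$Y_d = \mathbf{f}_d^H \mathbf{H} \mathbf{v} + \mathbf{f}_d^H \mathbf{n}, \tag{19}$$
where $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma_N^2 \mathbf{I}_{N_R})$ is white Gaussian noise. The
realizations of (19) are received symbols $Y[m]$, $m \in \mathcal{M}_d$.
The expected received signal power in direction $d$ is
$p_d = E\left[|Y_d|^2\right] = E\left[(\mathbf{f}_d^H \mathbf{H} \mathbf{v} M^{-1/2} + \mathbf{f}_d^H \mathbf{n})^H (\mathbf{f}_d^H \mathbf{H} \mathbf{v} M^{-1/2} + \mathbf{f}_d^H \mathbf{n})\right]$.
Based on the channel model in (18), it can be shown that
$$p_d = M^{-1} E\left[\mathbf{v}^H \mathbf{A}_T \boldsymbol{\Lambda}^* \mathbf{A}_R^H \mathbf{f}_d \mathbf{f}_d^H \mathbf{A}_R \boldsymbol{\Lambda} \mathbf{A}_T^H \mathbf{v}\right] + E\left[\mathbf{n}^H \mathbf{f}_d \mathbf{f}_d^H \mathbf{n}\right]. \tag{20}$$
We apply the trace operator $\mathrm{Tr}(\cdot)$ to (20) and exploit its
linearity and cyclic property to obtain
$$p_d = M^{-1} E\left[\mathrm{Tr}\left(\boldsymbol{\Lambda} \mathbf{A}_T^H \mathbf{v} \mathbf{v}^H \mathbf{A}_T \boldsymbol{\Lambda}^* \mathbf{A}_R^H \mathbf{f}_d \mathbf{f}_d^H \mathbf{A}_R\right)\right] + N_R \sigma_N^2 = \mathrm{Tr}\left(\mathbf{G} \mathbf{A}_R^H \mathbf{f}_d \mathbf{f}_d^H \mathbf{A}_R\right) + N_R \sigma_N^2, \tag{21}$$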
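where $\mathbf{G} = E\left[\boldsymbol{\Lambda} \mathbf{A}_T^H \mathbf{v} \mathbf{v}^H \mathbf{A}_T \boldsymbol{\Lambda}^*\right]$. Since $\boldsymbol{\Lambda}$ and $\boldsymbol{\Lambda}^*$ are sparse
matrices, $\boldsymbol{\Lambda} \mathbf{A}_T^H \mathbf{v} \mathbf{v}^H \mathbf{A}_T \boldsymbol{\Lambda}^*$ yields another sparse $Q \times Q$
matrix with $L^2$ non-zero elements. There are $L$ non-zero
elements of the form $|G_l|^2 |\mathbf{a}_T^H(\theta_l^{(T)}) \mathbf{v}|^2$, $\forall l$, on the main
diagonal. The $L(L-1)$ off-diagonal elements are cross terms
$G_{l_1} G_{l_2}^* \mathbf{a}_T^H(\theta_{l_1}^{(T)}) \mathbf{v} \mathbf{v}^H \mathbf{a}_T(\theta_{l_2}^{(T)})$, $\forall l_1, l_2$. Thus, $\mathbf{G}$ is a diagonal
matrix with $L$ non-zero elements $\sigma_l^2 |\mathbf{a}_T^H(\theta_l^{(T)}) \mathbf{v}|^2$, $\forall l$, since
$E\left[G_{l_1} G_{l_2}^*\right] = 0$, $\forall l_1 \neq l_2$, and $E\left[|G_l|^2\right] = \sigma_l^2$, $\forall l$. The
product of $\mathbf{G}$ and the matrix of the UE BF gains $\mathbf{A}_R^H \mathbf{f}_d \mathbf{f}_d^H \mathbf{A}_R$
is a $Q \times Q$ matrix whose diagonal elements are equal to
$|\mathbf{f}_d^H \mathbf{a}_R(\xi_q)|^2 [\mathbf{G}]_{q,q}$, so (21) becomes
$$p_d = \mathbf{b}_d^T \mathbf{g} + N_R \sigma_N^2, \tag{22}$$
where $\mathbf{b}_d = \left[|\mathbf{f}_d^H \mathbf{a}_R(\xi_1)|^2, |\mathbf{f}_d^H \mathbf{a}_R(\xi_2)|^2, ..., |\mathbf{f}_d^H \mathbf{a}_R(\xi_Q)|^2\right]^T$ and
$\mathbf{g} = \mathrm{diag}(\mathbf{G})$. By vectorizing the result in (22), we obtain
$$\mathbf{p} = \mathbf{B} \mathbf{g} + N_R \sigma_N^2 \mathbf{1}, \tag{23}$$
where $\mathbf{p} = [p_1, p_2, ..., p_D]^T$ and $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, ..., \mathbf{b}_D]^T$. Since
the BS provides a large BF gain with the fixed precoder $\mathbf{v}$, we
can assume that the receiver array sees only one spatially filtered
dominant cluster, e.g., the first one. Consequently, there is only
one non-zero element in $\mathbf{g}$, equal to $|\mathbf{a}_T^H(\theta_1^{(T)}) \mathbf{v}|^2 \sigma_1^2$.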
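The linear model in (22) and (23) can be illustrated with a small pure-Python sketch. It is not the TTD codebook of Section III-B: as a loudly stated assumption, the combiners are plain unit-modulus steering vectors pointed at every second grid angle (so that D = 2NR), the grid placement, noise variance, and the value of the single non-zero element of g are hypothetical, and all function names are illustrative. Unit-modulus weights give ||f_d||² = NR, which matches the noise term NR σN² in (22).

```python
import cmath
import math


def array_response(n_ant, theta):
    """[a(theta)]_n = N^{-1/2} exp(-j (n-1) pi sin(theta)), n = 1, ..., N."""
    scale = 1.0 / math.sqrt(n_ant)
    return [scale * cmath.exp(-1j * n * math.pi * math.sin(theta))
            for n in range(n_ant)]


def bf_gain(combiner, response):
    """Beamforming gain |f^H a|^2."""
    inner = sum(f.conjugate() * a for f, a in zip(combiner, response))
    return abs(inner) ** 2


def expected_powers(combiners, grid_angles, g, noise_var):
    """p_d = b_d^T g + N_R * sigma_N^2 for every sounded direction d, as in (22)-(23)."""
    n_r = len(combiners[0])
    responses = [array_response(n_r, xi) for xi in grid_angles]
    powers = []
    for f_d in combiners:
        b_d = [bf_gain(f_d, a) for a in responses]
        powers.append(sum(b * g_q for b, g_q in zip(b_d, g)) + n_r * noise_var)
    return powers


# Hypothetical setup: N_R = 16, Q = 64 grid angles in (-pi/2, pi/2), D = 32 probes.
N_R, Q, NOISE_VAR = 16, 64, 0.1
grid = [-math.pi / 2 + math.pi * (q + 0.5) / Q for q in range(Q)]
probe_idx = list(range(0, Q, 2))
# Assumed combiners: unit-modulus steering vectors, so ||f_d||^2 = N_R.
combiners = [[math.sqrt(N_R) * a for a in array_response(N_R, grid[q])]
             for q in probe_idx]

# Single dominant cluster: one non-zero element in g (hypothetical value).
g = [0.0] * Q
g[40] = 2.0

p = expected_powers(combiners, grid, g, NOISE_VAR)
best = max(range(len(p)), key=p.__getitem__)
print(best, probe_idx[best], p[best])
```

In this toy setup the largest expected power occurs for the combiner steered at the grid angle that carries the cluster, and its value equals NR times the cluster term plus the noise floor NR σN², as (22) predicts for a perfectly matched beam.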
REFERENCES
[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K.
Soong, and J. C. Zhang, “What will 5G be?” IEEE Journal on Selected
Areas in Communications, vol. 32, no. 6, pp. 1065–1082, June 2014.
[2] T. S. Rappaport, Y. Xing, G. R. MacCartney, A. F. Molisch, E. Mellios,
and J. Zhang, “Overview of millimeter wave communications for
fifth-generation (5G) wireless networks - with a focus on propagation models,”
IEEE Transactions on Antennas and Propagation, vol. 65, no. 12, pp.
6213–6230, 2017.
[3] K. Witrisal, P. Meissner, E. Leitinger, Y. Shen, C. Gustafson, F. Tufves-
son, K. Haneda, D. Dardari, A. F. Molisch, A. Conti, and M. Z.
Win, “High-accuracy localization for assisted living: 5G systems will
turn multipath channels from foe to friend,” IEEE Signal Processing
Magazine, vol. 33, no. 2, pp. 59–70, 2016.
[4] K. Hosoya, N. Prasad, K. Ramachandran, N. Orihashi, S. Kishi-
moto, S. Rangarajan, and K. Maruhashi, “Multiple sector ID capture
(MIDC): A novel beamforming technique for 60-GHz band multi-Gbps
WLAN/PAN systems,” IEEE Transactions on Antennas and Propaga-
tion, vol. 63, no. 1, pp. 81–96, 2015.
[5] C. Jeong, J. Park, and H. Yu, “Random access in millimeter-wave
beamforming cellular networks: issues and approaches,” IEEE Commu-
nications Magazine, vol. 53, no. 1, pp. 180–185, 2015.
[6] J. Kim and A. F. Molisch, “Fast millimeter-wave beam training with re-
ceive beamforming,” Journal of Communications and Networks, vol. 16,
no. 5, pp. 512–522, 2014.
[7] L. Zhou and Y. Ohashi, “Efficient codebook-based MIMO beamform-
ing for millimeter-wave WLANs,” in 2012 IEEE 23rd International
Symposium on Personal, Indoor and Mobile Radio Communications -
(PIMRC), 2012, pp. 1885–1889.
[8] D. Zhang, A. Li, H. Chen, N. Wei, M. Ding, Y. Li, and B. Vucetic,
“Beam allocation for millimeter-wave MIMO tracking systems,” IEEE
Transactions on Vehicular Technology, vol. 69, no. 2, pp. 1595–1611,
2020.
[9] H. Yan and D. Cabric, “Compressive initial access and beamforming
training for millimeter-wave cellular systems,” IEEE Journal of Selected
Topics in Signal Processing, vol. 13, no. 5, pp. 1151–1166, 2019.
[10] M. Bajor et al., “A flexible phased-array architecture for reception
and rapid direction-of-arrival finding utilizing pseudo-random antenna
weight modulation and compressive sampling,” IEEE J. Solid-State
Circuits, vol. 54, no. 5, pp. 1315–1328, May 2019.
[11] A. Ali, N. González-Prelcic, and R. W. Heath, “Millimeter wave
beam-selection using out-of-band spatial information,” IEEE Transactions on
Wireless Communications, vol. 17, no. 2, pp. 1038–1052, 2018.
[12] V. Desai, L. Krzymien, P. Sartori, W. Xiao, A. Soong, and A. Alkhateeb,
“Initial beamforming for mmWave communications,” in 2014 48th
Asilomar Conference on Signals, Systems and Computers, 2014, pp.
1926–1930.
[13] C. N. Barati, S. A. Hosseini, S. Rangan, P. Liu, T. Korakis, and S. S.
Panwar, “Directional cell search for millimeter wave cellular systems,” in
2014 IEEE 15th International Workshop on Signal Processing Advances
in Wireless Communications (SPAWC), 2014, pp. 120–124.
[14] C. N. Barati, S. A. Hosseini, M. Mezzavilla, T. Korakis, S. S. Panwar,
S. Rangan, and M. Zorzi, “Initial access in millimeter wave cellular
systems,” IEEE Transactions on Wireless Communications, vol. 15,
no. 12, pp. 7926–7940, 2016.
[15] S. Kalia, S. A. Patnaik, B. Sadhu, M. Sturm, M. Elbadry, and R. Harjani,
“Multi-beam spatio-spectral beamforming receiver for wideband phased
arrays,” IEEE Transactions on Circuits and Systems I: Regular Papers,
vol. 60, no. 8, pp. 2018–2029, 2013.
[16] C. Yeh, T. Chu, C. Chen, and C. Yang, “A hardware-scalable DSP
architecture for beam selection in mm-Wave MU-MIMO systems,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 11,
pp. 3918–3928, 2018.
[17] S. Blandino, G. Mangraviti, C. Desset, A. Bourdoux, P. Wambacq,
and S. Pollin, “Multi-user hybrid MIMO at 60 GHz using 16-antenna
transmitters,” IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 66, no. 2, pp. 848–858, 2019.
[18] N. J. Karl, R. W. McKinney, Y. Monnai, R. Mendis, and D. M.
Mittleman, “Frequency-division multiplexing in the terahertz range using
a leaky-wave antenna,” Nature Photonics, vol. 9, no. 11, p. 717, 2015.
[19] Y. Ghasempour, C.-Y. Yeh, R. Shrestha, D. Mittleman, and E. Knightly,
“Single shot single antenna path discovery in THz networks,” in
Proceedings of the 26th Annual International Conference on Mobile
Computing and Networking, ser. MobiCom ’20. New York, NY,
USA: Association for Computing Machinery, 2020. [Online]. Available:
https://doi.org/10.1145/3372224.3380895
[20] H. Yan, V. Boljanovic, and D. Cabric, “Wideband millimeter-wave beam
training with true-time-delay array architecture,” in 2019 53rd Asilomar
Conference on Signals, Systems, and Computers, 2019, pp. 1447–1452.
[21] T. Chu and H. Hashemi, “A true time-delay-based bandpass multi-
beam array at mm-waves supporting instantaneously wide bandwidths,”
in 2010 IEEE International Solid-State Circuits Conference - (ISSCC),
2010, pp. 38–39.
[22] E. Ghaderi, A. Sivadhasan Ramani, A. A. Rahimi, D. Heo, S. Shekhar,
and S. Gupta, “An integrated discrete-time delay-compensating tech-
nique for large-array beamformers,” IEEE Trans. Circuits Syst. I, vol. 66,
no. 9, pp. 3296–3306, Sep. 2019.
[23] E. Ghaderi, C. Puglisi, S. Bansal, and S. Gupta, “10.8 A 4-
element 500MHz-modulated-BW 40mW 6b 1GS/s analog-time-to-
digital-converter-enabled spatial signal processor in 65nm CMOS,” in
2020 IEEE International Solid-State Circuits Conference (ISSCC), Feb.
2020.
[24] V. Boljanovic, H. Yan, E. Ghaderi, D. Heo, S. Gupta, and
D. Cabric, “Design of millimeter-wave single-shot beam training
for true-time-delay array,” arXiv.org, 2020. [Online]. Available:
https://arxiv.org/abs/2002.07849
[25] E. Ghaderi, A. Sivadhasan Ramani, A. A. Rahimi, D. Heo, S. Shekhar,
and S. Gupta, “A 4-channel MIMO baseband receiver array with
>35dB 80 MHz wideband spatial cancellation through true-time-delay
and truncated Hadamard transform,” accepted for publication at IEEE
Transactions on Microwave Theory and Techniques, pp. 1–11, 2020.
[26] B. Hershberg, S. Weaver, K. Sobue, S. Takeuchi, K. Hamashita, and
U. Moon, “Ring amplifiers for switched capacitor circuits,” IEEE
Journal of Solid-State Circuits, vol. 47, no. 12, pp. 2928–2942, Dec
2012.
[27] M. Elkholy, S. Shakib, J. Dunworth, V. Aparin, and K. Entesari, “Low-
loss highly linear integrated passive phase shifters for 5G front ends on
bulk CMOS,” IEEE Transactions on Microwave Theory and Techniques,
vol. 66, no. 10, pp. 4563–4575, 2018.
[28] D. Oh, J. Kim, D. Jo, W. Kim, D. Chang, and S. Ryu, “A 65-nm CMOS
6-bit 2.5-GS/s 7.5-mW 8 × time-domain interpolating flash ADC with
sequential slope-matching offset calibration,” IEEE Journal of Solid-
State Circuits, vol. 54, no. 1, pp. 288–297, 2019.
[29] S. Zhu, B. Wu, Y. Cai, and Y. Chiu, “A 2-GS/s 8-bit non-interleaved
time-domain flash ADC based on remainder number system in 65-nm
CMOS,” IEEE Journal of Solid-State Circuits, vol. 53, no. 4, pp. 1172–
1183, 2018.
[30] C. Chan, Y. Zhu, S. Sin, U. Seng-Pan, R. P. Martins, and F. Maloberti, “A
7.8-mW 5-b 5-GS/s dual-edges-triggered time-based flash ADC,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 8,
pp. 1966–1976, 2017.
[31] J. W. Jung and B. Razavi, “A 25-Gb/s 5-mW CMOS CDR/deserializer,”
IEEE Journal of Solid-State Circuits, vol. 48, no. 3, pp. 684–697, 2013.
[32] B. Razavi, “Design considerations for interleaved ADCs,” IEEE Journal
of Solid-State Circuits, vol. 48, no. 8, pp. 1806–1817, 2013.
[33] S. Jang, R. Lu, J. Jeong, and M. Flynn, “A 1-GHz 16-element four-
beam true-time-delay digital beamformer,” IEEE Journal of Solid-State
Circuits, vol. 54, no. 5, pp. 1304–1314, May 2019.
