Multi-Beam RF Aperture Using Multiplierless FFT Approximation by Suarez, D. et al.
arXiv:1501.01946v1 [stat.ME] 8 Jan 2015
Multi-Beam RF Aperture
Using Multiplierless FFT Approximation
Dora Suarez∗ Renato J. Cintra∗ Fábio M. Bayer†
Arindam Sengupta‡ Sunera Kulasekera‡ Arjuna Madanayake‡
Fri 9th Jan, 2015 @ 8:20pm
Abstract
Multiple independent radio frequency (RF) beams find applications in communications, radio
astronomy, radar, and microwave imaging. An N-point FFT applied spatially across an
array of receiver antennas provides N independent RF beams at a multiplier complexity of
$\frac{N}{2}\log_2 N$. Here, a low-complexity multiplierless approximation for the 8-point FFT is presented for
RF beamforming, using only 26 additions. The algorithm provides eight beams that closely
resemble the antenna array patterns of the traditional FFT-based beamformer, albeit without
using multipliers. The proposed FFT-like algorithm is useful for low-power RF multi-beam
receivers. It was synthesized in 45 nm CMOS technology at a 1.1 V supply and verified on-chip
using a Xilinx Virtex-6 LX240T FPGA device. The CMOS simulation and FPGA implementation
indicate bandwidths of 588 MHz and 369 MHz, respectively, for each of the independent
receive-mode RF beams.
1 Introduction
Antenna-array-based radio frequency (RF) applications such as radar, wireless communications,
localization, remote sensing, signal intelligence, radio astronomy, search for extraterrestrial
intelligence (SETI), and imaging require the fundamental operation of receive-mode beamforming.
Beamforming is the directional enhancement of propagating electromagnetic plane waves
based on their directions of arrival (DOA), while suppressing undesired noise and interference
that impinge on the antenna array. The ability to form multiple receiver beams is known as
“multi-beamforming” [1, 2]. Multiple RF beams, each having a unique “look direction” (the direction of
maximum sensitivity), are needed for multiple visibilities.
∗D. Suarez and R. J. Cintra are with the Signal Processing Group, Departamento de Estatística, Universidade
Federal de Pernambuco. E-mail: rjdsc@dsp.ufpe.org
†F. M. Bayer is with the Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, RS,
Brazil.
‡A. Sengupta, S. Kulasekera and A. Madanayake are with the ECE, The University of Akron, Akron, OH, USA.
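[Figure 1 diagram labels: antenna signals v0, v1, . . . , v7 with vn ∈ C; LNA; 90°; vreal; vim; ADC; DFT; outputs V0, V1, . . . , V7; angle ψ; axes x and y.]
Figure 1: ULA-based multi-beamformer using a spatial FFT.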
Multiple simultaneous beams are also needed for search-and-track radar, which in volume-scan
mode continuously monitors airborne threats, such as aircraft, warheads, and cruise missiles,
across a given range of angles. From the standpoint of high-capacity wireless communications,
simultaneous receiver beams are of importance to multi-input multi-output (MIMO) systems. The
application of an N-point fast Fourier transform (FFT) spatially, at each time sample, across a
uniform linear array (ULA) of antennas is a technique for achieving a plurality of independent RF
beams [1, 2]. The FFT efficiently computes the discrete Fourier transform (DFT) with
$\frac{N}{2}\log_2 N$ multiplications. Fig. 1 shows an overview of a ULA-based multi-beamformer using a spatial FFT.
For an N-element ULA, the spatial FFT beamformer provides N beams, uniformly spaced in
the spatial frequency domain by the interval 2π/N. The signal from each antenna is first sent through a low noise amplifier
(LNA), and the real (I, in-phase, vreal) and imaginary (Q, quadrature, vim) parts are low-pass
filtered and sampled using analog-to-digital converters (ADCs) before application of the DFT. The
spatial angle ψ is the independent variable used in the polar array beam-patterns.
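As a rough illustration of the spatial FFT described above, the following Python sketch (using NumPy) simulates one time sample of an 8-element ULA illuminated by a single plane wave and locates the beam with the largest output. The half-wavelength element spacing, the noise-free unit-amplitude wave, the sign convention of the spatial frequency, and the helper names `ula_snapshot` and `strongest_beam` are assumptions made for this example only.

```python
import numpy as np

N = 8  # number of antennas, and therefore of beams

def ula_snapshot(psi_deg, n_elements=N):
    """One time sample across the ULA for a unit plane wave arriving at psi_deg.

    Assumes half-wavelength spacing, so that the spatial frequency is
    omega = -pi * sin(psi) and element n observes exp(-1j * n * omega).
    """
    omega = -np.pi * np.sin(np.radians(psi_deg))
    n = np.arange(n_elements)
    return np.exp(-1j * n * omega)

def strongest_beam(psi_deg):
    """Index of the FFT output (beam) with the largest magnitude."""
    V = np.fft.fft(ula_snapshot(psi_deg))  # spatial DFT across the array
    return int(np.argmax(np.abs(V)))

# Waves arriving from broadside and from +/-30 degrees land in different beams.
for psi in (0.0, 30.0, -30.0):
    print(psi, strongest_beam(psi))
```

Each direction of arrival concentrates its energy in one FFT output, which is why the N outputs can be read as N simultaneous beams.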
RF aperture power consumption is directly proportional to circuit complexity and clock fre-
quency. Because multiplier hardware dominates circuit complexity, the utilization of FFT hard-
ware having as small a number of parallel multiplier circuits as possible is preferable in terms
of reduction of overall circuit complexity and power consumption of the multi-beamformer. The
proposed fast algorithm approximates the FFT computation without using any multipliers at all,
making the corresponding digital architecture very simple to realize on-chip. Because the proposed
fast algorithm requires only 26 addition operations, the corresponding architecture consumes less
power than conventional FFT-based circuits, which need parallel multipliers to implement the
twiddle factors.
2 Multiplier-Free DFT Approximation
The DFT is a linear orthogonal transformation relating an N-point input vector
$\mathbf{v} = [v_0 \; v_1 \; \cdots \; v_{N-1}]^\top$ to an output vector denoted by
$\mathbf{V} = [V_0 \; V_1 \; \cdots \; V_{N-1}]^\top$ according to
$$
V_k = \sum_{n=0}^{N-1} v_n \cdot \omega_N^{kn}, \quad k = 0, 1, \ldots, N-1,
$$
where $\omega_N = \exp\{-2\pi j/N\}$ is a primitive Nth root of unity [3] and $j = \sqrt{-1}$. In matrix
formalism, the above expression reduces to $\mathbf{V} = \mathbf{F}_N \cdot \mathbf{v}$, where $\mathbf{F}_N$ is the
DFT matrix, whose (i, k)-th element is given by $f_{i,k} = \omega_N^{ik}$, for i, k = 0, 1, . . . , N − 1. The direct
DFT computation requires $N^2$ complex multiplications and $N \cdot (N-1)$ additions. Thus, fast
algorithms are necessary and are often able to reduce the computational cost of the DFT
to $O(N \cdot \log_2 N)$ multiplications [4].
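The matrix form can be checked numerically. The Python sketch below (NumPy assumed; `dft_matrix` is a name chosen for this illustration) builds the DFT matrix from its definition and compares the direct matrix-vector product against a library FFT.

```python
import numpy as np

def dft_matrix(N):
    """DFT matrix with entries f[i, k] = omega_N**(i*k), omega_N = exp(-2*pi*j/N)."""
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N)

N = 8
F8 = dft_matrix(N)

# Arbitrary complex test vector (seeded so that the run is reproducible).
rng = np.random.default_rng(0)
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)

V_direct = F8 @ v        # direct evaluation: N*N complex multiplications
V_fft = np.fft.fft(v)    # fast algorithm computing the same transform

print(np.allclose(V_direct, V_fft))
```

Both routes give the same output vector; only the operation count differs.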
We submitted the 8-point DFT matrix $\mathbf{F}_8$ to the parametric optimization method
described in [5] to derive a matrix approximation. Two major constraints were imposed on the sought
approximations: (i) near-orthogonality and (ii) low complexity. Under these constraints, the optimal
elements for the parametric approximation of $\mathbf{F}_8$ are 1, (1 − j)/2, and −j; that is, only the
twiddle factor $\omega_8 = (1-j)/\sqrt{2}$ is altered, being replaced by (1 − j)/2. Such parameters result
in the following matrix approximation:
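$$
\hat{\mathbf{F}}_8 = \frac{1}{2} \cdot
\begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
2 & 1-j & -2j & -1-j & -2 & -1+j & 2j & 1+j \\
2 & -2j & -2 & 2j & 2 & -2j & -2 & 2j \\
2 & -1-j & 2j & 1-j & -2 & 1+j & -2j & -1+j \\
2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 \\
2 & -1+j & -2j & 1+j & -2 & 1-j & 2j & -1-j \\
2 & 2j & -2 & -2j & 2 & 2j & -2 & -2j \\
2 & 1+j & 2j & -1+j & -2 & -1-j & -2j & 1-j
\end{bmatrix}.
$$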
Compared to the exact DFT matrix, the above approximation has a mean squared error of 0.686,
which is considered low. Although not exactly orthogonal, the proposed approximation is very
close to orthogonality. According to the deviation from orthogonality measure [6], the proposed
transform has a deviation of 0.03, whereas the popular non-orthogonal DCT
approximation SDCT [7] has a deviation from orthogonality of 0.20.
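To make the construction concrete, the sketch below (Python with NumPy; the variable names are illustrative) enters the approximate matrix, confirms that it differs from the exact DFT matrix only where the exact entry is an odd power of ω8, and checks that each altered entry is the exact one scaled by 1/√2. It does not attempt to reproduce the error and orthogonality figures quoted above, since their exact normalizations are not restated in the text.

```python
import numpy as np

j = 1j
# Approximate 8-point DFT matrix, entered row by row (the 1/2 factor is applied here).
F8_hat = 0.5 * np.array([
    [2,     2,    2,     2,  2,     2,    2,     2],
    [2,   1-j, -2*j,  -1-j, -2,  -1+j,  2*j,   1+j],
    [2,  -2*j,   -2,   2*j,  2,  -2*j,   -2,   2*j],
    [2,  -1-j,  2*j,   1-j, -2,   1+j, -2*j,  -1+j],
    [2,    -2,    2,    -2,  2,    -2,    2,    -2],
    [2,  -1+j, -2*j,   1+j, -2,   1-j,  2*j,  -1-j],
    [2,   2*j,   -2,  -2*j,  2,   2*j,   -2,  -2*j],
    [2,   1+j,  2*j,  -1+j, -2,  -1-j, -2*j,   1-j],
])

idx = np.arange(8)
exponent = np.outer(idx, idx)                # i*k for every entry
F8 = np.exp(-2j * np.pi * exponent / 8)      # exact DFT matrix

odd = (exponent % 2 == 1)                    # entries that are odd powers of omega_8

# Even powers (+-1, +-j) are kept exactly.
print(np.allclose(F8_hat[~odd], F8[~odd]))
# Odd powers are scaled by 1/sqrt(2): (1 - j)/sqrt(2) becomes (1 - j)/2, and so on.
print(np.allclose(F8_hat[odd], F8[odd] / np.sqrt(2)))
# Real and imaginary parts all lie in {0, +-1/2, +-1}, so only adds and shifts are needed.
parts = np.concatenate([F8_hat.real.ravel(), F8_hat.imag.ravel()])
print(bool(np.all(np.isin(np.abs(parts), [0.0, 0.5, 1.0]))))
```

Because the even-indexed rows contain only even powers of ω8, those rows coincide with the exact DFT rows.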
The proposed approximate matrix $\hat{\mathbf{F}}_8$ preserves the symmetry of the DFT and has null
multiplicative complexity. Computed directly, it still requires 64 additions and 32 bit-shifting
operations; a further reduction
in the additive complexity can be obtained by means of a tailored fast algorithm. Let $\mathbf{I}_n$ be the
identity matrix of order n and
$\mathbf{B}_n = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \mathbf{I}_{n/2}$,
where ⊗ denotes the Kronecker product. Thus,
employing the matrix factorization methods suggested in [4], we have the following fast algorithm:
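$$
\hat{\mathbf{F}}_8 = \mathbf{P} \times \operatorname{diag}(\mathbf{I}_2, \mathbf{A}_1, \mathbf{A}_3) \times \mathbf{D}_2 \times \operatorname{diag}(\mathbf{B}_2, \mathbf{I}_2, \mathbf{A}_4) \times \mathbf{D}_1 \times \operatorname{diag}(\mathbf{B}_4, \mathbf{A}_2) \times \mathbf{B}_8,
$$
where
$$
\mathbf{A}_1 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad
\mathbf{A}_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}, \quad
\mathbf{A}_3 = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad
\mathbf{A}_4 = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix},
$$
$\mathbf{D}_1 = \operatorname{diag}(1, 1, 1, 1, 1, 1/2, 1, 1/2)$, $\mathbf{D}_2 = \operatorname{diag}(1, 1, 1, j, 1, j, j, 1)$,
$\mathbf{P} = [\, \mathbf{e}_1 \mid \mathbf{e}_5 \mid \mathbf{e}_3 \mid \mathbf{e}_6 \mid \mathbf{e}_2 \mid \mathbf{e}_8 \mid \mathbf{e}_4 \mid \mathbf{e}_7 \,]^\top$ is a
permutation matrix, and $\mathbf{e}_i$ is the 8-point column vector with element 1 at the ith position and
0 elsewhere. Figure 2 depicts the signal flow graph of the introduced algorithm. The arithmetic
complexity assessment and comparison, in terms of real operations, is summarized in Table 1.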
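The factorization can be verified numerically. The following Python sketch (NumPy assumed) builds each factor and checks that their product reproduces the approximate transform. The helper names `B` and `block_diag` are illustrative, and the zero entries of A2, A3, and A4 are placed as this sketch assumes them to be, namely so that the product matches the approximate matrix.

```python
import numpy as np

def B(n):
    """Butterfly matrix B_n = [[1, 1], [1, -1]] kron I_(n/2)."""
    return np.kron(np.array([[1, 1], [1, -1]]), np.eye(n // 2))

def block_diag(*blocks):
    """Block-diagonal matrix built from square blocks (illustrative helper)."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=complex)
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

I2 = np.eye(2)
A1 = np.array([[1, -1], [1, 1]])
A2 = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 1, 0, -1]])
A3 = np.array([[1, -1, 0, 0], [0, 0, -1, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
A4 = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [0, 1, -1, 0], [1, 0, 0, -1]])
D1 = np.diag([1, 1, 1, 1, 1, 0.5, 1, 0.5])     # the only scalings: two halvings (shifts)
D2 = np.diag([1, 1, 1, 1j, 1, 1j, 1j, 1])      # multiplications by j: swap re/im, one sign flip
P = np.eye(8)[[0, 4, 2, 5, 1, 7, 3, 6]]        # rows e1, e5, e3, e6, e2, e8, e4, e7

add_stages = [block_diag(I2, A1, A3), block_diag(B(2), I2, A4),
              block_diag(B(4), A2), B(8)]
fast = P @ add_stages[0] @ D2 @ add_stages[1] @ D1 @ add_stages[2] @ add_stages[3]

# Reference: the exact DFT matrix with odd powers of omega_8 scaled by 1/sqrt(2),
# e.g. (1 - j)/sqrt(2) -> (1 - j)/2, which is the approximate matrix.
idx = np.arange(8)
e = np.outer(idx, idx)
F8_hat = np.exp(-2j * np.pi * e / 8) * np.where(e % 2 == 1, 1 / np.sqrt(2), 1.0)
print(np.allclose(fast, F8_hat))

# Each row with m nonzero entries costs m - 1 additions.
additions = sum(int(np.count_nonzero(M)) - M.shape[0] for M in add_stages)
print(additions)
```

The addition count obtained this way is per input sample type: each counted addition is a complex addition for complex inputs.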
Each row i of matrix $\mathbf{F}_8$ may be interpreted as the coefficients of a discrete filter whose transfer
function is $H_i(\omega; \mathbf{F}_8) = \sum_{k=0}^{7} f_{i,k} \cdot \exp(-jk\omega)$, i = 0, 1, . . . , 7, for ω ∈ [−π, π] [3]. In the case
of multi-beamforming, the exact or approximate DFT is applied spatially, across a ULA of
[Figure 2 diagram labels: inputs v0, v1, . . . , v7; two multiplications by 1/2; three multiplications by j; outputs V0, V1, . . . , V7 in permuted order.]
Figure 2: Signal flow graph for the factorization of $\hat{\mathbf{F}}_8$. Input data $v_i$, i = 0, 1, . . . , 7, relates to the
output $V_k$, k = 0, 1, . . . , 7. Dotted arrows represent multiplications by −1.
Table 1: Real operation assessment and comparison

Method                      Multiplications   Additions   Shifts
Exact DFT                   256               240         0
FFT (complex input) [3]     4                 52          0
FFT (real input) [3]        2                 26          0
Proposed (complex input)    0                 52          4
Proposed (real input)       0                 26          2
antennas. Here, the variable ω is the spatial frequency across the ULA. Let the normalized temporal
frequency of the incident plane wave be ωt ≤ π. From physics, we have that ω = −ωt sin ψ,
for −π/2 ≤ ψ ≤ π/2, where ψ is measured counter-clockwise from ULA broadside. We set ωt = π, so that
ψ ∈ [−π/2, π/2] covers the full range ω ∈ [−π, π]. Thus, the array patterns are given by:
$$
P_i(\psi; \mathbf{F}_8) = \frac{|H_i(-\omega_t \sin\psi; \mathbf{F}_8)|}{\beta_i},
$$
where $\beta_i = \max_\psi |H_i(-\omega_t \sin\psi; \mathbf{F}_8)|$, for i = 0, 1, . . . , 7, is a normalization factor. Mutatis mutandis,
the array patterns based on the proposed approximation are denoted by $P_i(\psi; \hat{\mathbf{F}}_8)$, i = 0, 1, . . . , 7.
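A minimal Python sketch of this pattern computation follows (NumPy assumed). The function name `array_patterns`, the 0.1-degree angular grid, the use of the grid maximum as the normalization factor, and the construction of the approximate matrix by scaling the odd powers of ω8 by 1/√2 are choices made for this illustration.

```python
import numpy as np

def array_patterns(F, psi, omega_t=np.pi):
    """Normalized patterns |H_i(-omega_t sin psi; F)| / beta_i, one row per beam."""
    omega = -omega_t * np.sin(psi)                  # spatial frequency for each angle
    k = np.arange(F.shape[1])
    steering = np.exp(-1j * np.outer(k, omega))     # exp(-j k omega), shape (8, len(psi))
    H = np.abs(F @ steering)                        # |H_i(omega; F)|
    return H / H.max(axis=1, keepdims=True)         # divide by beta_i (maximum over the grid)

idx = np.arange(8)
e = np.outer(idx, idx)
F8 = np.exp(-2j * np.pi * e / 8)                          # exact DFT matrix
F8_hat = F8 * np.where(e % 2 == 1, 1 / np.sqrt(2), 1.0)   # (1 - j)/sqrt(2) -> (1 - j)/2, etc.

psi = np.radians(np.linspace(-90.0, 90.0, 1801))          # 0.1-degree grid
P_exact = array_patterns(F8, psi)
P_approx = array_patterns(F8_hat, psi)

# Look direction (angle of the pattern maximum) of each beam, in degrees.
for i in range(8):
    peak_exact = np.degrees(psi[np.argmax(P_exact[i])])
    peak_approx = np.degrees(psi[np.argmax(P_approx[i])])
    print(i, round(peak_exact, 1), round(peak_approx, 1))
```

Beam 4 peaks at both ends of the angular range, so the angle reported for it depends on which end the search reaches first.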
Figure 3(a)–(b) shows the array patterns associated with each row of $\mathbf{F}_8$ and $\hat{\mathbf{F}}_8$. The eight
independent beams are pointed at angles ψk ∈ {0.00, ±14.47, ±30.00, ±48.59, 90.00}, in degrees measured
from the array broadside direction, as expected from the conventional DFT beamformer. To quantify
the difference between corresponding array patterns, we considered the following error function:
$$
D_i(\psi) \triangleq \left| P_i(\psi; \mathbf{F}_8) - P_i(\psi; \hat{\mathbf{F}}_8) \right|, \quad i = 0, 1, \ldots, 7.
$$
In Figure 3(c), the polar plot of $D_i(\psi)$ for all rows of $\hat{\mathbf{F}}_8$ is displayed. The error energy can be
obtained by integrating the square of $D_i(\psi)$:
$$
\epsilon_i = \int_{-\pi/2}^{\pi/2} D_i^2(\psi) \, \mathrm{d}\psi, \quad i = 0, 1, \ldots, 7.
$$
This computation furnished $\epsilon_i = 1.08$, for odd i, and $\epsilon_i = 0$, for even i; the even rows incur no
error because they coincide with the corresponding rows of the exact DFT matrix. The total error energy is
4.32. For comparison, the approximate DCT described in [8] has a total error energy of 4.12.
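The error measure can be sketched numerically as follows (Python with NumPy). The grid resolution, the trapezoidal quadrature, and the integration over ψ in radians are assumptions of this sketch, so the printed values are not expected to reproduce the figures quoted above. The sketch only illustrates that the even-indexed beams carry no error while the odd-indexed ones do.

```python
import numpy as np

idx = np.arange(8)
e = np.outer(idx, idx)
F8 = np.exp(-2j * np.pi * e / 8)                          # exact DFT matrix
F8_hat = F8 * np.where(e % 2 == 1, 1 / np.sqrt(2), 1.0)   # approximate matrix

psi = np.linspace(-np.pi / 2, np.pi / 2, 2001)
omega = -np.pi * np.sin(psi)                              # omega_t = pi
steering = np.exp(-1j * np.outer(idx, omega))             # exp(-j k omega)

def patterns(F):
    """Normalized array patterns, one row per beam."""
    H = np.abs(F @ steering)
    return H / H.max(axis=1, keepdims=True)

D = np.abs(patterns(F8) - patterns(F8_hat))               # D_i(psi)

# Trapezoidal rule for the integral of D_i^2 over psi (radians assumed).
D2 = D ** 2
eps = np.sum((D2[:, 1:] + D2[:, :-1]) * np.diff(psi), axis=1) / 2

for i, value in enumerate(eps):
    print(i, value)
```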
3 FPGA Realization and ASIC Synthesis
The proposed multiplierless architecture was realized on digital hardware using an ML-605 Xilinx
Virtex-6 field programmable gate array (FPGA) prototyping board. The design was built and
tested for 16-bit inputs via the JTAG interface. Moreover, it was pipelined to minimize the
critical-path delay (Tcpd), which in turn maximizes the frequency of operation and the RF bandwidth.
The on-FPGA measured results verified the performance of the proposed architecture. The FPGA
resource consumption, including the number of slices, look-up tables (LUTs), and flip-flops (FFs),
is presented in Table 2. The percentage utilization of the available resources is also shown.
The pipelined design offered a maximum frequency of 739 MHz, corresponding to a maximum RF
bandwidth of 369 MHz (half the clock frequency) for each of the eight beams.
The FPGA-based digital design was imported to Cadence RTL compiler for application-specific
integrated circuit (ASIC) synthesis using 45 nm complementary metal oxide semiconductor (CMOS)
technology, for an operating voltage of 1.1 V at 27◦C. Table 3 displays the area, power, critical path
[Figure 3 panels: (a) DFT-based, (b) Proposed, (c) Error; each polar plot spans 0 to 1 horizontally and −1 to 1 vertically.]
Figure 3: Polar plots of $P_i(\psi; \mathbf{F}_8)$, i = 0, 1, . . . , 7, ψ ∈ [−π/2, π/2] at the frequency ωt = π for the
(a) exact transform $\mathbf{F}_8$, (b) proposed approximate transform $\hat{\mathbf{F}}_8$, and (c) error measure $D_i(\psi)$.
Table 2: FPGA resource consumption

Resources               Proposed
Slice Registers         3064 (1%)
Slice LUTs              2044 (1%)
Occupied Slices         620 (1%)
LUT-FF Pairs            2335 (1%)
Bonded IOBs             2 (1%)
Tcpd (ns)               1.353
Max. Frequency (MHz)    739.09
Table 3: ASIC synthesis results

Resources               Proposed
Area (mm^2)             0.064
Dynamic Power (mW)      94.18
Static Power (mW)       0.41
Total Power (mW)        94.59
Tcpd (ns)               0.85
Max. Frequency (GHz)    1.176
AT (mm^2 ns)            0.054
AT^2 (mm^2 ns^2)        0.046
delay, and maximum frequency of operation at the synthesis stage. The area-time (AT) and
area-time-squared (AT^2) complexities are also reported. The CMOS synthesis shows an increase in the
maximum clock frequency when compared to the FPGA implementation.
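The reported timing figures are mutually consistent, which a few lines of Python make explicit. The relations used here (maximum clock frequency as the reciprocal of the critical-path delay, per-beam bandwidth as half the clock frequency, and AT and AT^2 as area times delay and area times delay squared) are an interpretation of the tables rather than formulas stated in the text, and the function names are illustrative.

```python
def max_frequency_mhz(t_cpd_ns):
    """Maximum clock frequency in MHz for a critical-path delay given in ns."""
    return 1e3 / t_cpd_ns

def rf_bandwidth_mhz(f_clk_mhz):
    """Per-beam bandwidth, assumed here to be half the clock (sample) rate."""
    return f_clk_mhz / 2

# Values taken from Tables 2 and 3.
fpga_tcpd_ns = 1.353
asic_tcpd_ns = 0.85
asic_area_mm2 = 0.064

fpga_fmax = max_frequency_mhz(fpga_tcpd_ns)   # about 739 MHz
asic_fmax = max_frequency_mhz(asic_tcpd_ns)   # about 1176 MHz
print(round(fpga_fmax, 2), round(rf_bandwidth_mhz(fpga_fmax), 1))
print(round(asic_fmax, 1), round(rf_bandwidth_mhz(asic_fmax), 1))

# Area-time metrics for the ASIC synthesis.
asic_at = asic_area_mm2 * asic_tcpd_ns         # AT in mm^2 ns
asic_at2 = asic_area_mm2 * asic_tcpd_ns ** 2   # AT^2 in mm^2 ns^2
print(round(asic_at, 3), round(asic_at2, 3))
```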
4 Conclusion
An 8-point multiplierless DFT approximation requiring 26 additions was proposed. Applications
in receive mode RF multi-beamforming using a ULA of antennas include communication, radar,
and radio astronomy. CMOS synthesis and FPGA implementations have indicated bandwidths of
588 MHz and 369 MHz, respectively. The approximation is suitable for eight digital RF-beams, at
low power. The DFT approximation allows FFT-like performance without multiplier hardware.
Acknowledgements
We thank CNPq, FACEPE, FAPERGS, and The College of Engineering at UA for the partial
financial support.
References
[1] Ellingson, SW and Cazemier, W.: ‘Efficient Multibeam Synthesis with Interference Nulling for
Large Arrays’, IEEE Trans. Antennas and Propagation, 2003, 51, (3), pp. 503–511.
[2] Coleman, JO.: ‘A Generalized FFT for Many Simultaneous Receive Beams’, Naval Research
Lab., 2007, NRL/MR/5320–07-9029.
[3] Oppenheim, AV and Schafer, RW.: ‘Discrete-Time Signal Processing’, 3rd Ed, Prentice Hall,
2009.
[4] Blahut, RE.: ‘Fast algorithms for digital signal processing’, Cambridge University Press, 2010.
[5] Potluri, US, Madanayake, A, Cintra, RJ, Bayer, FM and Rajapaksha, N.: ‘Multiplier-free DCT
approximations for RF multi-beam digital aperture-array space imaging and directional sensing’,
Meas. Sci. Technol., 2012, 23 (11) 114003.
[6] Flury, BN and Gautschi, W.: ‘An algorithm for simultaneous orthogonal transformation of
several positive definite symmetric matrices to nearly diagonal form’, SIAM J. Sci. and Stat.
Comput., 1986, 7, (1), pp. 169–184.
[7] Haweel, TI.: ‘A new square wave transform based on the DCT’, Signal Process., 2001, 81, (11),
pp. 2309–2319.
[8] Bouguezel, S, Ahmad, MO and Swamy, MNS.: ‘Low-complexity 8×8 transform for image com-
pression’, Electron. Lett., 2008, 44, (21), pp. 1249–1250.
