Multiple independent radio frequency (RF) beams find applications in communications, radio astronomy, radar, and microwave imaging. An $N$-point FFT applied spatially across an array of receiver antennas provides $N$ independent RF beams at a multiplier complexity of $\frac{N}{2}\log_2 N$. Here, a low-complexity multiplierless approximation of the 8-point FFT is presented for RF beamforming, using only 26 additions. The algorithm provides eight beams that closely resemble the antenna array patterns of the traditional FFT-based beamformer, albeit without using multipliers. The proposed FFT-like algorithm is useful for low-power RF multi-beam receivers; it was synthesized in 45 nm CMOS technology at a 1.1 V supply and verified on-chip using a Xilinx Virtex-6 Lx240T FPGA device. The CMOS simulation and FPGA implementation indicate bandwidths of 588 MHz and 369 MHz, respectively, for each of the independent receive-mode RF beams.
Introduction
Antenna-array-based radio frequency (RF) applications such as radar, wireless communications, localization, remote sensing, signal intelligence, radio astronomy, search for extraterrestrial intelligence (SETI), and imaging require the fundamental operation of receive-mode beamforming. Beamforming is the directional enhancement of propagating electromagnetic planar waves based on their directions of arrival (DOA), while suppressing undesired noise and interference that impinge on the antenna array. The ability to form multiple receiver beams is known as "multibeamforming" [1, 2]. Multiple RF beams, each having a unique "look direction" (the direction of maximum sensitivity), are needed for multiple visibilities. Multiple simultaneous beams are also needed for search-and-track radar, which in volume-scan mode continuously monitors airborne threats, such as aircraft, warheads, and cruise missiles, across a given range of angles. From the standpoint of high-capacity wireless communications, simultaneous receiver beams are of importance to multi-input multi-output (MIMO) systems. The application of an $N$-point fast Fourier transform (FFT) spatially, at each time sample, across a uniform linear array (ULA) of antennas is a technique for achieving a plurality of independent RF beams [1, 2]. The FFT efficiently computes the discrete Fourier transform (DFT) with $\frac{N}{2}\log_2 N$ multiplications. Fig. 1 shows an overview of a ULA-based multi-beamformer using a spatial FFT.
For an $N$-element ULA, the spatial FFT beamformer provides $N$ beams, each uniformly spaced in the frequency domain by the interval $2\pi/N$. The signal is first sent through a low noise amplifier (LNA), and the real (I, in-phase, $v_{\mathrm{real}}$) and imaginary (Q, quadrature, $v_{\mathrm{im}}$) parts are low-pass filtered and sampled using analog-to-digital converters (ADCs) before application of the DFT. The spatial angle $\psi$ is the independent variable used in the polar array beam patterns.
RF aperture power consumption is directly proportional to circuit complexity and clock frequency. Because multiplier hardware dominates circuit complexity, FFT hardware with as few parallel multiplier circuits as possible is preferable for reducing the overall circuit complexity and power consumption of the multi-beamformer. The proposed fast algorithm approximates the FFT computation without using any multipliers at all, making the corresponding digital architecture very simple to realize on-chip. Because the proposed fast algorithm requires only 26 addition operations, the corresponding architecture consumes less power than usual FFT-based circuits that use parallel multipliers to implement the twiddle factors.
Multiplier-Free DFT Approximation
The DFT is a linear orthogonal transformation relating an $N$-point input vector $v = [v_0\ v_1\ \cdots\ v_{N-1}]^\top$ to an output vector $V = [V_0\ V_1\ \cdots\ V_{N-1}]^\top$ according to $V_k = \sum_{i=0}^{N-1} \omega_N^{ik}\, v_i$, for $k = 0, 1, \ldots, N-1$, where $\omega_N = \exp\{-2\pi j/N\}$ is the $N$th root of unity [3] and $j = \sqrt{-1}$. In matrix formalism, the above expression reduces to $V = F_N \cdot v$, where $F_N$ is the DFT matrix, whose $(i, k)$-th element is given by $f_{i,k} = \omega_N^{ik}$, for $i, k = 0, 1, \ldots, N-1$. The direct DFT computation requires $N^2$ complex multiplications and $N \cdot (N-1)$ additions. Thus, fast algorithms are necessary, and they are often able to reduce the cost of the DFT computation to $O(N \cdot \log_2 N)$ multiplications [4].
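The matrix form above can be sketched directly in code. The following is a minimal illustration using only the Python standard library; the function names `dft_matrix` and `dft_direct` are hypothetical and chosen here for readability. It builds $F_N$ from $f_{i,k} = \omega_N^{ik}$ and evaluates $V = F_N \cdot v$ by the direct $N^2$-multiplication route that fast algorithms are meant to avoid.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix with entries omega_n ** (i * k)."""
    omega = cmath.exp(-2j * cmath.pi / n)  # omega_N = exp(-2*pi*j/N)
    return [[omega ** (i * k) for k in range(n)] for i in range(n)]


def dft_direct(v):
    """Direct evaluation of V = F_N . v (N^2 complex multiplications)."""
    f = dft_matrix(len(v))
    return [sum(f_ik * v_k for f_ik, v_k in zip(row, v)) for row in f]


# Illustrative 8-point input (arbitrary values, not taken from the paper).
v = [1, 2, 3, 4, 0, 0, 0, 0]
V = dft_direct(v)
```

For this input, `V[0]` is the plain sum of the samples and `V[4]` is their alternating-sign sum, which gives a quick sanity check of the sign convention in $\omega_N$.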
We submitted the 8-point DFT matrix $F_8$ to the parametric optimization method described in [5] to derive a matrix approximation. Two major constraints were imposed on the sought approximations: (i) near-orthogonality and (ii) low complexity. We found that the optimal elements for the parametric approximation of $F_8$ are $1$, $(1 - j)/2$, and $-j$. Such parameters result in the following matrix approximation:
Compared to the exact DFT matrix, the above approximation has a mean squared error of 0.686, which is considered low. Although not exactly orthogonal, the proposed approximation is very close to orthogonality. Under the deviation-from-orthogonality measure [6], the proposed transform has a deviation of 0.03; in comparison, the popular non-orthogonal DCT approximation SDCT [7] has a deviation from orthogonality of 0.20.
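To make the substitution concrete, here is a hedged sketch of how such an approximation could be assembled. It assumes, and this is an assumption rather than something spelled out in the text, that every odd power of $\omega_8$ is replaced by its counterpart of the form $(\pm 1 \pm j)/2$, extending the stated element $(1 - j)/2$ by symmetry, while even powers ($\pm 1$, $\pm j$) are kept exact. The names `APPROX_POWERS`, `approx_f8`, and `squared_error` are hypothetical.

```python
import cmath

# Assumed replacement table for omega_8 ** m, m = 0..7: even powers are
# exact; odd powers are scaled from (+-1 +- j)/sqrt(2) to (+-1 +- j)/2.
APPROX_POWERS = [1, (1 - 1j) / 2, -1j, (-1 - 1j) / 2,
                 -1, (-1 + 1j) / 2, 1j, (1 + 1j) / 2]


def exact_f8():
    """Exact 8-point DFT matrix."""
    return [[cmath.exp(-2j * cmath.pi * ((i * k) % 8) / 8) for k in range(8)]
            for i in range(8)]


def approx_f8():
    """Multiplierless candidate: entries need only sign changes and halvings."""
    return [[APPROX_POWERS[(i * k) % 8] for k in range(8)] for i in range(8)]


def squared_error(a, b):
    """Sum of squared entrywise magnitudes of a - b."""
    return sum(abs(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


F8 = exact_f8()
F8_hat = approx_f8()
total_error = squared_error(F8, F8_hat)
row_errors = [squared_error([ra], [rb]) for ra, rb in zip(F8, F8_hat)]
```

Under this assumption only the 16 entries with both indices odd differ from the exact matrix, so the even rows carry no error, and the summed squared error comes to about 1.37. How the quoted figure of 0.686 is normalized is not stated in this section, so the two numbers should not be compared directly.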
The proposed approximate matrix $\hat{F}_8$ preserves the symmetry of the DFT and has null multiplicative complexity. Because it still requires 64 additions and 32 bit-shifting operations, a further reduction in the additive complexity can be obtained by means of a tailored fast algorithm. Let $I_n$ be the identity matrix of order $n$ and $B_n = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes I_{n/2}$, where $\otimes$ denotes the Kronecker product. Thus, employing the matrix factorization methods suggested in [4], we have the following fast algorithm:
where 1, 1, j, 1, j, j, 1), $P = [e_1\ e_5\ e_3\ e_6\ e_2\ e_8\ e_4\ e_7]^\top$ is a permutation matrix, and $e_i$ is the 8-point column vector with element 1 at the $i$th position and 0 elsewhere. Figure 2 depicts the signal flow graph of the introduced algorithm. The arithmetic complexity assessment, in terms of real operations and comparisons, is summarized in Table 1.
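Two building blocks named above, the butterfly matrix $B_n$ and the permutation $P$, can be sketched in isolation. This is only an illustration of those two factors, not the complete 26-addition factorization, which is not reproduced here. The helper names (`kron`, `butterfly`, `permute`, and so on) are hypothetical.

```python
def identity(n):
    """Identity matrix of order n as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def butterfly_matrix(n):
    """B_n = [[1, 1], [1, -1]] (x) I_{n/2}."""
    return kron([[1, 1], [1, -1]], identity(n // 2))


def butterfly(v):
    """Apply B_n without forming it: n/2 additions and n/2 subtractions."""
    half = len(v) // 2
    top = [v[i] + v[i + half] for i in range(half)]
    bottom = [v[i] - v[i + half] for i in range(half)]
    return top + bottom


# Row r of P is the transpose of e_{ORDER[r]} (1-based indices from the text).
ORDER = [1, 5, 3, 6, 2, 8, 4, 7]


def permute(v):
    """Apply P: output position r takes input element ORDER[r]."""
    return [v[index - 1] for index in ORDER]


def matvec(m, v):
    """Plain matrix-vector product, used only to cross-check butterfly()."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


sample = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary illustrative data
via_matrix = matvec(butterfly_matrix(8), sample)
via_adds = butterfly(sample)
```

Both routes give the same result, which shows why factors of the form $B_n$ contribute only additions and subtractions, and why $P$ costs no arithmetic at all.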
Each row $i$ of the matrix $F_8$ may be interpreted as the coefficients of a discrete filter whose transfer function is H_i(ω;
. In the case of multi-beamforming, the exact or approximate DFT is applied spatially, across a ULA of for $-\pi/2 \le \psi \le \pi/2$, measured counter-clockwise from ULA broadside. We set $\omega_t = \pi$, which corresponds to $\psi \in [-\pi/2, \pi/2]$. Thus, the array patterns are given by:
where $\beta_i = \max_{\psi} |H_i(-\omega_t \sin\psi)|$, for $i = 0, 1, \ldots, 7$, is a normalization factor. Mutatis mutandis, the array patterns based on the proposed approximation are denoted by 
In Figure 3(c), the polar plot of $D_i(\psi)$ for all rows of $\hat{F}_8$ is displayed. The error energy can be obtained by integrating $D_i(\psi)$:
This computation furnished $\epsilon_i = 1.08$ for odd $i$ and $\epsilon_i = 0$ for even $i$. The total error energy is 4.32. For comparison, the approximate DCT described in [8] has a total error energy of 4.12.
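The pattern computation described in this section can be sketched numerically. Two assumptions are made here and should be read as such: the transfer function takes the usual FIR form $H_i(\omega) = \sum_k f_{i,k}\, e^{-j\omega k}$, and the approximate matrix is obtained by scaling the odd powers of $\omega_8$ to $(\pm 1 \pm j)/2$. The normalization by $\beta_i$ and the choice $\omega_t = \pi$ follow the text. All function names are hypothetical, and the error-energy integral itself is not reproduced.

```python
import cmath
import math

OMEGA_T = math.pi  # the text sets omega_t = pi

# Assumed approximation: odd powers of omega_8 scaled to (+-1 +- j)/2.
APPROX_POWERS = [1, (1 - 1j) / 2, -1j, (-1 - 1j) / 2,
                 -1, (-1 + 1j) / 2, 1j, (1 + 1j) / 2]


def exact_row(i):
    """Row i of the exact 8-point DFT matrix."""
    return [cmath.exp(-2j * cmath.pi * ((i * k) % 8) / 8) for k in range(8)]


def approx_row(i):
    """Row i of the assumed multiplierless approximation."""
    return [APPROX_POWERS[(i * k) % 8] for k in range(8)]


def response(row, omega):
    """Assumed FIR form: H_i(omega) = sum_k f_{i,k} * exp(-j * omega * k)."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(row))


def array_pattern(row, num_points=721):
    """Normalized |H_i(-omega_t sin psi)| on a grid over [-pi/2, pi/2]."""
    psis = [-math.pi / 2 + math.pi * n / (num_points - 1)
            for n in range(num_points)]
    mags = [abs(response(row, -OMEGA_T * math.sin(psi))) for psi in psis]
    beta = max(mags)  # normalization factor beta_i, taken over the grid
    return psis, [m / beta for m in mags]


def max_pattern_gap(i):
    """Largest pointwise gap between exact and approximate patterns of row i."""
    _, exact = array_pattern(exact_row(i))
    _, approx = array_pattern(approx_row(i))
    return max(abs(a - b) for a, b in zip(exact, approx))


gaps = [max_pattern_gap(i) for i in range(8)]
```

Under these assumptions the gap is numerically zero for even `i` and nonzero for odd `i`, in line with the statement that only the odd-indexed beams carry error energy.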
FPGA Realization and ASIC Synthesis
The proposed multiplierless architecture was realized on digital hardware using an ML-605 Xilinx Virtex-6 field programmable gate array (FPGA) prototyping board. The design was built and tested for 16-bit inputs via a JTAG interface. Moreover, it was pipelined to minimize the critical-path delay ($T_{\mathrm{cpd}}$), which in turn maximizes the frequency of operation and the RF bandwidth. count, is presented in Table 2. The percentage utilization of the available resources is also shown.
The pipelined design offered a maximum frequency of 739 MHz corresponding to a maximum RF bandwidth of 369 MHz for each of the eight beams.
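The relation between these two figures can be written out as a short worked equation. It assumes, as the numbers suggest but the text does not state outright, that the usable bandwidth of each beam is half the maximum clock frequency (the Nyquist limit). The symbols $B_{\mathrm{RF}}$ and $F_{\max}$ are introduced here only for illustration.

```latex
B_{\mathrm{RF}} = \frac{F_{\max}}{2} = \frac{739\ \text{MHz}}{2} \approx 369\ \text{MHz}
```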
The FPGA-based digital design was imported to Cadence RTL Compiler for application-specific integrated circuit (ASIC) synthesis using 45 nm complementary metal oxide semiconductor (CMOS) technology, for an operating voltage of 1.1 V at 27 °C. Table 3 displays ($AT^2$) complexities are reported. The CMOS synthesis shows an increase in the maximum clock frequency when compared to the FPGA implementation.
Conclusion
An 8-point multiplierless DFT approximation requiring 26 additions was proposed. Applications in receive-mode RF multi-beamforming using a ULA of antennas include communication, radar, and radio astronomy. CMOS synthesis and FPGA implementation indicate bandwidths of 588 MHz and 369 MHz, respectively. The approximation is suitable for forming eight digital RF beams at low power, and it allows FFT-like performance without multiplier hardware.
