In this letter, an area-efficient FFT processor is proposed for MIMO-OFDM based SDR systems. The processor supports variable lengths of 64, 128, 256, 512, 1024, 1536 and 2048. A mixed-radix algorithm reduces the required number of non-trivial multipliers, which substantially lowers the complexity of the processor. The proposed FFT processor was designed in a hardware description language (HDL) and synthesized to gate-level circuits using a 0.13 μm CMOS standard cell library. With the proposed architecture, the gate count is 78.8 K and the memory size is 393.22 Kbits, which are 40.9% and 19.7% lower, respectively, than those of an FFT processor that combines a 4-channel radix-2 single-path delay feedback (R2SDF) architecture with a 4-channel radix-3 SDF (R3SDF) architecture. Compared with an FFT processor that combines a 4-channel radix-2 multi-path delay commutator (R2MDC) architecture with a 4-channel R3SDF architecture, the gate count and memory size are reduced by 33.8% and 18.5%, respectively.
Introduction
Software defined radio (SDR) systems have become a topic of great interest due to the need for reconfigurable and converged wireless communication systems [1]. In particular, an SDR system able to support both a WLAN mode indoors and a WiMAX/LTE mode outdoors has attracted increasing attention for its application in seamless wireless communications. Since all of these systems are based on a multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) scheme, which transmits multiple data streams simultaneously, their hardware complexity increases significantly compared with single-input single-output OFDM (SISO-OFDM) systems [2]. In particular, since the fast Fourier transform (FFT) processor is one of the largest modules in the OFDM baseband processor, it is very important to design a low-complexity FFT processor that can support variable lengths for a MIMO-OFDM based SDR system [3, 4, 5, 6].
The FFT processor for an SDR system should perform 64/128-point FFT for IEEE 802.11n WLAN, 128/512/1024/2048-point FFT for IEEE 802.16e mobile WiMAX, and 128/256/512/1024/1536/2048-point FFT for 3GPP LTE systems. The FFT processor should also support up to four MIMO data streams simultaneously. In a previous study, we presented an efficient FFT processor based on a mixed-radix multi-path delay commutator (MR-MDC) architecture [6]. Since the FFT processor in [6] requires the minimum number of non-trivial multipliers, it can be implemented with very low complexity. However, it cannot support an LTE system because it cannot perform the 1536-point FFT operation. In this letter, we propose an MR-3/4/2/4/2/4/2 algorithm for the 1536-point FFT, and we present the hardware architecture of an FFT processor that supports the variable lengths of 64, 128, 256, 512, 1024, 1536, and 2048.
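The length requirements listed above can be tabulated in a few lines. The following is only an illustrative sketch: the dictionary and helper names are hypothetical, and the per-standard lengths are copied from the text.

```python
# FFT lengths required by each standard, as listed in the text above.
REQUIRED_LENGTHS = {
    "IEEE 802.11n WLAN": {64, 128},
    "IEEE 802.16e mobile WiMAX": {128, 512, 1024, 2048},
    "3GPP LTE": {128, 256, 512, 1024, 1536, 2048},
}


def supported_lengths(requirements):
    """Return the sorted union of the FFT lengths over all standards."""
    lengths = set()
    for per_standard in requirements.values():
        lengths |= per_standard
    return sorted(lengths)


def is_power_of_two(n):
    """True when n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


all_lengths = supported_lengths(REQUIRED_LENGTHS)
non_pow2 = [n for n in all_lengths if not is_power_of_two(n)]
print(all_lengths)  # every length the SDR FFT processor has to cover
print(non_pow2)     # lengths that a pure radix-2/radix-4 pipeline cannot factor
```

The only length that is not a power of two is 1536 = 3 × 512, which is why a radix-3 stage is needed in addition to the radix-2 and radix-4 stages.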
Proposed MR-3/4/2/4/2/4/2 algorithm
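An N-point discrete Fourier transform (DFT) can be expressed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \tag{1}$$

where $W_N = e^{-j2\pi/N}$ is the twiddle factor (TF). Replacing n and k with (2) and (3), (1) can be rewritten as (4) for N = 1536, where …, and $0 \le n_6, k_6 < 8$.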
where … 1536 can be rewritten as …, it can be calculated by using a TF of the 2048-point FFT. In addition, the TF $W_{1536}$ calculation can be shared with those in the 2048-point FFT because the first two radix-2 stages of the 2048-point FFT do not operate when calculating the 1536-point FFT (see Fig. 1). Therefore, the proposed MR-3/4/2/4/2/4/2 algorithm can be implemented with the MR-2/2/4/2/4/2/4/2 algorithm in [6] by just adding the radix-3 butterfly operation. As presented in [6], the MR-2/2/4/2/4/2/4/2 algorithm requires the minimum number of non-trivial multipliers for the 64, 128, 256, 512, 1024 and 2048-point FFT processor. Therefore, by using the MR-3/4/2/4/2/4/2 and MR-2/2/4/2/4/2/4/2 algorithms together, the proposed FFT processor can additionally support the 1536-point FFT with minimum complexity.
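To make the radix sequence concrete, the sketch below evaluates a DFT stage by stage with a generic decimation-in-frequency mixed-radix recursion. It is a floating-point software model under stated assumptions, not the index mapping of (2) to (4) and not the pipelined hardware: the function names are hypothetical, and each stage simply applies a radix-r butterfly, an inter-stage TF, and then recurses on the remaining radices.

```python
import cmath
import math


def mixed_radix_fft(x, radices):
    """DFT of x by recursive decimation in frequency, one radix per stage."""
    n = len(x)
    if not radices:
        if n != 1:
            raise ValueError("product of radices must equal len(x)")
        return list(x)
    r = radices[0]
    m, rem = divmod(n, r)
    if rem:
        raise ValueError("radix does not divide the current length")
    out = [0j] * n
    for k1 in range(r):
        branch = []
        for n2 in range(m):
            # Radix-r butterfly over the r inputs spaced m samples apart.
            acc = sum(x[m * n1 + n2] * cmath.exp(-2j * cmath.pi * n1 * k1 / r)
                      for n1 in range(r))
            # Inter-stage twiddle factor W_n^(n2*k1).
            branch.append(acc * cmath.exp(-2j * cmath.pi * n2 * k1 / n))
        sub = mixed_radix_fft(branch, radices[1:])
        # The m-point sub-transform fills output bins k = k1 + r*k2.
        for k2 in range(m):
            out[k1 + r * k2] = sub[k2]
    return out


def direct_dft(x):
    """Reference O(N^2) evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


RADICES_1536 = [3, 4, 2, 4, 2, 4, 2]       # MR-3/4/2/4/2/4/2
RADICES_2048 = [2, 2, 4, 2, 4, 2, 4, 2]    # MR-2/2/4/2/4/2/4/2

# Small check against the direct DFT (24 = 3*4*2).
small = [complex(i % 5, -(i % 3)) for i in range(24)]
err_small = max(abs(a - b) for a, b in
                zip(mixed_radix_fft(small, [3, 4, 2]), direct_dft(small)))

# 1536-point check: a complex tone at bin K0 should concentrate in bin K0.
N, K0 = 1536, 5
tone = [cmath.exp(2j * cmath.pi * K0 * i / N) for i in range(N)]
spectrum = mixed_radix_fft(tone, RADICES_1536)
peak_bin = max(range(N), key=lambda k: abs(spectrum[k]))
print(math.prod(RADICES_1536), math.prod(RADICES_2048), peak_bin, err_small)
```

The two radix lists multiply to 1536 and 2048, and dropping the leading radix-3 stage of the first list leaves the same 4/2/4/2/4/2 tail as the second list, which is the structural reason the 1536-point mode can reuse the later stages.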
Hardware architecture
Fig. 1 shows the hardware architecture of the proposed FFT processor, which consists of a radix-3 butterfly module (R3BM), radix-2 butterfly modules (R2BM1, R2BM2, and R2BM3), radix-4 butterfly modules (R4BM1 and R4BM2), a data mapping module (DMM), and a data re-ordering module (DRM).
The R3BM is used only for the 1536-point FFT computation and is implemented with the 4-channel R3SDF architecture, as depicted in Fig. 2. As already mentioned, the non-trivial multipliers and the TF ROM are fully shared with R2BM2. The TF ROM stores a 1/8 cycle (0 to π/4) of the cosine signal and generates the TFs by using the symmetry of the cosine and sine signals. The DMM, which is based on the delay commutating architecture with first-in first-out (FIFO) memory and a switch unit, reconstructs the input data stream into blocks of one FFT length for MDC pipelining. The data re-arrangement and switch patterns are the same as those illustrated in [6]. In the case of N = 2048, the reconstructed data are transferred to the R2BM1. However, if N = 1536, the reconstructed data enter the R4BM1 via the multiplexer unit, and in the case of N = 1024, the data are transferred to the first R2BM2. Similarly, the reconstructed data are transferred to the R4BM1, the second R2BM2, or the R4BM1, when N = 512, 128, or 64, respectively. In the case of N = 256, the reconstructed data are transferred to the first R2BM2, and the output data of the R2BM2 enter the second R2BM2. The hardware architectures of the R2BMs and R4BMs are the same as those presented in [6]. After the consecutive butterfly operations, the DRM, which is also based on the delay commutating architecture, finally re-orders the output data.
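The 1/8-cycle TF ROM described above can be modelled in software as follows. This is a minimal sketch under assumptions that go beyond the text: the table holds both cosine and sine samples over 0 to π/4 in floating point, the FFT length is 2048, and the function names are hypothetical. The actual ROM contents, word length, and address logic are not specified here.

```python
import cmath
import math


def build_octant_rom(n):
    """Samples (cos, sin) of 2*pi*i/n for i = 0..n/8, i.e. angles 0..pi/4."""
    if n % 8:
        raise ValueError("n must be a multiple of 8")
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n // 8 + 1)]


def twiddle_from_rom(m, n, rom):
    """Return W_n^m = exp(-2j*pi*m/n) using only the first-octant samples."""
    m %= n
    quadrant, r = divmod(m, n // 4)   # angle = quadrant*pi/2 + 2*pi*r/n
    if r <= n // 8:
        c, s = rom[r]                 # angle within 0..pi/4: read directly
    else:
        s, c = rom[n // 4 - r]        # reflect about pi/4: cos and sin swap
    for _ in range(quadrant):         # rotate by pi/2 once per quadrant
        c, s = -s, c
    return complex(c, -s)


N_ROM = 2048
rom = build_octant_rom(N_ROM)
worst = max(abs(twiddle_from_rom(m, N_ROM, rom) - cmath.exp(-2j * cmath.pi * m / N_ROM))
            for m in range(N_ROM))
print(len(rom), worst)
```

With this folding, a table of N/8 + 1 entries is enough to regenerate all N twiddle factors of the 2048-point FFT, which is the storage saving that the symmetry argument relies on.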
Implementation results
The proposed FFT processor was designed in a hardware description language (HDL) and synthesized to gate-level circuits using a 0.13 μm CMOS standard cell library. Since an IEEE 802.11n WLAN system operates at a 20/40 MHz bandwidth, and IEEE 802.16e WiMAX and 3GPP LTE systems support a maximal bandwidth of 20 MHz, the proposed FFT processor was designed to operate at a clock frequency of 40 MHz. A 12-bit word-length for the real and imaginary data-paths was selected to satisfy the requirement of a 40 dB signal-to-quantization-noise ratio (SQNR) for all FFT lengths, as depicted in Fig. 3. With the proposed architecture, the logic gate count is 78.8 K, and the size of the required memory is 393.22 Kbits. In order to verify the efficiency of the proposed architecture, two reference FFT processors were also designed with a 12-bit word-length: 1) a 4-channel 2048-point R2SDF with a 4-channel R3SDF architecture, and 2) a 4-channel 2048-point R2MDC with a 4-channel R3SDF architecture. Like the proposed FFT processor, both support the FFT lengths of 64, 128, 256, 512, 1024, 1536, and 2048. Table I shows the comparison results for the logic gate count and memory size. Processor 1) requires 133.5 K logic gates and 489.98 Kbits of memory, while processor 2) requires 119.1 K logic gates and 482.3 Kbits of memory. The comparison results show that the proposed FFT processor is more area-efficient than both reference architectures.
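The relative savings follow directly from the figures quoted above. The short sketch below recomputes them; the variable names are hypothetical and the numbers are copied from the text.

```python
def reduction_percent(proposed, reference):
    """Relative saving of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (1.0 - proposed / reference)


# Gate counts in K gates and memory sizes in Kbits, as quoted in the text.
PROPOSED = {"gates_k": 78.8, "memory_kbits": 393.22}
REFERENCES = {
    "R2SDF + R3SDF": {"gates_k": 133.5, "memory_kbits": 489.98},
    "R2MDC + R3SDF": {"gates_k": 119.1, "memory_kbits": 482.3},
}

reductions = {
    name: {key: reduction_percent(PROPOSED[key], ref[key]) for key in PROPOSED}
    for name, ref in REFERENCES.items()
}

for name, saving in reductions.items():
    print(f"{name}: gates -{saving['gates_k']:.2f}%, "
          f"memory -{saving['memory_kbits']:.2f}%")
```

The computed values agree with the reported reductions of 40.9% and 19.7% for architecture 1) and 33.8% and 18.5% for architecture 2) to within 0.1 percentage point.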
After performing the layout for the proposed FFT processor, we also compare our work with the recent research results in [3, 4, 5], even though they do not support exactly the same specifications as the proposed processor. Table II summarizes the comparison results, where the normalized area is calculated by the method presented in [5]. As shown in Table II, only the proposed FFT processor can support the required FFT lengths for WLAN, WiMAX, and LTE systems. Also, the normalized area of the proposed FFT processor is the smallest among the processors that can support the length of 2048, because the proposed FFT processor requires the minimum number of non-trivial multipliers. Although the area of [3] is the smallest overall, its word-length is only 10 bits, and its maximum FFT length is 1024.
Conclusion
In this letter, an area-efficient FFT processor is proposed for MIMO-OFDM based SDR systems. With the proposed MR-2/2/4/2/4/2/4/2 and MR-3/4/2/4/2/4/2 decomposition schemes and pipelined hardware architecture, the proposed FFT processor can support 4-channel 64, 128, 256, 512, 1024, 1536, and 2048-point FFT operation, which is needed for an SDR system that supports WLAN, WiMAX, and LTE modes. By using the proposed mixed-radix algorithm, the required number of non-trivial multiplications is also minimized. Implementation results show that the proposed FFT processor reduces the logic gate count and memory size by 40.9% and 19.7%, respectively, compared with the conventional 4-channel R2SDF with the 4-channel R3SDF FFT processor. Compared with the 4-channel R2MDC with the 4-channel R3SDF FFT processor, it achieves reductions of 33.8% and 18.5%, respectively. A comparison with recent research results also shows that the proposed processor can be implemented with very low complexity. Since the FFT processor is one of the largest modules in MIMO-OFDM systems, the proposed FFT processor can contribute substantially to the low-complexity implementation of MIMO-OFDM based SDR systems.
