I. INTRODUCTION
The Fast Fourier Transform (FFT) and its inverse (IFFT) are among the fundamental algorithms in digital signal processing [1], [2]. The FFT/IFFT is widely used in areas such as telecommunications and speech and image processing. More recently, it has become a key component of the physical layer (PHY) of Orthogonal Frequency Division Multiplexing (OFDM) based wireless broadband communication systems [3]. It is one of the most computationally complex modules in the PHY layer of several wireless standards (OFDM 802.11a/g, MIMO-OFDM 802.11n, HiperLAN/2, and OFDM 802.16a/d/e) [2].
Due to the increasing demand for OFDM-based applications and for high throughput in Wireless Local Area Networks (WLAN) and Wireless Metropolitan Area Networks (WMAN), processing speed, power consumption and implementation area are significant parameters in FFT design [4]. Because of the complexity of the FFT/IFFT processing algorithm and its importance in OFDM-based wideband communication systems, the study of algorithms and high-performance VLSI FFT/IFFT architectures is of increasing importance for improving computation performance [5], [6].
Currently, the main constraints on FFT/IFFT processors for OFDM-based wireless LAN systems are area, power consumption and execution time [4]. An N-point FFT/IFFT design for IEEE 802.11a and IEEE 802.16a systems has to meet time and power constraints when performing the FFT/IFFT computation [7], [8]. To meet these criteria, the FFT/IFFT design has to employ a highly in-depth architecture or use a very high operating frequency. Such a solution satisfies the timing constraint, but at the cost of the other constraints: area and power consumption [1], [4].
In this paper, we simulate and compare the multiplicative complexity of an optimized FFT/IFFT design for an N-point processor. The design has low arithmetic complexity and high architectural regularity, and at the same time satisfies the timing and power requirements of the IEEE 802.11a and IEEE 802.16a specifications. It is compared with other enhanced FFT/IFFT architectures. The efficient N-point FFT/IFFT design proposed in this paper, targeted at an FPGA platform, offers an advantage in multiplicative complexity, and therefore a gain in resource usage and power dissipation, by using a pipelined method and a complex multiplication reduction approach.
II. OPTIMUM COMPLEXITY OF THE PROPOSED FFT/IFFT PROCESSOR

A. Architectural Description
The proposed architecture of the N-point FFT/IFFT processor internally uses $\log_r N$ modules of r-point FFT/IFFT for computation. The N-point Radix-r decimation of the FFT/IFFT can be formulated in the following way:
We suppose $k = i + r \cdot t$ and $n = j + r \cdot m$, where $i, j \in \{0, 1, \dots, r-1\}$ and $m, t \in \{0, 1, \dots, r-1\}$.
The N-points FFT equation in (1) can be expressed as:
After simplifying, we obtain equation (3). As demonstrated in equation (3), implementing the FFT algorithm for the N-point FFT (i.e., $N = r^2$) involves the computation of two r-point FFTs. These can be decimated by using a Radix-r algorithm (Fig. 1).
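Equations (1) to (3) are referenced in the text but are not reproduced here. The following is a hedged reconstruction derived only from the stated index mapping, assuming the usual twiddle definition $W_N = e^{-j2\pi/N}$ and $N = r^2$; the symbols $x(n)$ and $X(k)$ for the input and output sequences are assumed names, and the original typesetting may differ.

```latex
% Reconstruction under stated assumptions, not the original typesetting.
% (1) N-point DFT with W_N = e^{-j 2\pi / N}
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \dots, N-1

% (2) substitute k = i + r t and n = j + r m
X(i + r t) = \sum_{j=0}^{r-1} \sum_{m=0}^{r-1}
             x(j + r m)\, W_N^{(j + r m)(i + r t)}

% (3) expand the exponent and use W_N^{r} = W_r, W_N^{r^2} = 1 (N = r^2)
X(i + r t) = \sum_{j=0}^{r-1}
             \Big[ W_N^{ij} \sum_{m=0}^{r-1} x(j + r m)\, W_r^{mi} \Big] W_r^{jt}
```

Read this way, the inner sum is an r-point FFT over $m$, the factor $W_N^{ij}$ is the inter-dimensional coefficient, and the outer sum is a second r-point FFT over $j$.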
The first r-point module computes r points of the N-point FFT on the appropriate data slot according to equation (3). Its output is then multiplied by the r×r inter-dimensional constant coefficients in a multiplier, and the r-point FFT of the resulting data is computed again with the appropriate data reordering according to equation (3).
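The data flow described above can be sketched in software. This is a minimal floating-point illustration, not the authors' VHDL design: it assumes $N = r^2$, the index mapping $k = i + r t$, $n = j + r m$, and $W_N = e^{-j2\pi/N}$; the function names `dft` and `fft_two_stage` are hypothetical, and a direct DFT stands in for the r-point module.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; stands in for the r-point module and serves as reference."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def fft_two_stage(x, r):
    """N = r*r point DFT computed as r-point DFTs, twiddles, then r-point DFTs."""
    n_points = r * r
    if len(x) != n_points:
        raise ValueError("input length must equal r*r")

    # Stage 1: for each j, an r-point DFT over m of x[j + r*m]; result indexed by i.
    stage1 = [dft([x[j + r * m] for m in range(r)]) for j in range(r)]

    # Multiplier unit: apply the inter-dimensional coefficients W_N^{ij}.
    scaled = [
        [stage1[j][i] * cmath.exp(-2j * cmath.pi * i * j / n_points) for j in range(r)]
        for i in range(r)
    ]

    # Stage 2: for each i, an r-point DFT over j; output t lands at k = i + r*t.
    out = [0j] * n_points
    for i in range(r):
        column = dft(scaled[i])
        for t in range(r):
            out[i + r * t] = column[t]
    return out
```

For example, `fft_two_stage(x, 4)` on a 16-sample list should agree with `dft(x)` up to rounding error, which is a convenient sanity check on the index reordering.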
With this proposed design we implemented N = 16-point [9], N = 64-point [10] and N = 256-point FFT/IFFTs. For instance, the N = 64-point FFT/IFFT equation (2) can be expressed as:
This can be decimated by using Radix-2, Radix-4, or Radix-8. The Radix-8 algorithm is attractive because it requires fewer complex multiplications and additions/subtractions than the Radix-2 and Radix-4 algorithms. However, using high-radix algorithms increases the complexity of integration in an integrated circuit [11], [12].
Although the number of non-trivial additions and multiplications is a good indicator of the effectiveness of an algorithm, hardware integration must also take into account the regularity of the algorithm, the design complexity and the control of the architecture. The gains achieved by reducing multiplications or additions can sometimes be lost to the added control complexity and interconnection overhead. Hence the interest in the Radix-2 algorithm, which offers greater regularity for hardware implementation than Radix-4, Radix-8 and Split-Radix.
The Radix-2 algorithm is appealing for its simplicity, but it is not well suited to very large FFT/IFFT sizes because of the large number of multipliers it requires. To reduce the power consumption of the N-point FFT/IFFT, the number of complex multiplications must be reduced. For this reason, we applied the proposed design methodology to an efficient N-point FFT processor based on a Radix-2 r-point FFT module, in order to keep area and power consumption as low as possible while respecting the execution time constraint (Fig. 2).
The r-point FFT module uses the Radix-2 algorithm for computation. After the initial input data sequences have been processed by the r-point FFT, the de-multiplexer passes the data either to a serial-to-parallel block, in the first FFT/IFFT computation, or to the N-point FFT output, in the second computation. In the first case, the data are arranged for multiplication in the multiplier unit according to equation (3).
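As an illustration of the Radix-2 computation inside the r-point module, the following is a minimal recursive decimation-in-time sketch. It is a generic textbook formulation in floating point, assumed here for clarity only; the function name `fft_radix2` is hypothetical and the code does not model the pipelined hardware or the de-multiplexer.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")

    even = fft_radix2(x[0::2])  # n/2-point FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # n/2-point FFT of odd-indexed samples

    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: one twiddle multiplication, one addition, one subtraction.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each butterfly needs a single twiddle multiplication, which is why the regular structure is convenient for hardware even though the total multiplier count grows with the transform size.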
The arranged data then undergo inter-dimensional complex multiplication. The multiplier unit should normally perform complex multiplications with N elements. In practice, however, a complex multiplication reduction approach leads to a significant reduction in complex multiplications at the cost of far less expensive additions. Indeed, adders require less hardware on an FPGA platform and consume much less power than multipliers of the same word length [13]. Moreover, they have fewer glitches. The implementation complexity of non-trivial twiddle factors is reduced even further by replacing complex multiplications with basic operations such as shift-and-add [14].
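To make the shift-and-add idea concrete, the sketch below multiplies an integer sample by an approximation of $\cos(\pi/4) = 1/\sqrt{2}$ without a multiplier. The 8-bit coefficient 181/256 and the function name `mul_by_inv_sqrt2` are assumptions chosen for illustration; the word length and coefficients used in the actual design are not stated here.

```python
def mul_by_inv_sqrt2(x: int) -> int:
    """Approximate x / sqrt(2) using only shifts and additions.

    Assumed coefficient: 181 / 256 = 0.70703125, close to 0.70710678.
    181 = 2**7 + 2**5 + 2**4 + 2**2 + 2**0, so the product x * 181
    is the sum of five shifted copies of x.
    """
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x
    # Scale back by 2**8; Python's >> floors, also for negative values.
    return acc >> 8
```

The four additions replace one general-purpose multiplier; a signed-digit recoding of the constant can reduce the number of non-zero terms for other coefficients.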
B. Multiplicative Comparison
Complex multiplication is an expensive operation in the FFT/IFFT computation. It is the dominant factor affecting the chip area, power consumption and throughput of an FFT/IFFT processor intended for FPGA implementation [11], [13]. With a large number of complex multiplications, direct computation therefore requires a large chip area and high power consumption. An additional gain in terms of real multiplications can be obtained by applying the complex multiplication reduction method [15]. We reduce the multiplicative complexity of the twiddle factor inside the butterfly processor by computing only three real multiplications and three additions/subtractions, as shown in equations (5), (6) and (7):
The multiplication with complex twiddle factor:
However this multiplication can be simplified:
and $Z = C(X - Y)$ (8)
In the implementation of the complex multiplication module, the twiddle factor coefficients are known in advance, i.e., C and S in equations (6) and (7) are pre-computed and stored in a memory table [16]. In this case, it is necessary to store the three coefficients C, C+S, and C−S. These constants can be stored as canonical signed digits (CSD) so that the complex multiplication can be implemented with a carry-save adder tree [17]. Thus, both the area and the power consumption of the complex multiplier can be reduced. Storing the coefficients also simplifies the complex multiplication in special cases: for instance, the complex multiplication by the factor $W_N^{N/8} = e^{-j\pi/4}$ in an N-point FFT/IFFT computation requires only two real multiplications rather than three. Moreover, the complex multiplication can be reduced further with an efficient fixed-point number representation [18]. The complex multiplication method implemented in this work uses three multiplications, one addition and two subtractions, at the cost of an additional memory table. In the VHDL (hardware description language) program, the twiddle factor multiplier was built from component instantiations of three lpm_mult and three lpm_add_sub modules from the Altera library. It is worth noting that LPM modules are supported by most EDA vendors, and that LPM provides an architecture-independent library of modules that are parameterized to achieve scalability and adaptability [19].
Fig. 1. N-points FFT computation using Radix-r algorithm; the multiplier unit applies the coefficients $W_N^{ij}$.
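The three-multiplication scheme can be sketched as follows. Equations (5) to (7) are not reproduced in this text, so the expressions below are a reconstruction consistent with equation (8) and with the stored constants C, C+S and C−S, assuming an input $X + jY$ and a twiddle factor $C + jS$; the function names `twiddle_table`, `cmul3` and `cmul_w8` are hypothetical, and floating point stands in for the fixed-point LPM modules.

```python
import cmath
import math


def twiddle_table(n_points, k):
    """Pre-compute the three stored constants (C, C+S, C-S) for W_N^k = C + jS."""
    w = cmath.exp(-2j * cmath.pi * k / n_points)
    c, s = w.real, w.imag
    return c, c + s, c - s


def cmul3(x, y, table):
    """(x + jy) * (C + jS) using three real multiplications."""
    c, c_plus_s, c_minus_s = table
    z = c * (x - y)              # multiplication 1, subtraction 1: Z = C(X - Y)
    real = c_minus_s * y + z     # multiplication 2, addition:  C*X - S*Y
    imag = c_plus_s * x - z      # multiplication 3, subtraction 2: S*X + C*Y
    return real, imag


def cmul_w8(x, y):
    """(x + jy) * e^{-j*pi/4} using two real multiplications."""
    c = math.sqrt(0.5)           # cos(pi/4); here the twiddle is c * (1 - j)
    return c * (x + y), c * (y - x)
```

Expanding the expressions confirms the product: $(C - S)Y + C(X - Y) = CX - SY$ and $(C + S)X - C(X - Y) = SX + CY$. The operation count matches the three multiplications, one addition and two subtractions stated above.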
This method allows us to reduce the number of multiplications of the FFT/IFFT processor in a dedicated circuit (FPGA), particularly for large transform sizes (256-point, 1024-point FFT/IFFT, and so on). The resulting reduction in arithmetic complexity leads in turn to a significant reduction in the area and power consumption of the FFT/IFFT processor.
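A rough way to see where the savings come from is to classify the inter-dimensional coefficients $W_N^{ij}$ of the multiplier unit by cost. The sketch below is an illustrative count under stated assumptions only: $N = r^2$ with $r$ a multiple of 4, multiplications by ±1 and ±j counted as free, odd multiples of $N/8$ counted as two real multiplications, and all others as three. It covers the multiplier unit alone, ignores the r-point modules, and is not intended to reproduce the figures of Table I or Table II. The function name `count_twiddle_mults` is hypothetical.

```python
def count_twiddle_mults(r):
    """Classify the r*r coefficients W_N^{ij}, N = r*r, by real-multiplication cost."""
    if r % 4:
        raise ValueError("this sketch assumes r is a multiple of 4")
    n_points = r * r
    trivial = two_mult = three_mult = 0
    for i in range(r):
        for j in range(r):
            e = (i * j) % n_points
            if e % (n_points // 4) == 0:
                trivial += 1        # W is +1, -1, +j or -j: no multiplier needed
            elif e % (n_points // 8) == 0:
                two_mult += 1       # |C| == |S|: two real multiplications
            else:
                three_mult += 1     # general case: three real multiplications
    return {
        "trivial": trivial,
        "two_mult": two_mult,
        "three_mult": three_mult,
        "real_mults": 2 * two_mult + 3 * three_mult,
        "naive_real_mults": 4 * n_points,  # four real multiplications per coefficient
    }
```

Calling `count_twiddle_mults(4)`, `count_twiddle_mults(8)` and `count_twiddle_mults(16)` gives the corresponding breakdown for the 16-point, 64-point and 256-point cases under these assumptions.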
The proposed N-point (16-point, 64-point, and 256-point) FFT/IFFT designs were coded in VHDL using the Altera Quartus software, and then simulated and synthesized for the Altera Cyclone II EP2C35F672C6 device. The purpose is to determine the resource usage of the proposed design. Synthesis was followed by both functional and timing simulation.
The proposed designs allow three resource reductions: the complex multiplication reduction inside the multiplier unit, the use of a pipelined architecture, and the elimination of some complex multiplications inside the r-point FFT/IFFT unit.
The proposed designs require fewer real multiplications than the conventional Cooley-Tukey algorithm and than efficient processors implemented in pipelined architectures for the Radix-2, Radix-4, Radix-8 and Split-Radix algorithms. A comparison of the proposed design with different pipelined FFT/IFFT architectures, for the 16-point and 64-point cases, is presented in Table I and Table II. A further multiplicative comparison is given for the developed 256-point FFT/IFFT. We compared the performance of the three designs, in terms of the number of multiplications, with efficient Radix pipeline structures of the same word length. Fig. 3 shows the simulated results.
Applying permutations and shift-and-add operations to the twiddle factors inside the N-point FFT module further reduces the number of multiplication operations. Multiplications by the non-trivial twiddle factors $W_{16}^{n}$, $W_{64}^{n}$ and $W_{256}^{n}$ were implemented with 9-bit embedded multipliers. The logic operations employed allow us to cut down the number of complex multiplications in the proposed approach (case of the 16-point FFT: Proposed processor 2). Therefore, the number of real multiplications was reduced (Table I). The proposed N-point FFT/IFFT processors require fewer arithmetic operations than the efficient processors implemented in pipelined structures with the Radix-2, Radix-$2^2$, Radix-4 and Split-Radix algorithms using MDC and SDF architectures. However, the proposed processor uses more embedded multipliers in order to attain low multiplicative complexity (Table I, Table II). Hence, it achieves high speed and low power consumption at the expense of logic area.
The simulation results obtained with the Quartus tool show that the proposed designs significantly reduce the number of operations within the processor. The proposed modules can be integrated with other components to be used as a standalone processor (1024-point, 2048-point FFT/IFFT) for OFDM-based wireless broadband communication.
To verify the computational accuracy of the implemented FFT/IFFT cores, we simulated the FFT/IFFT calculation in functional simulation mode. The output of the implemented FFT/IFFT blocks approximately matches the output of the FFT/IFFT functions in Matlab, which serve as the theoretical reference for the FFT/IFFT calculation.
III. CONCLUSION
In this paper, we have presented an efficient design for N-point FFT/IFFT processors for OFDM-based wireless communication systems (IEEE 802.11a, IEEE 802.16a). The number of multiplications has been used in this work as a key metric for comparing FFT/IFFT structures, since it has a large impact on resource usage and power consumption. The simulation results show that the proposed design significantly reduces the number of multiplication operations inside the developed FFT/IFFT processors compared with other efficient processors.
