Abstract-The rapid growth of 4G and 5G wireless technology demands ever higher-throughput data processing, and high-throughput data processing requires advanced Orthogonal Frequency Division Multiplexing (OFDM). The main block in any OFDM transceiver is the Fast Fourier Transform (FFT), which serves as the bridge between the time and frequency domains. This research presents an implementation and a direct comparison of the radix-2 and radix-4 FFT algorithms. A memory-based architecture is adopted for all algorithms. All algorithms were designed in Altera Quartus II and synthesized for the Altera DE2-70 field programmable gate array (FPGA) board in order to determine which algorithm is best suited to a given application and its system requirements.
I. INTRODUCTION
In the past, information was mainly delivered through analog systems. However, advances in digital signal processing have shown that digital technology offers many cost and performance advantages over analog solutions. The Discrete Fourier Transform (DFT) plays an important role in Orthogonal Frequency Division Multiplexing (OFDM) based communication systems, as well as in modern digital signal processing (DSP) and telecommunications [1]. The DFT is often used in linear filtering, which appears in a broad range of applications including video broadcasting, digital audio, quantum mechanics [2], image reconstruction [3], and the asymmetric digital subscriber line (ADSL). Even for a finite-length signal, computing the DFT directly is prohibitively expensive: a direct calculation for an input of length N requires N^2 complex multiplications. According to Cooley and Tukey, the fast Fourier transform (FFT) reduces this to O(N log_r N) operations, where N is the length of the transform and r is the radix into which the FFT is decomposed [4]. In the field of digital signal processing, the FFT is a popular algorithm and is widely used in digital communications, particularly in OFDM systems [5]. The original contribution of Cooley and Tukey received considerable attention, and a large number of researchers have attempted to extend and enhance their work.
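To make the N^2 cost of the direct DFT concrete, the following minimal Python sketch evaluates the DFT definition directly and counts one complex multiplication per (k, n) pair. The function name direct_dft is illustrative only. It is a reference model, not part of the hardware design.

```python
import cmath

def direct_dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-2j*pi*n*k/N).
    Returns the spectrum and the number of complex multiplications performed."""
    n_pts = len(x)
    mults = 0
    spectrum = []
    for k in range(n_pts):
        acc = 0j
        for n in range(n_pts):
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
            mults += 1  # one complex multiplication per (k, n) pair
        spectrum.append(acc)
    return spectrum, mults

X, mults = direct_dft([1, 2, 3, 4])
print(mults)  # 16, i.e. N**2 for N = 4
```

For the 4096-point transforms studied here, the direct method needs 4096^2 = 16,777,216 complex multiplications. A radix-2 FFT uses (N/2) log2 N = 2048 × 12 = 24,576 butterflies, with one complex multiplication each when trivial twiddles are counted.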
The FFT has become a key component in many telecommunication and digital signal processing systems. In OFDM, higher-order FFTs are needed to increase transmission efficiency. In storage measurement, larger data sizes start at the kilobyte (KB), which equals 1,024 bytes. In recent years, consumer demand for high-speed, untethered access to multimedia and entertainment services has driven the growth of wideband wireless communication systems [6]. In the physical layer of these systems, a high-throughput FFT/IFFT (inverse fast Fourier transform) is one of the main components and is indispensable for both frequency-domain equalization and OFDM modulation [7]. Applications such as video broadcasting, OFDM systems, speech processing, WLAN, and image processing all require high-throughput FFTs.
FPGA technology, in turn, is widely used in the DSP field. Its architecture is flexible, and it can be configured to the needs of a given project so that specific algorithms are computed efficiently. Its low cost, high capacity, and high performance make it a strong choice for FFT processors. Traditionally, DSP chips or application-specific integrated circuits were used for design solutions. Nowadays, a great number of researchers have shifted to field programmable gate arrays (FPGAs).
FPGAs have several advantages over other technologies. They have the processing power to handle high-speed DSP. Their ability to perform repetitive operations in parallel also gives them a performance advantage over the sequential, instruction-driven processing of DSP chips. They are a suitable alternative to ASICs (application-specific integrated circuits) because their design cycles are low in cost, whereas ASICs require large financial investments to be produced and updated. FPGAs also provide unprecedented design flexibility. Their programmability gives designers a platform on which to assess design decisions in terms of power, size, and throughput [8], and this feature enables them to find the most effective solution for the system's needs. Advances in technology continue to push FPGA performance upward. Improvements such as embedded multipliers and RAM (random-access memory) blocks have simplified the hardware implementation of DSP algorithms, which facilitates the transfer and transport of digital information.
II. LITERATURE REVIEW

A. The radix-2 FFT algorithm
The radix-2 algorithm efficiently computes the FFT of any data sequence whose length N is a power of two. Other radices reduce the total number of operations, but they increase complexity and reduce flexibility [4].
The idea behind the algorithm is to compute the FFTs of the even-indexed samples, Eq. (1), and the odd-indexed samples, Eq. (2), independently. The results are then combined to form the FFT of the whole sequence, Eqs. (3) and (4).
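$$E[k] = \sum_{m=0}^{N/2-1} x[2m]\, W_{N/2}^{mk} \qquad (1)$$
$$O[k] = \sum_{m=0}^{N/2-1} x[2m+1]\, W_{N/2}^{mk} \qquad (2)$$
where $W_N = e^{-j2\pi/N}$, and
$$X[k] = E[k] + W_N^{k}\, O[k] \qquad (3)$$
$$X[k+N/2] = E[k] - W_N^{k}\, O[k] \qquad (4)$$
for $k = 0, 1, \ldots, N/2-1$.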
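The even/odd split and the recombination in Eqs. (1) to (4) map directly onto a short recursive routine. The following Python sketch is a textbook decimation-in-time reference model that assumes the input length is a power of two. The name fft_radix2 is illustrative, and this is not the memory-based hardware described later.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # E[k], Eq. (1)
    odd = fft_radix2(x[1::2])   # O[k], Eq. (2)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # W_N^k * O[k]
        out[k] = even[k] + t                            # Eq. (3)
        out[k + n // 2] = even[k] - t                   # Eq. (4)
    return out
```

For example, fft_radix2([1, 2, 3, 4]) returns approximately [10, -2+2j, -2, -2-2j], which matches the direct DFT of the same input.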
The radix-2 FFT can be summarized as a sequence of radix-2 butterfly computations. A radix-2 butterfly has two inputs and two outputs, and the relationship between them depends on which version of the radix-2 FFT algorithm is used. Decimation in time (DIT) and decimation in frequency (DIF) are the two forms of the radix-2 FFT algorithm.
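The two butterfly forms differ only in where the twiddle factor is applied. In DIT, the twiddle multiplies the second input before the add/subtract, matching Eqs. (3) and (4). In DIF, it multiplies the difference afterwards. The following minimal Python sketch shows both; the function names are illustrative. Each form uses one complex multiplication and two complex additions.

```python
def butterfly_dit(a, b, w):
    """Radix-2 DIT butterfly: twiddle w is applied to b before the add/subtract."""
    t = w * b
    return a + t, a - t

def butterfly_dif(a, b, w):
    """Radix-2 DIF butterfly: twiddle w is applied to the difference after the add/subtract."""
    return a + b, (a - b) * w
```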
Figure (1) shows the radix-2 FFT algorithm for N=8. The figure also illustrates a significant property of the radix-2 algorithm: the outputs are not in natural order but in bit-reversed order.
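Bit-reversed ordering can be generated by reversing the log2 N address bits of each output position. The following minimal sketch uses illustrative helper names and lists which X[k] appears at each output position of an 8-point radix-2 FFT whose outputs are bit-reversed.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reversed_order(n):
    """Output order X[k] for an N-point radix-2 FFT (N = 2**m) with bit-reversed outputs."""
    num_bits = n.bit_length() - 1
    return [bit_reverse(i, num_bits) for i in range(n)]

print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```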
B. The radix-4 FFT algorithm
The radix-4 algorithm efficiently computes the FFT of any data sequence whose length N is a power of four. A radix-4 butterfly requires more operations, 12 complex additions and 3 complex multiplications in total, than a radix-2 butterfly, which requires 2 complex additions and 1 complex multiplication. Despite this increase in butterfly complexity, the radix-4 algorithm needs 75% fewer butterfly computations. For N=256, the radix-4 algorithm needs 3072 complex additions and 768 complex multiplications, compared with 2048 complex additions and 1024 complex multiplications for the radix-2 algorithm [1]. Figure (2) shows the radix-4 FFT algorithm for N=16. As in the radix-2 algorithm, the outputs are arranged in bit-reversed order, but the bits are reversed in groups of two (that is, base-4 digit reversal).
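The operation counts above follow from multiplying the per-butterfly cost by (N/r) butterflies per stage and log_r N stages. The following Python sketch reproduces the N=256 figures using the butterfly costs quoted in the text. It also shows the base-4 digit reversal that gives the radix-4 output order. All function and table names are illustrative, and multiplications are counted for every butterfly, including those with trivial twiddles, as in the text.

```python
def num_stages(n, radix):
    """Number of radix-r stages for an n-point FFT; n must be a power of the radix."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("n must be a power of the radix")
        n //= radix
        stages += 1
    return stages

# (complex additions, complex multiplications) per butterfly, as quoted in the text
BUTTERFLY_COST = {2: (2, 1), 4: (12, 3)}

def op_counts(n, radix):
    """Return (butterflies, complex additions, complex multiplications)."""
    adds, mults = BUTTERFLY_COST[radix]
    butterflies = (n // radix) * num_stages(n, radix)
    return butterflies, butterflies * adds, butterflies * mults

def digit_reverse(index, radix, num_digits):
    """Reverse the base-radix digits of index (radix 4: bits reversed in pairs)."""
    result = 0
    for _ in range(num_digits):
        result = result * radix + index % radix
        index //= radix
    return result

print(op_counts(256, 2))  # (1024, 2048, 1024)
print(op_counts(256, 4))  # (256, 3072, 768)
print([digit_reverse(i, 4, 2) for i in range(16)])
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

The 256 radix-4 butterflies versus 1024 radix-2 butterflies correspond to the 75% reduction stated above.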
It is worth mentioning that the radix-4 butterfly unit requires only trivial multiplications (by ±1 and ±j), which amount to swapping the real and imaginary parts and inverting signs. It does not require any additional complex multiplications. When the FFT algorithm is decomposed with a higher radix, the butterfly structure requires increasingly many non-trivial complex multiplications. The radix-4 butterfly structure is therefore more cost-efficient than higher-radix butterfly structures, which reduce the number of stages but significantly increase the hardware complexity of each stage [7].
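The following Python sketch illustrates the point about trivial multiplications. It shows a radix-4 butterfly in which multiplication by -j is done by swapping the real and imaginary parts and negating one of them, so no multiplier is needed. The butterfly is used inside a recursive radix-4 DIT FFT that has up to three twiddle multiplications per butterfly. This factored butterfly needs 8 complex additions; evaluating each of the four outputs directly as a four-term sum gives the 12 quoted above. The function names are illustrative, and this is a software reference model, not the authors' hardware.

```python
import cmath

def mul_neg_j(z):
    """Multiply by -j without a multiplier: swap real/imag parts and invert one sign."""
    return complex(z.imag, -z.real)

def butterfly4(a0, a1, a2, a3):
    """4-point DFT butterfly; internal factors are only +-1 and +-j (trivial)."""
    s02, d02 = a0 + a2, a0 - a2
    s13, d13 = a1 + a3, mul_neg_j(a1 - a3)  # d13 = -j * (a1 - a3)
    return s02 + s13, d02 + d13, s02 - s13, d02 - d13

def fft_radix4(x):
    """Recursive radix-4 DIT FFT; len(x) must be a power of four."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of four")
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        # up to three non-trivial twiddle multiplications per butterfly
        a1 = cmath.exp(-2j * cmath.pi * k / n) * subs[1][k]
        a2 = cmath.exp(-2j * cmath.pi * 2 * k / n) * subs[2][k]
        a3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * subs[3][k]
        out[k], out[k + q], out[k + 2 * q], out[k + 3 * q] = butterfly4(subs[0][k], a1, a2, a3)
    return out
```

For example, fft_radix4([1, 2, 3, 4]) returns approximately [10, -2+2j, -2, -2-2j], the same result as the radix-2 version.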
III. FFT DESIGN AND IMPLEMENTATION
This section describes the algorithms and methods used in this work: the radix-2 FFT, the radix-4 FFT, and the architecture.
A. FFT Algorithms
This section describes the design process in Altera Quartus II for radix-2, radix-4, and radix-8 FFT processors of up to 4096 points, and the selection of suitable Altera FPGA resources.
All the FFT designs share a common memory architecture, with pre-computed twiddle factors stored in
B. FFT Architecture
All of the designed FFT algorithms are implemented on the Altera DE2-70 FPGA. A dual-memory, memory-based architecture is used, with a single butterfly unit performing one butterfly calculation at a time. The finite state machine (FSM) starts in the idle state and waits for a reset signal. When the system is reset, computation starts, using one memory as the input and the other as the output. Twiddle factor values are preloaded into a ROM and are read along with the butterfly inputs. After each stage, the two memories swap roles, so the output of one stage becomes the input to the next. After all stages are complete, the FSM enters read mode to output the data corresponding to a given address. The system operates at a frequency of 50 MHz. Figure (2) shows the system illustration.
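The following hypothetical Python model sketches the control flow described above at the behavioral level. It is not the authors' HDL. The class name PingPongFFT, the State enum, and the use of a radix-2 DIF butterfly are assumptions made for illustration. The model keeps two memories that swap roles after every stage and reads twiddle factors from a precomputed table that stands in for the ROM. It performs one butterfly at a time and, in read mode, translates each address because the DIF outputs end up in bit-reversed order.

```python
import cmath
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    COMPUTE = auto()
    READ = auto()

class PingPongFFT:
    """Hypothetical behavioral model of a dual-memory, single-butterfly radix-2 DIF FFT."""

    def __init__(self, n):
        self.n = n
        self.stages = n.bit_length() - 1
        # Twiddle "ROM": W_N^k for k = 0 .. N/2 - 1, precomputed once
        self.rom = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
        self.state = State.IDLE
        self.butterflies = 0

    def reset(self, samples):
        # mem[src] is the current input memory, mem[1 - src] the output memory
        self.mem = [[complex(s) for s in samples], [0j] * self.n]
        self.src = 0
        self.state = State.COMPUTE

    def run(self):
        if self.state is not State.COMPUTE:
            raise RuntimeError("reset() must be called before run()")
        n = self.n
        for s in range(self.stages):
            half = n >> (s + 1)  # distance between butterfly inputs in this stage
            stride = 1 << s      # twiddle-ROM address step in this stage
            inp, out = self.mem[self.src], self.mem[1 - self.src]
            for start in range(0, n, 2 * half):
                for i in range(half):  # one butterfly at a time
                    a, b = inp[start + i], inp[start + i + half]
                    out[start + i] = a + b
                    out[start + i + half] = (a - b) * self.rom[i * stride]
                    self.butterflies += 1
            self.src = 1 - self.src  # swap memory roles for the next stage
        self.state = State.READ

    def read(self, k):
        """Return X[k]; results are stored in bit-reversed order."""
        if self.state is not State.READ:
            raise RuntimeError("computation has not finished")
        rev = int(format(k, f"0{self.stages}b")[::-1], 2)
        return self.mem[self.src][rev]
```

For a 4096-point transform the model performs 2048 × 12 = 24,576 butterflies. If the hardware completed one butterfly per clock cycle at 50 MHz, computation would take about 0.49 ms. The cycles per butterfly are not stated here, so that figure is only an assumption.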
IV. EXPERIMENTAL RESULTS
This section reports the power consumption of the radix-2 and radix-4 algorithms and compares it with the hardware usage of each algorithm. The results are measured and discussed for 4096-point FFTs.
A. The 4096-point radix-2 FFT
For the 4096-point radix-2 FFT, the Quartus II PowerPlay Power Analyzer reports the power consumed by the design. The focus here is on the dynamic thermal power dissipation. Dynamic power dissipation results from signal transitions in the circuit, so operating at a higher frequency leads to more frequent signal transitions and therefore to higher power dissipation [9]. The dynamic thermal power dissipation of the radix-2 design is 36.47 mW, which represents the power consumed by the algorithm while it is operating. The hardware resources used by the 4096-point radix-2 design are 395 logic elements, 274 registers, 622 kilobits of memory, and only 8 multipliers. Figure (1) shows the Altera Quartus II results summary, including the area used to implement the system and the memory size.
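The frequency dependence cited from [9] is commonly captured by the first-order CMOS model P_dyn = α · C · V_DD^2 · f. Here α is the switching activity, C is the switched capacitance, V_DD is the supply voltage, and f is the clock frequency. The following Python sketch only illustrates that textbook relation. The function name and all parameter values are hypothetical and are not the inputs used by the PowerPlay Power Analyzer for this design.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS dynamic power estimate: P = alpha * C * Vdd^2 * f (watts).
    Illustrative only; this is not the PowerPlay estimation method."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz

# Hypothetical operating points: same activity, capacitance, and voltage at two clock rates
p_50mhz = dynamic_power_w(0.1, 1e-9, 1.2, 50e6)
p_100mhz = dynamic_power_w(0.1, 1e-9, 1.2, 100e6)
print(f"{p_50mhz * 1e3:.2f} mW at 50 MHz, {p_100mhz * 1e3:.2f} mW at 100 MHz")
```

Because the model is linear in f, doubling the clock frequency doubles the estimated dynamic power when all other factors stay fixed.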
B. The 4096-point radix-4 FFT
For the 4096-point radix-4 FFT, the Quartus II PowerPlay Power Analyzer reports a dynamic thermal power dissipation of 50.72 mW, which represents the measured power consumption. The implementation requires 815 logic elements, 561 registers, 557 kilobits of memory, and 24 multipliers.
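The following short Python script makes the comparison between the two designs concrete. It computes the radix-4 to radix-2 ratio for each metric reported in Sections IV.A and IV.B. The values are copied from the text, the dictionary keys are illustrative names, and memory is taken to be in kilobits.

```python
radix2 = {"dynamic_power_mW": 36.47, "logic_elements": 395, "registers": 274,
          "memory_kbits": 622, "multipliers": 8}
radix4 = {"dynamic_power_mW": 50.72, "logic_elements": 815, "registers": 561,
          "memory_kbits": 557, "multipliers": 24}

# Ratio > 1 means the radix-4 design uses more of that resource
ratios = {key: radix4[key] / radix2[key] for key in radix2}
for key, r in ratios.items():
    print(f"{key}: radix-4 / radix-2 = {r:.2f}")
```

Computed directly from these figures, the radix-4 design uses about 1.39 times the dynamic power, about 2.06 times the logic elements, about 2.05 times the registers, and 3 times the multipliers of the radix-2 design. It uses about 0.90 times the memory.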
V. DISCUSSION
In this section, the power consumption and hardware usage of the radix-2 and radix-4 algorithms are compared. The results show that the 4096-point radix-4 FFT requires 58% of the power required by the 4096-point radix-2 FFT, whereas it shows a 67% increase in hardware usage. Figures (1) and (2) give the performance analysis of the two FFTs, and Table (1) compares their hardware utilization with that of the Altera MegaCore function:
