Abstract -- The prevalent need for very high-speed digital signal processing in wireless communications has driven communication systems to high performance levels. The objective of this paper is to propose a novel structure for the efficient implementation of the Fast Fourier Transform (FFT) processor that meets the requirements of high-speed wireless communication system standards. Based on algorithm and architecture analysis, a pipeline Radix-2² SDF FFT processor based on digit-slicing multiplier-less arithmetic is proposed. Furthermore, this paper proposes an optimal constant multiplication arithmetic design that multiplies a fixed-point input selectively by one of several preset twiddle-factor constants. The proposed architecture was simulated in MATLAB, and a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) was targeted to synthesize it. The design was tested in real hardware with a TLA5201 logic analyzer, and the ISE synthesis report gives a maximum clock frequency of 772.966 MHz with a total equivalent gate count of 14,854. This is a significant improvement over the conventional Radix-2² DIF SDF FFT processor, and we conclude that the proposed pipeline Radix-2² DIF SDF FFT processor based on digit-slicing multiplier-less arithmetic is an enabler for solving the FFT-related problems that limit the capability of most high-speed wireless communication systems, with considerable potential for future related work and research.
I. INTRODUCTION
The FFT plays an important role in many digital signal processing (DSP) applications, such as communication systems and image processing. It is an efficient algorithm for computing the discrete Fourier transform (DFT) and its inverse. The DFT is a fundamental procedure in data analysis, system design and implementation [1]. The main challenge in FFT hardware implementation is the speed of the multiplier unit. Hence, many methods have been developed to reduce the complexity of the FFT calculation [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]. To implement an FFT processor as a system on chip (SoC), both ASIC implementation and FPGA prototyping have been considered. Recently, the FPGA has become a viable platform for direct hardware solutions in real-time applications. This paper concentrates on the FPGA implementation of a high-speed multiplier-less FFT processor using the shift-and-add technique. Since multiplication causes a large propagation delay in the FFT calculation, the digit-slicing technique is applied to build a novel multiplier-less FFT processor architecture. This research work was motivated by [12] [13] [14] [18]. Digit-slicing FFT was previously introduced for DSP applications by [15]. This research uses a digit-slicing technique similar to that of [15], but with a different algorithm, architecture and platform, which improves performance and achieves higher speed.
II. DIGIT SLICING ARCHITECTURE
The concept behind digit slicing is that any complex number F can be sliced into smaller blocks, each with a shorter word length p, as shown in the following equations [14].
where $FR_{k,i}$ and $FI_{k,i}$ are either zero or one. Any value whose absolute value is less than one can be represented in two's complement as:
where x is any number whose absolute value is less than one, sliced into b blocks of p bits each.
where the digits $X_{k,j}$ are all either zero or one, except for the sign digit $X_{b-1,p-1}$, which is either zero or minus one.
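Assuming the sign digit sits at $X_{b-1,p-1}$ and the least significant digit has weight $2^{-(bp-1)}$, the sliced representation reads
$$x = 2^{-(bp-1)} \sum_{k=0}^{b-1} 2^{kp} \sum_{j=0}^{p-1} X_{k,j}\,2^{j}.$$
The Python sketch below illustrates this on raw integer codes, with defaults $b = 4$, $p = 4$ matching the 16-bit Q15 format used for the multiplier later. The helper names `slice_word` and `reconstruct` are hypothetical.

```python
from typing import List


def slice_word(raw: int, b: int = 4, p: int = 4) -> List[List[int]]:
    """Split a (b*p)-bit two's complement code into b blocks of p digits.

    raw is the signed integer code of x, i.e. x * 2**(b*p - 1) (Q15 for b = p = 4).
    Returns digits[k][j] = X_{k,j}: every digit is 0 or 1 except the sign
    digit X_{b-1,p-1}, which is 0 or -1.
    """
    width = b * p
    if not -(1 << (width - 1)) <= raw < (1 << (width - 1)):
        raise ValueError("raw does not fit in the word length")
    bits = raw & ((1 << width) - 1)  # unsigned view of the two's complement pattern
    digits = [[(bits >> (k * p + j)) & 1 for j in range(p)] for k in range(b)]
    digits[b - 1][p - 1] = -digits[b - 1][p - 1]  # MSB carries negative weight
    return digits


def reconstruct(digits: List[List[int]], p: int = 4) -> int:
    """Inverse of slice_word: sum of X_{k,j} * 2**(k*p + j)."""
    return sum(d << (k * p + j) for k, row in enumerate(digits) for j, d in enumerate(row))
```

For example, x = -0.5 has the Q15 code -16384, whose only nonzero digits are $X_{3,2} = 1$ and $X_{3,3} = -1$, giving $2^{14} - 2^{15}$.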
In Eq. (6), the indices n and k are decomposed as:
Here n and k each range over N values, N being the transform length. When the above substitutions are applied to (6), the DFT definition can be written as:
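For reference, the standard Radix-2² decomposition of [16], which the missing equations are assumed to follow, maps the indices as
$$n = \frac{N}{2}n_1 + \frac{N}{4}n_2 + n_3, \qquad k = k_1 + 2k_2 + 4k_3,$$
with $n_1, n_2, k_1, k_2 \in \{0, 1\}$ and $n_3, k_3 \in \{0, \dots, N/4 - 1\}$. Substituting into $X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk}$, where $W_N = e^{-j2\pi/N}$, and carrying out the sum over $n_1$ first gives
$$X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \left\{ B^{k_1}_{N/2}\!\left(\tfrac{N}{4}n_2 + n_3\right) \right\} W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)},$$
where $B^{k_1}_{N/2}(m) = x(m) + (-1)^{k_1}\,x(m + N/2)$ is the first butterfly. This step uses $W_N^{\frac{N}{2}n_1 k} = (-1)^{n_1 k_1}$, which holds because $2k_2 + 4k_3$ is even.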
For the normal Radix-2 DIF FFT algorithm, the expression in braces in (10) is computed first, as the first stage. In the Radix-2² FFT algorithm, however, the idea is to restructure the twiddle factors of the first and second stages [16].
Observe that the last twiddle factor in (12) can be rewritten as:
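In the standard formulation [16], with $n = \frac{N}{2}n_1 + \frac{N}{4}n_2 + n_3$ and $k = k_1 + 2k_2 + 4k_3$, the twiddle factor remaining after the first butterfly splits as follows (assumed here to be the rewriting referred to), using $W_N^{N} = 1$, $W_N^{N/4} = -j$ and $W_N^{4} = W_{N/4}$:
$$W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} = W_N^{N n_2 k_3}\, W_N^{\frac{N}{4}n_2(k_1 + 2k_2)}\, W_N^{n_3(k_1 + 2k_2)}\, W_N^{4 n_3 k_3} = (-j)^{n_2(k_1 + 2k_2)}\, W_N^{n_3(k_1 + 2k_2)}\, W_{N/4}^{n_3 k_3}.$$
The factor $(-j)^{n_2(k_1+2k_2)}$ takes only the values $\pm 1$ and $\pm j$, so it needs no multiplier; only $W_N^{n_3(k_1+2k_2)}$ requires a genuine twiddle-factor multiplication.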
Applying (10), (12) and (13) and expanding the summation over $n_2$ yields a DFT definition whose length is one quarter of the original FFT length.
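In the standard Radix-2² formulation [16], this shorter DFT is $X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{N/4-1} \left[H(k_1,k_2,n_3)\,W_N^{n_3(k_1+2k_2)}\right] W_{N/4}^{n_3 k_3}$ with $H(k_1,k_2,n_3) = \left[x(n_3) + (-1)^{k_1}x(n_3 + \tfrac{N}{2})\right] + (-j)^{k_1+2k_2}\left[x(n_3 + \tfrac{N}{4}) + (-1)^{k_1}x(n_3 + \tfrac{3N}{4})\right]$. The Python sketch below is a floating-point check of that identity, not a model of the fixed-point pipeline: it computes one Radix-2² step and can be compared with a direct DFT. The function names are hypothetical.

```python
import cmath


def twiddle(exponent, size):
    """W_size^exponent = exp(-2*pi*j*exponent/size)."""
    return cmath.exp(-2j * cmath.pi * exponent / size)


def dft(x):
    """Direct O(N^2) DFT, used as the reference."""
    n = len(x)
    return [sum(x[m] * twiddle(m * k, n) for m in range(n)) for k in range(n)]


def radix22_step(x):
    """N-point DFT via one Radix-2^2 step and four N/4-point DFTs."""
    n = len(x)
    if n % 4:
        raise ValueError("length must be a multiple of 4")
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        sign = 1 if k1 == 0 else -1
        for k2 in (0, 1):
            trivial = (-1j) ** (k1 + 2 * k2)  # (-j)^(k1+2k2): swap/sign change only
            g = []
            for n3 in range(q):
                b_lo = x[n3] + sign * x[n3 + 2 * q]      # butterfly I, inputs N/2 apart
                b_hi = x[n3 + q] + sign * x[n3 + 3 * q]  # butterfly I, inputs N/2 apart
                h = b_lo + trivial * b_hi                # butterfly II
                g.append(h * twiddle(n3 * (k1 + 2 * k2), n))  # nontrivial twiddle
            for k3, val in enumerate(dft(g)):            # N/4-point DFT over n3
                out[k1 + 2 * k2 + 4 * k3] = val
    return out
```

For N = 8, the remaining N/4-point DFTs are 2-point transforms, so in hardware a further butterfly stage completes the FFT.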
[Figure: signal-flow graph of the 8-point FFT, inputs x[0] to x[7], outputs X[0] to X[7]]
stage, which is normally Butterfly II. The control signal C1 has two settings: when C1 = 0, the multiplexers direct the input data to the feedback registers until they are filled; when C1 = 1, the multiplexers select the outputs of the adders and subtracters.
B. Butterfly II Structure
Fig. 6 shows the Butterfly II structure. The input data $B_r$ and $B_i$ come from the previous component, Butterfly I. The outputs of Butterfly II are $E_r$, $E_i$, $F_r$ and $F_i$: $E_r$ and $E_i$ are fed to the next component, normally the twiddle-factor multiplier, while $F_r$ and $F_i$ go to the feedback registers. Multiplication by −j involves swapping the real and imaginary parts and inverting a sign, since $(a + jb)(-j) = b - ja$. The swap is handled efficiently by the Swap-MUX multiplexers, and the sign inversion is handled by interchanging the addition and subtraction operations, also by means of the Swap-MUX. The control signals C1 and C2 are set to one when multiplication by −j is required, so that the real and imaginary data are swapped and the addition and subtraction operations are interchanged. To keep the word length from growing successively as the data pass through the adder, subtracter and multiplier operations, the data are divided by 2; rounding is also applied to reduce the resulting scaling errors.
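The multiplier-free handling of −j can be sketched behaviorally in Python. This is a hypothetical model, not the paper's Verilog: `butterfly2` takes two complex operands x and y as integer real/imaginary parts, and the flag `minus_j` stands in for the C1/C2 control; the round-half-up rule in `scale_half` is an assumption about how the divide-by-2 rounding is done.

```python
from typing import Tuple


def butterfly2(xr: int, xi: int, yr: int, yi: int,
               minus_j: bool) -> Tuple[int, int, int, int]:
    """Return (Er, Ei, Fr, Fi) with E = x + y' and F = x - y'.

    y' is y, or y * (-j) when minus_j is set. Because
    (yr + j*yi) * (-j) = yi - j*yr, the real path only swaps in yi and the
    imaginary path interchanges add and subtract: no multiplier is needed.
    """
    if minus_j:
        er, fr = xr + yi, xr - yi  # swapped operand
        ei, fi = xi - yr, xi + yr  # add/sub interchanged (sign inversion)
    else:
        er, fr = xr + yr, xr - yr
        ei, fi = xi + yi, xi - yi
    return er, ei, fr, fi


def scale_half(v: int) -> int:
    """Divide by 2 with round-half-up (assumed rounding rule)."""
    return (v + 1) >> 1
```

Choosing between the two branches corresponds to the Swap-MUX selection, so the cost of the −j factor is a multiplexer rather than a multiplier.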

C. Digit-Slicing Multiplier-Less Complex Multiplier
A complex multiplier can be realized with digit-slicing multiplier-less units and real adders [17], based on (16), as shown in Fig. 7:
$$(a_r + ja_i)(b_r + jb_i) = \{b_r(a_r - a_i) + a_i(b_r - b_i)\} + j\{b_i(a_r + a_i) + a_i(b_r - b_i)\} \qquad (16)$$
The term $a_i(b_r - b_i)$ appears in both the real and imaginary parts, so only three real multiplications are needed. The proposed design slices the input data into four blocks of four bits each. The multiplier inputs A and B are 16-bit two's complement signed fixed-point numbers with 15 fractional bits. Digit slicing is applied to input A, as shown in Fig. 8. There are four different cases for the multiplication between the four-bit blocks and the twiddle factors. Fig. 8 shows the block diagram of the digit-slicing multiplier-less unit using the shift-and-add technique.
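The digit-slicing multiplication of a 16-bit input A by a constant twiddle factor W can be sketched in Python as follows. This is a behavioral sketch under stated assumptions: A and W are raw Q15 integer codes, A is sliced into four 4-bit blocks with a negative-weight sign digit, each block is multiplied by W using only shifts and additions/subtractions, and the block products are aligned by $2^{4k}$ and summed into a Q30 result. The function names are hypothetical, and the ROM storage format of the twiddle factors is not modeled.

```python
from typing import List


def block_times_const(block: List[int], w: int) -> int:
    """Multiply one digit slice (digits 0/1, or -1 for the sign digit) by the
    constant w using only shifts and additions/subtractions."""
    acc = 0
    for j, digit in enumerate(block):
        if digit == 1:
            acc += w << j
        elif digit == -1:
            acc -= w << j
    return acc


def digit_slice_mul(a_raw: int, w_raw: int, b: int = 4, p: int = 4) -> int:
    """Return a_raw * w_raw (a Q30 result for Q15 inputs) by slicing A into
    b blocks of p bits and combining the per-block shift-and-add products."""
    width = b * p
    if not -(1 << (width - 1)) <= a_raw < (1 << (width - 1)):
        raise ValueError("a_raw does not fit in the word length")
    bits = a_raw & ((1 << width) - 1)  # two's complement bit pattern of A
    total = 0
    for k in range(b):
        block = [(bits >> (k * p + j)) & 1 for j in range(p)]
        if k == b - 1:
            block[p - 1] = -block[p - 1]  # sign digit of A has negative weight
        total += block_times_const(block, w_raw) << (k * p)  # align block k
    return total
```

For example, with W = cos(π/4) ≈ 23170 in Q15, `digit_slice_mul(16384, 23170) >> 15` returns 11585, the Q15 code of 0.5 × cos(π/4).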
Because of the shift operations in the digit-slicing algorithm, the twiddle factors are stored right-shifted by 6 bits, so the ROM that stores them needs to be only 10 bits wide instead of 16. As mentioned in (4) and (5), the digit-slicing algorithm for this case is:
The proposed pipeline Radix-2² DIF SDF-FFT processor based on digit-slicing multiplier-less arithmetic has been implemented in MATLAB to verify the results of all stages, as shown in Fig. 9. The design has been coded in Verilog HDL and tested in real hardware on a Xilinx Virtex-4 FPGA, as shown in Fig. 10 and Fig. 12. In addition, ModelSim XE-III was used to obtain the simulation results of the proposed design, as shown in Fig. 11.
VI. CONCLUSION
This study presented the FPGA implementation of a pipeline Radix-2² DIF SDF-FFT processor based on digit-slicing multiplier-less arithmetic. The design was coded in Verilog HDL and tested on a Xilinx Virtex-4 FPGA prototyping board. The synthesis report for the 8-point pipeline Radix-2² DIF SDF FFT gives a maximum clock frequency of 772.966 MHz with a total equivalent gate count of 14,854; this is 3.35 times faster than the conventional butterfly while requiring only 20% of its area. We conclude that the proposed design is an enabler for solving the FFT-related problems that limit communication capability, with considerable potential for future related research.
