Abstract A parallel and pipelined Fast Fourier Transform (FFT) 
INTRODUCTION
A fast Fourier transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and its inverse. A Fourier transform converts time (or space) to frequency and vice versa; an FFT computes such transformations rapidly. As a result, fast Fourier transforms are widely used in engineering, science, and mathematics. The basic ideas were popularized in 1965, but some FFTs were known as early as 1805. Fast Fourier transforms have been described as "the most important numerical algorithm of our lifetime".
There are many different FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory.
The discrete Fourier transform is obtained by decomposing a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly: computing the DFT of N points in the naive way, using the definition, takes O(N^2) arithmetical operations, while an FFT can compute the same DFT in only O(N log N) operations. The difference in speed can be enormous, especially for long data sets where N may be in the thousands or millions. In practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N / log(N). This huge improvement made the calculation of the DFT practical; FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.
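To make the cost comparison concrete, the following is a minimal Python sketch of the direct evaluation of the DFT, together with two rough operation counts. The function names are illustrative and not taken from the paper, the forward-transform sign convention e^{-2 pi i k n / N} is an assumption, and the (N/2) log2(N) figure is the usual multiply count for a radix-2 FFT with N a power of two rather than a measured result.

```python
import cmath


def naive_dft(x):
    """Evaluate the DFT directly from its definition: N outputs, N terms each."""
    n_points = len(x)
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            # Assumed sign convention: forward transform uses e^{-2*pi*i*k*n/N}.
            acc += x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
        out.append(acc)
    return out


def naive_multiplies(n_points):
    """Complex multiplies used by naive_dft: one per (k, n) pair."""
    return n_points * n_points


def fft_multiplies_estimate(n_points):
    """Rough radix-2 FFT count: N/2 butterflies in each of log2(N) stages."""
    stages = n_points.bit_length() - 1  # log2(N) when N is a power of two
    return (n_points // 2) * stages
```

For N = 1024 the two counts are 1,048,576 and 5,120, a ratio of roughly 200, which is in line with the N / log(N) scaling described above.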
The best-known FFT algorithms [2] improve on the direct evaluation of the definition, which is an O(N^2) algorithm. However, it turns out that by cleverly rearranging these operations, one can reduce the cost to O(N log N), which makes a huge difference for large N. The optimized version of the algorithm is called the fast Fourier transform, or FFT.
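The standard strategy to speed up an algorithm is to divide and conquer. We have to find some way to group the terms in the definition of the DFT. Writing x[n] for the N input samples and X[k] for the N outputs, the definition is

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad W_N = e^{-2\pi i/N}, \qquad k = 0, \dots, N-1.$$

Let's see what happens when we separate the odd n from the even n (from now on, let's assume that N is even):

$$X[k] = \sum_{m=0}^{N/2-1} x[2m]\, W_N^{2km} + \sum_{m=0}^{N/2-1} x[2m+1]\, W_N^{k(2m+1)} = \sum_{m=0}^{N/2-1} x[2m]\, W_{N/2}^{km} + W_N^{k} \sum_{m=0}^{N/2-1} x[2m+1]\, W_{N/2}^{km},$$

where we have used one crucial identity:

$$W_N^{2km} = e^{-2\pi i\,(2km)/N} = e^{-2\pi i\,km/(N/2)} = W_{N/2}^{km}.$$

Notice an interesting thing: the two sums are nothing else but N/2-point Fourier transforms of, respectively, the even subset and the odd subset of samples. Terms with k greater than or equal to N/2 can be reduced using another identity:

$$W_{N/2}^{(k+N/2)m} = W_{N/2}^{km}\, W_{N/2}^{(N/2)m} = W_{N/2}^{km},$$

which holds because $W_{N/2}^{N/2} = e^{-2\pi i} = 1$.

If we start with N that is a power of 2, we can apply this subdivision recursively until we get down to 2-point transforms. We can also go backwards, starting with the 2-point transform:

$$X[k] = x[0]\, W_2^{0 \cdot k} + x[1]\, W_2^{1 \cdot k}, \qquad k = 0, 1.$$

The two components are:

$$X[0] = x[0] + W_2^{0}\, x[1] = x[0] + x[1], \qquad X[1] = x[0] + W_2^{1}\, x[1] = x[0] - x[1].$$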
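The recursive even/odd subdivision described above can be sketched in a few lines of Python. This is an illustrative software model, not the paper's hardware design: the function name `fft_radix2` is hypothetical, the sign convention e^{-2 pi i / N} is assumed, and the code uses the additional fact that the twiddle factor changes sign when k is increased by N/2, so each pair of outputs shares one multiplication.

```python
import cmath


def fft_radix2(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)  # a 1-point transform is the sample itself
    if n % 2:
        raise ValueError("length must be a power of two")

    even = fft_radix2(x[0::2])  # N/2-point transform of the even samples
    odd = fft_radix2(x[1::2])   # N/2-point transform of the odd samples

    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_N^k applied to the odd half (assumed sign convention).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t  # W_N^(k + N/2) = -W_N^k
    return out
```

For example, `fft_radix2([1, 2, 3, 4])` evaluates to approximately `[10, -2+2j, -2, -2-2j]`, which matches direct evaluation of the definition.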
We can represent the two equations for the components of the 2-point transform graphically using the so-called butterfly. Furthermore, using the divide-and-conquer strategy, a 4-point transform can be reduced to two 2-point transforms: one for the even elements and one for the odd elements. The odd one is multiplied by $W_4^k$. Diagrammatically, this can be represented as two levels of butterflies. Notice that using the identity $W_{N/2}^{n} = W_N^{2n}$, we can always express all the multipliers as powers of the same $W_N$ (in this case we choose N = 4).
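As a small worked illustration of the two levels of butterflies, the sketch below writes out the 4-point transform explicitly. The helper names `butterfly` and `fft4_two_levels` are hypothetical, and the twiddle factor $W_4^1 = -i$ follows from the assumed convention $W_N = e^{-2\pi i/N}$.

```python
def butterfly(a, b, w):
    """One butterfly: returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft4_two_levels(x):
    """4-point DFT built from two levels of 2-point butterflies."""
    x0, x1, x2, x3 = x

    # Level 1: 2-point transforms of the even (x0, x2) and odd (x1, x3) samples.
    e0, e1 = butterfly(x0, x2, 1)
    o0, o1 = butterfly(x1, x3, 1)

    # Level 2: combine the halves with twiddles W_4^0 = 1 and W_4^1 = -i.
    y0, y2 = butterfly(e0, o0, 1)
    y1, y3 = butterfly(e1, o1, -1j)
    return [y0, y1, y2, y3]
```

With the input `[1, 2, 3, 4]`, the first level gives 4 and -2 for the even pair and 6 and -2 for the odd pair; the second level then gives 10, -2+2j, -2, and -2-2j.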
RADIX-4 FFT
The decimation-in-time (DIT) radix-4 FFT recursively partitions a DFT [5] into four quarter-length DFTs of groups of every fourth time sample. The outputs of these shorter FFTs are reused to compute many outputs, thus greatly reducing the total computational cost. The radix-4 decimation-in-frequency FFT groups every fourth output sample into shorter-length DFTs to save computations. The radix-4 FFTs require only 75% as many complex multiplies as the radix-2 FFTs [6]. Figure 1 graphically illustrates this form of the DFT computation. It is this reuse that gives the radix-4 FFT its efficiency. The computations involved with each group of four frequency samples constitute the radix-4 butterfly, which is shown in Figure 2. Through further rearrangement, it can be shown that this computation can be simplified to three twiddle-factor multiplies and a length-4 DFT. The theory of multi-dimensional index maps shows that this must be the case, and that FFTs of any factorable length may consist of successive stages of shorter-length FFTs [25] with twiddle-factor multiplications in between.
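The structure of the radix-4 butterfly, three twiddle-factor multiplies followed by a length-4 DFT, can be sketched in Python as follows. This is a software model under stated assumptions, not the implemented hardware: the names `dft4` and `fft_radix4` are hypothetical, the sign convention e^{-2 pi i / N} is assumed, the length must be a power of four, and the twiddle factors are recomputed on the fly where a hardware design would more likely read them from a table.

```python
import cmath


def dft4(a, b, c, d):
    """Length-4 DFT using only additions and a multiplication by -i."""
    s0, s1 = a + c, a - c
    s2, s3 = b + d, (b - d) * -1j
    return s0 + s2, s1 + s3, s0 - s2, s1 - s3


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of four."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of four")

    q = n // 4
    # Four quarter-length transforms, each over every fourth time sample.
    parts = [fft_radix4(x[r::4]) for r in range(4)]

    out = [0j] * n
    for k in range(q):
        # Three twiddle-factor multiplies: W_N^k, W_N^(2k), W_N^(3k).
        a = parts[0][k]
        b = cmath.exp(-2j * cmath.pi * k / n) * parts[1][k]
        c = cmath.exp(-2j * cmath.pi * 2 * k / n) * parts[2][k]
        d = cmath.exp(-2j * cmath.pi * 3 * k / n) * parts[3][k]
        # The length-4 DFT yields outputs k, k + N/4, k + N/2, k + 3N/4.
        out[k], out[k + q], out[k + 2 * q], out[k + 3 * q] = dft4(a, b, c, d)
    return out
```

Each quarter-length result `parts[r][k]` is reused for four outputs, which is the reuse that the text credits for the efficiency of the radix-4 FFT.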
FPGA Implementation
Field-programmable gate arrays (FPGAs) [21] are widely used in the development of ASICs because they offer high execution speed for the computations. The 64-point radix-4 FFT is simulated, synthesized, and implemented on an FPGA with the configuration below.
Simulation Results:
The RTL view of the butterfly structure obtained after simulating the 256-point decimation-in-time FFT block is shown next, together with the internal architecture of the butterfly block. The hardware implementation was coded in SystemC, and its results are as follows.
Conclusion
In this project it is shown that a baseband ASIC can be fast and at the same time flexible, with low power consumption. The term fast does not refer to extreme clock frequencies but to the fact that no part of the design needs more than one clock cycle to process a sample once the pipeline is filled. Hence, the design does not need to be clocked any faster than the requested bandwidth, which, compared to what modern CMOS technology allows, is a low number, on the order of 5-100 MHz. A low clock frequency has several advantages: first, it is possible to use a low-power/low-speed cell library with low static leakage current, and second, it is easier to create a clock tree. Flexibility is achieved with independent modules, where the operation mode decides whether a module is used. The unused modules are not clocked and hence consume only static leakage power. Implementing the 256-point FFT on an FPGA gives better results.
