In this paper, we analyze the implementation loss of a fast Fourier transform (FFT) algorithm as a function of the bit truncation process. A 4K-point FFT algorithm is proposed and implemented in field programmable gate arrays (FPGAs). Because the bit width cannot be extended indefinitely, truncation is required, and the performance of the FFT implementation depends on the truncation method. We measure the implementation loss for each bit truncation process. The measured results show better performance when truncation is applied after the FFT output.
Introduction
Orthogonal frequency-division multiplexing (OFDM) is widely used in wired and wireless communication systems. Fast Fourier transform (FFT) algorithms are categorized by the way the Cooley-Tukey recursive decomposition is applied. These decompositions ultimately reduce to butterfly operations, which strongly influence the FFT architecture. The FFT is a key block of an OFDM system, and its performance is crucial to the system as a whole [1].
Various fast algorithms exist for implementing the discrete Fourier transform (DFT), such as the Cooley-Tukey algorithm. In this study, the DFT was implemented with the Cooley-Tukey algorithm, which yields a modular architecture, and a 4K-point selection mode is implemented [2].
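The following Python sketch makes the Cooley-Tukey decomposition concrete by applying it recursively. Each N-point DFT is split into two N/2-point DFTs over the even- and odd-indexed samples, and the two halves are recombined by N/2 radix-2 butterflies. It uses the decimation-in-time variant for brevity. It illustrates the general technique and does not reproduce the hardware ordering used in this work. The function name `fft_recursive` is hypothetical.

```python
import cmath


def fft_recursive(x):
    """Radix-2 decimation-in-time Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Cooley-Tukey step: one N-point DFT becomes two N/2-point DFTs ...
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # ... recombined by N/2 radix-2 butterflies (one twiddle multiply each).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

For example, `fft_recursive([1, 0, 0, 0])` returns four values equal to 1 up to floating-point rounding, which is the DFT of a unit impulse.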
In this paper, we analyze the implementation loss of the FFT algorithm for two bit truncation processes. The first method performs truncation within each stage: LSB truncation is applied in the first six stages and MSB truncation in the last six stages. The second method performs truncation after the FFT. We implement the FFT algorithm on a field programmable gate array (FPGA) as a prototype and measure the implementation loss after bit truncation in a 4K-point FFT algorithm [3].
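The word lengths implied by the two methods can be tabulated with a short bookkeeping sketch. It assumes the values stated elsewhere in this paper: a 20-bit input, twelve radix-2 stages for 4K points, one bit of growth per butterfly, and the LSB/MSB split described above. The names `stage_truncation_plan` and `output_truncation_plan` are hypothetical.

```python
INPUT_BITS = 20  # word length at the FFT input (from the text)
STAGES = 12      # radix-2 stages for 4K = 2**12 points


def stage_truncation_plan():
    """Method 1: truncate inside every stage (LSB in stages 1-6, MSB in 7-12)."""
    return [
        {"stage": s,
         "in": INPUT_BITS,
         "butterfly": INPUT_BITS + 1,   # add/subtract grows the word by one bit
         "out": INPUT_BITS,             # one bit dropped by the truncation block
         "drop": "LSB" if s <= STAGES // 2 else "MSB"}
        for s in range(1, STAGES + 1)
    ]


def output_truncation_plan():
    """Method 2: no truncation inside the stages, so each stage adds one bit."""
    widths = [{"stage": s, "in": INPUT_BITS + s - 1, "out": INPUT_BITS + s}
              for s in range(1, STAGES + 1)]
    final_bits = widths[-1]["out"]                       # 20 + 12 = 32 bits
    return widths, final_bits, final_bits - INPUT_BITS  # bits dropped after the FFT
```

The rows of `stage_truncation_plan()` show a constant 20-bit stored word. By contrast, the second method ends at 32 bits, and the final truncation discards 12 of them.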
Structure of 4K-point FFT Algorithm including Truncation Block
The N-point discrete Fourier transform is defined by

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1,$$
where X(k) and x(n) are complex numbers and $W_N = e^{-j2\pi/N}$ is the principal Nth root of unity.

Figure 2 compares the stage truncation and full truncation results obtained from the same input data after IFFT/FFT processing. If bit truncation introduced no quantization error, the input and output would be exactly the same. The comparison shows, however, that the output deviates from the input.

The 4K-point FFT algorithm consists of 12 stages, each of which operates in a radix-2 butterfly structure [4]. The processing size is halved at each stage: the first stage processes 4K points, the second stage 2K points, and so on [5]. Because every stage contains a truncation block, the output of every stage is 20 bits. As shown in Figure 3, only one symbol is processed in the 12th stage, so its multiplication block is eliminated [6], [7].
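The equation and stage structure above can be sketched in Python as a floating-point model (not the FPGA design). `dft` evaluates the defining sum directly. `fft_dif` computes the same transform with a radix-2 decimation-in-frequency pipeline, one pass per stage. The decimation-in-frequency ordering and all function names are assumptions for illustration. `stage_spans` lists how many points each stage combines and reproduces the 4K, 2K, ... halving. In this formulation the last stage only ever uses the twiddle factor 1, which is one way to see why the final stage needs no multiplier.

```python
import cmath


def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) W_N^(nk), with W_N = exp(-j 2 pi / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * ((n * k) % n_pts) / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


def stage_spans(n_pts):
    """Points combined per butterfly group in each radix-2 stage: N, N/2, ..., 2."""
    stages = n_pts.bit_length() - 1
    if n_pts < 2 or 1 << stages != n_pts:
        raise ValueError("N must be a power of two >= 2")
    return [n_pts >> s for s in range(stages)]


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT computed one stage at a time."""
    data = [complex(v) for v in x]
    spans = stage_spans(len(data))
    for span in spans:
        half = span // 2
        for start in range(0, len(data), span):
            for j in range(half):
                a, b = data[start + j], data[start + j + half]
                data[start + j] = a + b
                # Twiddle W_span^j; in the last stage (span 2) j is always 0, so W = 1.
                data[start + j + half] = (a - b) * cmath.exp(-2j * cmath.pi * j / span)
    bits = len(spans)
    # DIF leaves X(k) at the bit-reversed position of k.
    return [data[int(format(k, f"0{bits}b")[::-1], 2)] for k in range(len(data))]
```

For N = 4096, `stage_spans(4096)` has 12 entries, starting with 4096, 2048, and 1024 and ending at 2.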
As previously mentioned, in the bit truncation of the 4K-point FFT algorithm every stage includes a truncation block, which controls the number of bits processed at each step. The bit width must be controlled because it cannot grow indefinitely in a hardware implementation [8]. The performance of the FFT algorithm depends on how the truncation is performed. In the MSB truncation method, the input and output of the truncation block are 21 and 20 bits, respectively, because the most significant bit is discarded. In the LSB truncation method, the input and output are likewise 21 and 20 bits, respectively, but the least significant bit is discarded [9], [10].
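Below is a minimal sketch of the two truncation rules applied to a single real-valued radix-2 add/subtract, assuming two's-complement integers. `lsb_truncate` drops one LSB with an arithmetic right shift, which halves the value. `msb_truncate` drops the MSB by keeping the low 20 bits. The MSB rule is exact only when the 21-bit result already fits in 20 bits; otherwise it wraps around. The helper names and the stage-based selection in `butterfly` are hypothetical. That selection uses LSB for stages 1 to 6 and MSB for stages 7 to 12, following the first method described in the introduction.

```python
DATA_BITS = 20  # word length stored between stages (from the text)


def fits(value: int, bits: int = DATA_BITS) -> bool:
    """True if `value` is representable as a `bits`-bit two's-complement integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def lsb_truncate(value: int) -> int:
    """Drop the least significant bit: arithmetic shift right (floor of value / 2)."""
    return value >> 1


def msb_truncate(value: int, bits: int = DATA_BITS) -> int:
    """Drop the most significant bit by keeping the low `bits` bits as two's complement.

    Exact when `value` already fits in `bits` bits; otherwise the result wraps around.
    """
    low = value & ((1 << bits) - 1)
    return low - (1 << bits) if low >> (bits - 1) else low


def butterfly(a: int, b: int, stage: int) -> tuple[int, int]:
    """Real-valued radix-2 add/subtract with a hypothetical per-stage truncation rule."""
    if not (fits(a) and fits(b)):
        raise ValueError("operands must be 20-bit two's-complement values")
    total, diff = a + b, a - b  # 21-bit results
    truncate = lsb_truncate if stage <= 6 else msb_truncate
    return truncate(total), truncate(diff)
```

For instance, `butterfly(300000, 300000, stage=1)` returns `(300000, 0)`. With the same operands in stage 7, the sum overflows and wraps to `-448576`, which shows why MSB truncation needs enough headroom in the data.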
Structure of FFT Algorithm by Truncation after FFT
Figure 4 shows the implemented stage structure of a 4K-point FFT algorithm with truncation after the FFT. Truncation is applied to the FFT output, so the word length grows by one bit at every stage. As shown in Figure 4, one data symbol is processed in the twelfth stage, so the multiplication block is removed. Because truncation is applied after the FFT operation, the truncation block is also removed from the stage block. The bit width cannot grow indefinitely in a hardware implementation, so the number of bits is controlled after the FFT operation, and the choice of truncation point changes the performance of the FFT algorithm. This approach selects only 20 of the full 32 valid bits. Figure 5 shows the implementation loss of the 4K-point FFT output for each truncation position. Stage truncation means that truncation is applied at every stage, whereas full truncation means that it is applied once, after the final stage. Implementation loss is the difference between the input data and the output data after IFFT/FFT processing.
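The two truncation positions can be compared with a bit-accurate software model. The sketch below is a hypothetical Python model, not the FPGA design. It runs a radix-2 decimation-in-frequency FFT on integer (re, im) pairs in one of two modes. The first mode truncates inside every stage, using LSB truncation in the first half of the stages and MSB truncation in the second half. The second mode keeps full precision and drops the low bits once after the last stage; for N = 4096 this drops 12 LSBs, matching the 20-of-32-bit selection described above. Several details are assumptions not taken from the text:

- the twiddle precision `TW_BITS = 15`;
- round-half-up after each twiddle multiply;
- SNR against a scaled floating-point DFT, used as a stand-in for the IFFT/FFT implementation-loss measurement.

```python
import cmath
import math

DATA_BITS = 20  # stored word length (from the text)
TW_BITS = 15    # assumed twiddle precision in fractional bits (not given in the text)


def wrap(v: int, bits: int = DATA_BITS) -> int:
    """MSB truncation: keep the low `bits` bits as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def cmul_q(a, w):
    """Complex integer times quantized twiddle, rounded back to integers."""
    (ar, ai), (wr, wi) = a, w
    half = 1 << (TW_BITS - 1)
    return ((ar * wr - ai * wi + half) >> TW_BITS,
            (ar * wi + ai * wr + half) >> TW_BITS)


def fixed_fft(x, truncate_in_stages):
    """Radix-2 DIF FFT on (re, im) integer pairs; len(x) must be a power of two.

    truncate_in_stages=True:  drop one LSB per stage in the first half of the
                              stages and one MSB (wrap) per stage in the second half.
    truncate_in_stages=False: keep full precision (one extra bit per stage) and
                              drop `stages` LSBs once, after the last stage.
    """
    data = list(x)
    n = len(data)
    stages = n.bit_length() - 1
    span = n
    for s in range(stages):
        half = span // 2
        tw = [(round(math.cos(2 * math.pi * j / span) * (1 << TW_BITS)),
               round(-math.sin(2 * math.pi * j / span) * (1 << TW_BITS)))
              for j in range(half)]
        for start in range(0, n, span):
            for j in range(half):
                (ar, ai), (br, bi) = data[start + j], data[start + j + half]
                data[start + j] = (ar + br, ai + bi)
                data[start + j + half] = cmul_q((ar - br, ai - bi), tw[j])
        if truncate_in_stages and s < stages // 2:
            data = [(r >> 1, i >> 1) for r, i in data]        # LSB truncation
        elif truncate_in_stages:
            data = [(wrap(r), wrap(i)) for r, i in data]      # MSB truncation
        span = half
    if not truncate_in_stages:
        data = [(r >> stages, i >> stages) for r, i in data]  # truncation after FFT
    # DIF output is in bit-reversed order.
    return [complex(*data[int(format(k, f"0{stages}b")[::-1], 2)]) for k in range(n)]


def dft(x):
    """Floating-point reference DFT (O(N^2); fine for small N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * ((k * m) % n) / n) for m in range(n))
            for k in range(n)]


def snr_db(x, truncate_in_stages):
    """Implementation-loss proxy: SNR in dB of fixed_fft against a scaled float DFT."""
    n = len(x)
    stages = n.bit_length() - 1
    shift = stages // 2 if truncate_in_stages else stages  # total LSBs dropped
    ref = [v / 2 ** shift for v in dft([complex(r, i) for r, i in x])]
    out = fixed_fft(x, truncate_in_stages)
    signal = sum(abs(v) ** 2 for v in ref)
    error = sum(abs(a - b) ** 2 for a, b in zip(out, ref))
    return math.inf if error == 0 else 10 * math.log10(signal / error)
```

To compare the two positions, pass a 64-point tone to `snr_db` with `truncate_in_stages=True` and `False`. One such tone is `[(round(6000 * math.cos(2 * math.pi * 5 * m / 64)), 0) for m in range(64)]`. The small N keeps the O(N^2) reference fast. In the stage-truncation path the input must leave enough headroom, because the MSB stages wrap on overflow.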
Conclusions
In this paper, we presented an efficient design and implementation of a 4K-point FFT algorithm. The bit truncation process is adjusted to optimize hardware performance. Because the processing bit width cannot grow indefinitely in a hardware implementation, the number of bits in the 4K-point FFT is controlled by truncation, and in the proposed design this truncation is applied after the FFT operation. The implemented 4K-point FFT algorithm was tested in a laboratory environment, and the test results showed better performance when truncation was applied after the FFT output.
