Abstract: This paper presents a new low-power butterfly (BF) unit for the single-path delay feedback (SDF) FFT architectures used in multipath FFT processors. In the proposed BF unit, power consumption is reduced by replacing the multiplexors connected to the adder outputs in a conventional BF unit with AND gates at the adder inputs, which is made possible by modifying the BF operation. In the bypass mode, the AND gates set one input of each adder to zero, which reduces the switching activity in the adders and yields an additional reduction in power consumption. Synthesized with a 0.13 µm CMOS standard cell library, the proposed BF unit achieves an average reduction of 24.8% in power consumption over the conventional BF unit.
Introduction
The fast Fourier transform (FFT), which computes the discrete Fourier transform (DFT) efficiently, has been used in a wide range of applications such as ultrawideband (UWB) and wireless personal area network (WPAN) systems based on orthogonal frequency division multiplexing (OFDM). Various pipeline FFT architectures have been proposed in previous works. Among them, the single-path delay feedback (SDF) architecture [1] achieves the minimum memory requirement by storing one butterfly (BF) output in a feedback FIFO. The BF operation in $N$-point radix-2 decimation-in-frequency (DIF) FFT algorithms is shown in Fig. 1(a), where $W_N^{nk}$ is a twiddle factor (TF). To implement the BF operation in Fig. 1(a), the radix-2 SDF (R2SDF) architecture uses the BF unit shown in Fig. 1(b), which operates in two modes as follows [2]. During the first N/2 cycles, the BF unit operates in a bypass mode, where two multiplexors (MUXs), M0 and M1, switch to position 0 and pass the real part, R, and the imaginary part, I, of the input data from the left into the FIFO until it is filled. At the same time, the other two MUXs, M2 and M3, send the output of the FIFO to the TF multiplication. In the bypass mode, the results of the adders are not used and are discarded by the MUXs. In a calculation mode during the next N/2 cycles, the BF unit computes the 2-point DFT in Eq. (1). The BF unit in Fig. 1(b) is also used in radix-$2^2$ SDF (R$2^2$SDF) and radix-$2^3$ SDF (R$2^3$SDF) FFT processors [3]. The first stage of those FFT processors uses this BF unit, and the later stages use a BF unit that is similar to Fig. 1(b) except for the trivial TF multiplication block. For example, the radix-$2^2$ FFT processor in [2] contains two types of BF units, of which the first is identical to Fig. 1(b).
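For orientation, a textbook statement of the radix-2 DIF butterfly is given here. It is a sketch under the assumption that $x[p]$ and $x[q]$ denote the two BF inputs and that primes (a notation introduced only for this illustration) mark the outputs; the notation of Eq. (1) may differ.

```latex
\begin{aligned}
x'[p] &= x[p] + x[q], \\
x'[q] &= \bigl(x[p] - x[q]\bigr)\, W_N^{nk}, \qquad W_N = e^{-j 2\pi / N}.
\end{aligned}
```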
In this paper we propose a new low-power BF unit that reduces the power consumption by eliminating the MUXs in Fig. 1(b) and placing AND gates at the inputs of the adders. In the bypass mode the AND gates provide a further reduction in power consumption, because a bypass operation implemented with AND gates is a form of operand isolation [4], which minimizes the power overhead incurred by redundant operations by selectively blocking the propagation of switching activity through the circuit. Setting one or more inputs of an adder to zero while the adder would otherwise generate unused results is an example of operand isolation that reduces the switching activity and the power consumption in the adder. Reducing the power consumption of BF units is important when implementing the low-power parallel multipath FFT processors that provide the high throughput required in applications such as UWB and WPAN systems, because multipath FFT processors may consist of multiple SDF datapaths that use BF units to decrease memory requirements [5, 6, 7].
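The effect of operand isolation on an adder can be illustrated with a small behavioral model. The sketch below assumes a ripple-carry adder and counts only transitions on the carry nets; the function names `carry_bits` and `carry_toggles` are hypothetical, and a synthesized adder may use a different structure, so the counts indicate a trend rather than a power estimate.

```python
def carry_bits(a, b, width=12):
    """Carry-out of every full-adder cell of a ripple-carry adder, as a bitmask.

    Assumption: a ripple-carry structure with unsigned `width`-bit operands.
    """
    carries = 0
    carry = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        carries |= carry << i
    return carries


def carry_toggles(pairs, s, width=12):
    """Count carry-net transitions over a sequence of (a, b) operand pairs.

    Every bit of operand b is ANDed with the 1-bit select s (0 or 1),
    which models the AND gates placed at one adder input.
    """
    toggles = 0
    previous = 0
    for a, b in pairs:
        gated_b = b & -s  # -1 is all ones, 0 is all zeros
        current = carry_bits(a, gated_b, width)
        toggles += bin(previous ^ current).count("1")
        previous = current
    return toggles


operands = [(0x0FF, 0x001), (0x00F, 0x0F1), (0x555, 0x2AB)]
print("carry toggles, s = 1:", carry_toggles(operands, 1))
print("carry toggles, s = 0:", carry_toggles(operands, 0))
```

With `s = 0` the gated operand is always zero, so no carry is ever generated and the carry nets of this model do not toggle at all.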
Proposed BF unit
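The proposed BF unit in Fig. 2(a) is obtained by replacing the MUXs of the conventional BF unit with AND gates at the inputs of the adders and adding a selection signal, S, whose value is one in the calculation mode. In the bypass mode, one of the two inputs of each adder is set to zero by ANDing it with S, which is zero, and the other input is passed through.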
In the proposed BF unit, the power consumption is reduced by replacing the MUXs connected to the outputs of the adders in the conventional BF unit with less power-consuming AND gates at the inputs of the adders, as shown in Fig. 2(a). During the bypass operation, the power consumption of the adders is reduced further: setting one input of each adder to zero to perform the bypass implements operand isolation, which reduces the switching activity in the adders. ANDing one input of an adder with S, which is zero in the bypass mode, sets that input to zero and makes the adder pass the other input through. In the conventional BF unit, the adders have high switching activity in the bypass mode because they perform additions whose results are discarded by the MUXs. The adders of the proposed BF unit have less switching activity because a bypass operation incurs lower switching activity than an addition. The proposed BF unit, in which the MUXs are replaced with AND gates, implements a BF operation modified by interchanging the addition and the subtraction, as shown in Fig. 2(b), which performs the DFT operation in Eq. (1). In the proposed BF unit the result of the addition, x[p], is shifted into the FIFO, unlike the conventional BF unit, which moves the subtraction result, x[q], into the FIFO. Since the addition and the subtraction are interchanged in the proposed BF operation, TF multiplications take place at different locations in the signal flow graph (SFG) of an FFT that uses the proposed BF operation. This can be seen in Fig. 3, which shows example SFGs of two FFTs that use the conventional and the proposed BF operations, respectively, with TFs from W(1) to W(7). In an FFT processor that uses the proposed BF units, no overhead occurs in the TF generation block because the order of the TFs is the same as in the conventional FFT processor.
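The two operating modes can be sketched as a cycle-by-cycle behavioral model. This is a minimal illustration under stated assumptions, not the authors' Verilog: the names `and_gate` and `proposed_bf_stage` are hypothetical, only one real-valued integer datapath is modeled, the TF multiplication is omitted, and the choice of which adder input each AND gate drives is inferred from the description above rather than taken from Fig. 2(a).

```python
from collections import deque


def and_gate(value, s):
    """AND every bit of `value` with the 1-bit select `s` (0 or 1)."""
    return value & -s  # -1 is all ones in two's complement, 0 is all zeros


def proposed_bf_stage(samples, depth):
    """Stream integer samples through one gated BF stage with a FIFO of `depth`.

    Assumed wiring (inferred from the text):
      FIFO input = x_in + (fifo_out AND S)  -> bypass: x_in,     calculation: sum
      output     = fifo_out - (x_in AND S)  -> bypass: fifo_out, calculation: difference
    """
    fifo = deque([0] * depth)
    outputs = []
    # Append `depth` zeros so the sums left in the FIFO are flushed out.
    stream = list(samples) + [0] * depth
    for cycle, x_in in enumerate(stream):
        s = (cycle // depth) % 2  # 0: bypass mode, 1: calculation mode
        fifo_out = fifo.popleft()
        to_fifo = x_in + and_gate(fifo_out, s)
        out = fifo_out - and_gate(x_in, s)
        fifo.append(to_fifo)
        outputs.append(out)
    return outputs


print(proposed_bf_stage(range(8), depth=4))
```

For the inputs 0 to 7 this model emits four zeros from the initially empty FIFO, then the four differences (each -4), then the four sums (4, 6, 8, 10), so the difference branch leaves the stage before the sum branch.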
The result of a DIF FFT is usually reordered by using a counter and a bit-reverse block. The FFT result in Fig. 3(b) can be reordered by bit-reversing the inverted value of the counter, which means that only additional inverters are needed to reorder the result of an FFT based on the proposed BF unit.
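The reordering rule can be checked with a short software model. The sketch below assumes that every stage emits its difference branch before its sum branch, which is how the interchanged BF operation is read here; the function names are hypothetical and the recursive FFT stands in for the pipeline only to reproduce its output order.

```python
import cmath


def bit_reverse(value, bits):
    """Reverse the order of the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reorder_index(counter, bits):
    """Frequency index of the output produced at time `counter`."""
    inverted = ~counter & ((1 << bits) - 1)  # the additional inverters
    return bit_reverse(inverted, bits)


def dif_fft_difference_first(x):
    """Radix-2 DIF FFT model in which each stage outputs differences first.

    Assumption: the length of x is a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    sums = [x[i] + x[i + half] for i in range(half)]
    diffs = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
             for i in range(half)]
    return dif_fft_difference_first(diffs) + dif_fft_difference_first(sums)


def reordered_fft(x):
    """Run the model and place each output at its natural frequency index."""
    bits = len(x).bit_length() - 1
    result = [0j] * len(x)
    for t, value in enumerate(dif_fft_difference_first(x)):
        result[reorder_index(t, bits)] = value
    return result


print([reorder_index(t, 3) for t in range(8)])
```

For an 8-point transform the index sequence is 7, 3, 5, 1, 6, 2, 4, 0, which is the conventional bit-reversed sequence 0, 4, 2, 6, 1, 5, 3, 7 read with every counter bit inverted.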
Implementation results
We model three 256-point SDF FFT processors with radices 2, $2^2$, and $2^3$, respectively, and an internal word length of 12 bits in Verilog HDL by using the proposed BF units. We verify the function of the proposed BF unit by simulating the FFT processor models and comparing the results with those of FFTs based on the conventional BF units. After the functional verification, the conventional and the proposed BF units are synthesized with a 0.13 µm CMOS standard cell library to compare their power consumption. The synthesis results show that, by using smaller AND gates instead of MUXs, the proposed BF unit achieves a 13.7% reduction in area, estimated in terms of the number of 2-input NAND gates.
The power consumption of the BF units, estimated at various clock rates by using a gate-level power estimation tool, is summarized in Table I, where the Conventional and Proposed columns show the power consumption of the conventional BF unit and the proposed BF unit, respectively. The average power consumption reduction of the proposed BF unit is 24.8% compared to the conventional BF unit.
Conclusion
In this paper, we propose a new low-power BF unit for R2SDF, R$2^2$SDF, and R$2^3$SDF FFT architectures. In the proposed BF unit the power consumption is reduced by replacing the MUXs with AND gates at the inputs of the adders. The switching activity in the adders during the bypass mode is reduced by setting one input of each adder to zero, which results in a further reduction in power consumption. The power consumption of the proposed BF unit, designed in Verilog HDL and synthesized with a 0.13 µm CMOS standard cell library, is reduced by 24.8% on average compared to the conventional BF unit.
