A Low-Power 64-point FFT/IFFT Architecture for Wireless Broadband Communication by Maharatna, Koushik et al.
Thus, the 64-point FFT can be computed by first
taking the 8-point FFT of the appropriate data slot
(described in equation (3)), then multiplying the
results by the 8 interdimensional constants, and
finally taking the 8-point FFT of the resulting data.
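The three steps above can be sketched in software as follows. This is only an illustrative model, not the authors' hardware: it assumes the standard index mapping n = 8*n1 + n2, k = k1 + 8*k2 (equation (3) is not reproduced in this section, so the exact data ordering of the paper may differ), and the names `dft` and `fft64_via_8x8` are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used here as the 8-point FFT and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def fft64_via_8x8(x):
    """64-point DFT built from two passes of 8-point DFTs (assumed mapping)."""
    assert len(x) == 64
    # Pass 1: for each n2, an 8-point DFT over the samples x[8*n1 + n2].
    stage1 = [dft([x[8 * n1 + n2] for n1 in range(8)]) for n2 in range(8)]
    # Interdimensional constant multiplication: W_64^(n2*k1).
    scaled = [[stage1[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / 64)
               for k1 in range(8)] for n2 in range(8)]
    # Pass 2: for each k1, an 8-point DFT over n2 gives X[k1 + 8*k2].
    out = [0j] * 64
    for k1 in range(8):
        column = dft([scaled[n2][k1] for n2 in range(8)])
        for k2 in range(8):
            out[k1 + 8 * k2] = column[k2]
    return out

# Usage: compare against the direct 64-point DFT on arbitrary test data.
samples = [complex(i % 5, (3 * i) % 7) for i in range(64)]
max_err = max(abs(a - b) for a, b in zip(fft64_via_8x8(samples), dft(samples)))
```

In this sketch `max_err` should be at the level of floating-point rounding, which indicates that the two-pass decomposition reproduces the direct 64-point transform.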
The IFFT can be performed by first swapping the
real and imaginary parts of the incoming data,
then performing the forward FFT on them, and
finally swapping the real and imaginary parts of
the data once again at the end. This method allows
one to perform the IFFT without changing any
internal coefficients, thus resulting in a more
efficient hardware implementation.
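The swapping trick can be checked numerically with a short sketch. The function names below are hypothetical, and the division by N is the usual IFFT normalization, which the text does not discuss and which a hardware implementation may handle separately (for example as a fixed shift).

```python
import cmath

def dft(x):
    """Direct forward DFT; stands in for the forward FFT hardware."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def swap(z):
    """Exchange the real and imaginary parts of a complex value."""
    return complex(z.imag, z.real)

def ifft_via_swap(spectrum):
    """Inverse DFT using only the forward DFT plus two swaps."""
    n = len(spectrum)
    forward = dft([swap(v) for v in spectrum])  # forward transform of swapped input
    return [swap(v) / n for v in forward]       # swap back, then normalize by N

# Usage: a round trip through the forward DFT and the swap-based inverse.
original = [complex(k, 2 - k) for k in range(8)]
recovered = ifft_via_swap(dft(original))
round_trip_err = max(abs(a - b) for a, b in zip(original, recovered))
```

The reason this works is that swapping real and imaginary parts is the same as conjugating and multiplying by j, so the two swaps together turn the forward kernel into the inverse kernel without touching any stored coefficients.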
3. Architectural description
The basic architecture of the proposed 64-point
FFT/IFFT module is shown in Figure 1. It utilizes
two input buffers, one 8-point FFT module, an
internal buffer and four multipliers. According to
the IEEE 802.11a and ETSI BRAN specifications,
the FFT block receives the data every 4 µsec for a
duration of 3.2 µsec in a serial manner. To satisfy
this constraint, we used two input buffers. The
input data slots are stored alternately in these
buffers, and the 8-point FFT module switches from
one buffer to the other to fetch a new set of data as
soon as the computation of the 64-point FFT for a
particular data slot is completed.
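The alternating use of the two input buffers can be illustrated with a small software model. This is a behavioral sketch only: the class name, method names and the default slot size of 64 samples are assumptions for illustration, and the model assumes the FFT module always fetches a full slot before the other buffer fills up.

```python
class PingPongBuffer:
    """Two slot-sized buffers: one fills serially while the other is read."""

    def __init__(self, slot_size=64):
        self.slot_size = slot_size
        self.buffers = [[], []]
        self.write_idx = 0   # buffer currently receiving serial input
        self.ready = None    # index of a full buffer awaiting processing

    def push(self, sample):
        """Store one serial input sample; switch buffers when a slot is full."""
        buf = self.buffers[self.write_idx]
        buf.append(sample)
        if len(buf) == self.slot_size:
            self.ready = self.write_idx
            self.write_idx ^= 1                  # next slot goes to the other buffer
            self.buffers[self.write_idx] = []    # assumes its old slot was already fetched

    def fetch(self):
        """Return a complete slot for the FFT module, or None if none is ready."""
        if self.ready is None:
            return None
        slot = list(self.buffers[self.ready])
        self.ready = None
        return slot

# Usage with a short slot size so the behavior is easy to follow.
pp = PingPongBuffer(slot_size=4)
for s in range(4):
    pp.push(s)
first_slot = pp.fetch()          # full first slot, read while the second fills
for s in range(4, 8):
    pp.push(s)
second_slot = pp.fetch()
```

The point of the arrangement is that the serial input never has to wait for the processing of the previous slot, since writing and reading always target different buffers.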
After computation of the first 8-point FFT on
the initial input data sequence, the resultant data
undergo the interdimensional constant
multiplication operation. The multiplied data are
stored in an internal register 'cb' (shown in
Figure 1), from where they are rerouted to the
8-point FFT module in the appropriate order to
generate the final result. The final results are
stored in the buffer cb once again, from where the
output is generated in a serial manner.
The input mechanism, the internal computation
process and the data output mechanism are carried
out in a pipelined fashion. The parallelism and
pipelining introduced in this architecture are
favorable from the power consumption point of
view.
To perform the FFT and IFFT using the same
architecture, we introduce a signal 'mode'. The
logic LOW state of mode selects the forward FFT
operation, while its logic HIGH state selects the
IFFT operation.
Two additional signals, 'data_valid' and
'data_next', indicate valid input data and valid
output data respectively. These signals are
important for the integration of a complete
wireless broadband communication system: they
indicate the valid data operation condition to the
previous and the next block of the system. Thus,
this FFT/IFFT processor can be used as a
stand-alone processor or it can be integrated with
other required components to form a complete
system.
4. Performance of the architecture
From the algorithmic point of view, the
proposed architecture requires fewer arithmetic
operations than the conventional Cooley-Tukey
algorithm. This is shown in Table 1.
Algorithm          Complex          Addition /
                   multiplication   subtraction
Cooley-Tukey [3]   192              1152
Proposed           49               994

Table 1. Comparison of the number of arithmetic
operations with the Cooley-Tukey algorithm
The above comparison shows that the proposed
architecture requires only about 25% of the
multiplications of the conventional approach
(49 versus 192). In terms of the number of
additions, the proposed architecture requires
about 86% of those required in the conventional
approach (994 versus 1152). This results in a
significant reduction of power dissipation.
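The quoted percentages can be reproduced from the entries of Table 1 with a few lines of arithmetic. The variable names are illustrative only; the radix-2 count (N/2)·log2(N) is the standard Cooley-Tukey multiplication count and is included as a cross-check of the 192 figure, not as a statement about how the authors derived it.

```python
import math

N = 64

# Standard radix-2 Cooley-Tukey complex multiplication count: (N/2) * log2(N).
radix2_mults = (N // 2) * int(math.log2(N))

# Operation counts as listed in Table 1: (multiplications, additions/subtractions).
table1 = {"Cooley-Tukey": (192, 1152), "Proposed": (49, 994)}

mult_ratio = table1["Proposed"][0] / table1["Cooley-Tukey"][0]
add_ratio = table1["Proposed"][1] / table1["Cooley-Tukey"][1]

print(f"radix-2 multiplications: {radix2_mults}")
print(f"multiplication ratio: {mult_ratio:.1%}")
print(f"addition ratio: {add_ratio:.1%}")
```

The ratios come out slightly above the rounded values given in the text, about 25.5% for multiplications and about 86.3% for additions.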
The architecture is first coded in VHDL and
then simulated using Mentor Graphics' QuickSim
simulator. For convenience, the simulation result
of the FFT for a pure cosine function input is
shown in Figure 2(a). The result of the IFFT on the
resulting data is shown in Figure 2(b), which
demonstrates the functional correctness of the
architecture. The architecture is synthesized for
0.25 µm CMOS technology at 20 MHz clock
frequency using the Synopsys Design Analyzer
tool. The synthesized circuit is simulated using
Mentor Graphics' ModelSim simulator, which once
again confirms the correctness of the structure.
The synthesis result shows that the area
consumption of the complete FFT structure is
4.9 mm², which is equivalent to an inverter count
of 81.666K in that technology. At the operating
frequency of 20 MHz, the power consumption of the
whole structure is 78.5169 mW. At 20 MHz clock
frequency, the core architecture is capable of
computing the 64-point FFT/IFFT within 2.8 µsec.
However, with the serial input and serial output
circuitry, it completes the computation of the
Pre-print of 5th OFDM workshop
Figure 2(a). The simulation result for FFT on a pure cosine wave input.
Figure 2(b). The simulation result for IFFT on the data of Figure 2(a).