A Reconfigurable Butterfly Architecture for Fourier and Fermat Transforms by Al Ghouwayel, Ali et al.
To cite this version:
Ali Al Ghouwayel, Yves Louët, Jacques Palicot. A Reconfigurable Butterfly Architecture for
Fourier and Fermat Transforms. WSR'06, 2006, Karlsruhe, Germany. 2006. <hal-00083992>
HAL Id: hal-00083992
https://hal.archives-ouvertes.fr/hal-00083992
Submitted on 5 Jul 2006
A Reconfigurable Butterfly Architecture for Fourier
and Fermat Transforms
Ali Al Ghouwayel, Yves Louët and Jacques Palicot
Supélec-IETR, Avenue de la Boulaie BP 81127
35511 CESSON-SEVIGNE Cedex, France
Email: {ali.alghouwayel, yves.louet, jacques.palicot}@supelec.fr
Abstract— Reconfiguration is an essential part of SoftWare Radio (SWR) technology. It allows a system to change its operating mode in order to carry out several types of computations. In this SWR context, the Fast Fourier Transform (FFT) operator has been identified as a common operator for many classical telecommunications operations [1]. In this paper we propose a new architecture for this operator that enables it to perform two different transforms. The first is the Fast Fourier Transform (FFT), used for classical operations in the complex field. The second is the Fermat Number Transform (FNT) in the Galois Field (GF), used for channel coding and decoding.
I. INTRODUCTION
SoftWare Radio, which appeared in 1990 [2][3], is an ever-growing technology that has received enormous recognition and generated widespread interest in the telecommunication industry. Over the last few years, analog radio systems have been replaced by digital radio systems for radio applications. In addition, programmable hardware modules are increasingly being used in digital radio systems at different functional levels. SWR technology aims to take advantage of these programmable hardware modules to build an open architecture based on radio system software. It facilitates the implementation in software of some of the functional modules of a radio system, such as modulation/demodulation and coding/decoding. In this context, we present a new architecture for the FFT whose Butterfly is reconfigurable so as to perform two kinds of transforms over two different fields. The first is the complex field (C), over which the FFT carries out functions such as OFDM modulation and frequency-domain equalization. The second is the GF, where the FFT is reconfigured as an FNT to carry out the encoding and two main steps of the decoding process for Reed-Solomon (RS) codes. The paper is organized as follows: Section II describes the FNT and its applications. Section III investigates the arithmetic operators needed to define a reconfigurable Butterfly. Section IV presents the global structure of the FNT. Finally, conclusions are outlined in Section V.
II. THE FNT FOR RS CODES OVER GF($F_t$)
The Number Theoretic Transform (NTT) was introduced as a generalization of the Discrete Fourier Transform (DFT) over residue class rings of integers in order to implement fast cyclic convolutions and correlations without round-off errors and with better efficiency than the FFT [4][5]. Interesting applications of the NTT lie in fast coding, decoding, long integer multiplication, cryptography, digital filtering, image processing and deconvolution. When the arithmetic is carried out modulo $F_t$, where $F_t = 2^{2^t} + 1$ is the $t$-th Fermat number, the NTT is called the Fermat Number Transform (FNT), which presents some advantages. In particular, the FNT is well suited to VLSI implementation. The structure of the FNT is identical to that of the DFT for power-of-two lengths, so the same algorithms can be used for the classical radix-2 FFT and the radix-2 FNT. The only difference is that the complex multiplication of the Fourier transform is replaced by a real multiplication modulo $F_t$ in the FNT. The definitions of the FFT and the FNT are given below.
In C, the Discrete Fourier Transform of $v = (v_0, v_1, \ldots, v_{N-1})$, a vector of real or complex numbers, is a vector $V = (V_0, V_1, \ldots, V_{N-1})$ given by

$$V_k = \sum_{i=0}^{N-1} e^{-j\frac{2\pi i k}{N}}\, v_i, \qquad k = 0, \ldots, N-1 \tag{1}$$

where $j = \sqrt{-1}$. The Fourier kernel $\exp(-j2\pi/N)$ is an $N$th root of unity in the field C. In the finite field GF($q$), an element $\alpha$ of order $N$ is an $N$th root of unity. Drawing on the analogy between $\exp(-j2\pi/N)$ and $\alpha$, we have the following definition.
Let $v = (v_0, v_1, \ldots, v_{N-1})$ be a vector over GF($q$), and let $\alpha$ be an element of GF($q$) of order $N$. The vector $v$ and its Discrete Fourier Transform are related by

$$V_j = \sum_{i=0}^{N-1} \alpha^{ij} v_i \iff v_i = \frac{1}{N} \sum_{j=0}^{N-1} \alpha^{-ij} V_j, \tag{2}$$

for $j = 0, \ldots, N-1$, where $N$ is interpreted as an integer of the field. Further details can be found in [6].
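As an illustration of the transform pair (2), the following minimal Python sketch evaluates both sums directly. The function names `gf_dft` and `gf_idft` and the parameters $q = 17$, $N = 16$, $\alpha = 3$ are assumptions chosen for the example (3 has order 16 modulo the Fermat number $F_2 = 17$); they are not taken from the paper.

```python
def gf_dft(v, alpha, q):
    """Forward transform of (2): V_j = sum_i alpha^(i*j) * v_i, modulo q."""
    n = len(v)
    return [sum(pow(alpha, i * j, q) * v[i] for i in range(n)) % q
            for j in range(n)]


def gf_idft(V, alpha, q):
    """Inverse transform of (2): v_i = N^(-1) * sum_j alpha^(-i*j) * V_j, modulo q."""
    n = len(V)
    n_inv = pow(n, -1, q)          # N interpreted as an element of GF(q)
    alpha_inv = pow(alpha, -1, q)  # alpha^(-1) in GF(q); needs Python 3.8+
    return [n_inv * sum(pow(alpha_inv, i * j, q) * V[j] for j in range(n)) % q
            for i in range(n)]


# Illustrative parameters (assumed): q = F_2 = 17, alpha = 3 of order N = 16.
Q, ALPHA = 17, 3
v = [1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
V = gf_dft(v, ALPHA, Q)
assert gf_idft(V, ALPHA, Q) == v   # the pair (2) is invertible
```

The direct evaluation costs $N^2$ multiplications; it serves only as a reference against which a fast radix-2 structure can be checked.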
Our work focuses on the application of the FNT to channel coding and decoding. The Discrete Fourier Transform in the complex field occurs throughout signal processing. The same transform technique can play an important role in the study and processing of GF($q$)-valued signals ($q$ a prime number), that is, of codewords. By using Fermat transforms, the principles of coding theory can be described in a setting that is much closer to the methods of signal processing. In the frequency domain, cyclic codes can be defined as codes whose codewords have certain specified spectral components equal to zero [6]. In [7] we presented the advantages of applying such a transform to RS codes constructed over GF($F_t$). In this paper we describe in detail the practical realization of an FFT operator defined over C that can be reconfigured to become an FNT operator with arithmetic carried out modulo a Fermat number. This reconfiguration consists in reconfiguring each Butterfly of the FFT structure. In the next section we present the Butterfly itself as a function made up of several reconfigurable arithmetic operators.
III. RECONFIGURABLE BUTTERFLY
In the SWR concept, a new area of research called "Parametrization" has been defined [8][9]. This technique consists in identifying common resources, i.e. Common Operators (CO) or Common Functions (CF), among all the standards involved in the reconfiguration and within the standards themselves. The idea is then to exploit the same resources to execute two or more applications. In this context, the main goal of this work is to exploit the resources already present in the FFT structure to obtain the FNT structure. For this purpose, the arithmetic operators (multiplier, adder and subtracter) that realize operations over C must be redefined to realize operations modulo $F_t$. Reconfiguring the Butterfly (Figure 1) therefore consists in reconfiguring these arithmetic elements, which requires defining a modulo-$F_t$ multiplier, adder and subtracter.
[Figure 1: inputs u and v, multiplication (×) by $W_N^r$, adder (+) and subtracter (-), outputs x and y]
Fig. 1. The complex Butterfly
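The idea of one Butterfly serving both fields can be sketched in a few lines of Python. This is a behavioral illustration only, assuming the common radix-2 form $x = u + Wv$, $y = u - Wv$; the function name `butterfly` and the `modulus` switch are hypothetical and do not describe the hardware reconfiguration mechanism.

```python
import cmath


def butterfly(u, v, w, modulus=None):
    """Radix-2 butterfly: x = u + w*v, y = u - w*v.

    modulus=None : operate over C, w is a complex kernel power W_N^r.
    modulus=F_t  : operate over GF(F_t), w is a power of alpha and all
                   three operators work modulo F_t.
    """
    t = w * v
    if modulus is None:
        return u + t, u - t
    return (u + t) % modulus, (u - t) % modulus


# Complex mode with W_8^1 = exp(-j*2*pi/8)
x_c, y_c = butterfly(1 + 0j, 0.5 + 0.5j, cmath.exp(-2j * cmath.pi / 8))

# GF(17) mode with w = 3^2 mod 17 = 9 (alpha = 3 is an assumed example)
x_g, y_g = butterfly(5, 7, pow(3, 2, 17), modulus=17)   # (0, 10)
```

In both modes the data flow is identical; only the kernel value and the arithmetic performed by the multiplier, adder and subtracter change.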
A. Modular Multiplication in GF($F_t$)
Modulo $2^n + 1$ multiplication is widely used in the computation of convolutions and in Residue Number System (RNS) arithmetic. Several architectures of a modulo $(2^n + 1)$ multiplier based on Ma's algorithm [10] and on the modified Low-High algorithm [11] were described in [12][13]. There are two categories of algorithms for modulo $2^n + 1$ multiplication. The first performs the multiplication and then a correction [12]. The second reduces the partial products [10][14]. In [12, Table 2] the author compared the performance of different modular multiplier architectures implemented on Virtex-II and Virtex-E devices. For Virtex-II devices, the "(n+1)*(n+1)" multiplier yields a significant saving in slices and a shorter delay compared with the modified Low-High algorithm and Zimmermann's algorithm [14]. For Virtex-E devices, the "(n+1)*(n+1)" multiplier offers the best area-delay trade-off for a non-pipelined architecture, whereas for pipelined architectures on Virtex-E the operator based on Zimmermann's algorithm offers the best area-delay trade-off. As mentioned previously, our work aims at developing a modular multiplier that exploits the already existing resources. Our proposed multiplier (Figure 2) is therefore based on the architecture presented in [12, Figure 2-a], with some simplifications. For $n = 2^t$ the proposed multiplier works in $\mathbb{Z}_{2^n+1} = \{0, a \in \mathbb{Z}_{2^n+1} \mid \gcd(a, 2^n + 1) = 1\}$, and one of the two operands of the multiplication is the element $\alpha^i$, $i \in \{0, 1, \ldots, \frac{F_t - 1}{2} - 1\}$, with $\alpha^{\frac{F_t - 1}{2}} = -1$. Consequently, the product equal to $F_t - 1$, which would require $n+1$ bits, never occurs, and this is the source of the simplifications. In Figure 2, the white elements are those not used when operating over GF. As can be seen in this figure, the connections between operators are reconfigured: the dotted lines represent the additional connections used in the GF operating mode.
A basic modulo $(2^n + 1)$ multiplication algorithm consists in computing $p = xy$ and dividing this product by $2^n + 1$:

$$xy \bmod (2^n + 1) = p \bmod (2^n + 1) = \Bigl(\sum_{i=0}^{2n-1} p_i 2^i\Bigr) \bmod (2^n + 1)$$

Since division is a costly operation, it is preferable to use an algorithm that performs the modular reduction directly. We define $c_L$ and $c_H$, the lower and higher words of the product $p$ respectively, as follows:

$$c_L = \sum_{i=0}^{n-1} p_i 2^i \quad \text{and} \quad c_H = \sum_{i=0}^{n-1} p_{n+i} 2^i$$
[Figure 2: four multipliers (×), an adder (+) and a subtracter (-) with inputs a, b, c, d and outputs real and imag; bus widths of n and n+1 bits]
Fig. 2. The modulo $(2^n + 1)$ multiplier
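The modulo $(2^n + 1)$ operator depicted in Figure 2 computes:

$$xy \bmod (2^n + 1) = \begin{cases} (c_L + \bar{c}_H + 2) \bmod 2^n & \text{if } c_L + \bar{c}_H + 1 < 2^n \\ (c_L + \bar{c}_H + 1) \bmod 2^n & \text{otherwise} \end{cases}$$

where $\bar{c}_H$ denotes the bitwise complement of the $n$-bit word $c_H$. This follows from $2^n \equiv -1 \pmod{2^n + 1}$, which gives $p \equiv c_L - c_H$.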
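The reduction by the words $c_L$ and $c_H$ can be checked with a short Python sketch. It assumes that the second word enters the sum as its $n$-bit complement (the reading under which the congruence $2^n \equiv -1$ holds) and that the product fits in $2n$ bits; the function name `mul_mod_fermat` is hypothetical. Because the output is truncated to $n$ bits, a product congruent to $2^n$ comes out as 0, which is the case the text describes as not occurring in the FNT setting.

```python
def mul_mod_fermat(x, y, n):
    """x*y mod (2^n + 1) without division, for products p < 2^(2n).

    The result is truncated to n bits, so the value 2^n is delivered as 0.
    """
    mask = (1 << n) - 1
    p = x * y
    c_l = p & mask              # lower word  c_L
    c_h = (p >> n) & mask       # higher word c_H
    c_h_bar = c_h ^ mask        # n-bit complement of c_H (assumption, see lead-in)
    s = c_l + c_h_bar
    if s + 1 < (1 << n):        # case c_L < c_H
        return (s + 2) & mask
    return (s + 1) & mask       # case c_L >= c_H


# Example with n = 4 (modulus 17): 15 * 15 = 225 = 13 * 17 + 4
assert mul_mod_fermat(15, 15, 4) == 4
```

Only additions, a bitwise complement and a comparison are needed, which is why this form is preferred to an explicit division.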
B. Modular Addition in GF($F_t$)
Most algorithms for addition modulo $(2^n + 1)$ operate in the diminished-one number system, where a number $x$ is represented by $x' = x - 1$ and the number 0 is either not used or treated as a special case [13][14][15]. This implies:

$$(x' + y' + 1) \bmod (2^n + 1) = \begin{cases} (x' + y') \bmod 2^n & \text{if } x' + y' \ge 2^n \\ (x' + y' + 1) \bmod 2^n & \text{if } x' + y' < 2^n \end{cases} = (x' + y' + \overline{c_{out}}) \bmod 2^n,$$

where $c_{out}$ is the carry-out of the $n$-bit addition $x' + y'$.
Since the operators performing the complex addition process numbers in normal representation, the best way to perform a modular addition is to keep the same architecture, so that reconfigurability is obtained at lower cost. Let us now study the modulo $(2^n + 1)$ addition of two numbers in normal representation. In [13] the author described some algorithms that return the desired result increased by one; this property facilitates the design of the circuit. The modulo $(2^n + 1)$ addition is defined by:

$$(x + y + 1) \bmod (2^n + 1) = \begin{cases} 2^n & \text{if } x = 2^n \text{ and } y = 2^n \\ (x + y) \bmod 2^n + \overline{c_{out}} & \text{if } 0 \le x + y < 2^{n+1} \end{cases} \tag{3}$$

where $c_{out}$ is the bit of weight $2^n$ of $x + y$.
In [13] a direct implementation of Equation (3) is presented; the circuit is shown in Figure 3-a. To improve the implementation, the author suggests an alternative architecture that suppresses the multiplexer (Figure 3-b). With $s = x + y$ written as an $(n+2)$-bit integer with bits $s_{n+1} s_n \ldots s_0$, the modulo $(2^n + 1)$ addition is expressed as:

$$(x + y + 1) \bmod (2^n + 1) = (x + y) \bmod 2^n + s_{n+1} 2^n + \overline{s_{n+1} \vee s_n}$$
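The multiplexer-free expression can be exercised with the following Python sketch, which returns the sum increased by one for operands in normal representation, $0 \le x, y \le 2^n$. The function name `add_plus_one_mod` is hypothetical and the code is a bit-level model, not a description of the circuit in [13].

```python
def add_plus_one_mod(x, y, n):
    """(x + y + 1) mod (2^n + 1) for 0 <= x, y <= 2^n, using only the bits of s."""
    s = x + y                        # (n+2)-bit sum s_{n+1} s_n ... s_0
    s_n = (s >> n) & 1
    s_n1 = (s >> (n + 1)) & 1
    low = s & ((1 << n) - 1)         # (x + y) mod 2^n
    not_or = 1 - (s_n1 | s_n)        # complement of (s_{n+1} OR s_n)
    return low + (s_n1 << n) + not_or


# n = 4, modulus 17: (16 + 16 + 1) mod 17 = 16
assert add_plus_one_mod(16, 16, 4) == 16
```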
To perform an addition that directly returns the desired result, we propose an alternative adder, shown in Figure 4. We denote by $s^1$ and $s^2$ the sums at the first and second adders respectively, with the $(n+2)$-bit integer $s^1 = [s^1_{n+1} s^1_n \ldots s^1_0] = x + y$. The modulo $(2^n + 1)$ addition can be expressed as:

$$(x + y) \bmod (2^n + 1) = \begin{cases} (x + y) \bmod 2^n & \text{if } 0 \le x + y < 2^n \\ (x + y + 2^n - 1) \bmod 2^n & \text{if } 2^n < x + y \le 2^{n+1} \\ 2^n & \text{if } x + y = 2^n \text{ (e.g. } x = 2^n, y = 0 \text{ or } x = 0, y = 2^n) \end{cases} \tag{4}$$

in other words:

$$(x + y) \bmod (2^n + 1) = \overline{s^2_n}\,(s^2 \bmod 2^n) + s^2_n\, 2^n$$
[Figure 3: panels (a) and (b); $(n+1)$-bit inputs x and y, output $(x+y+1) \bmod (2^n+1)$; labels include the constant $2^n$, a 1/0 multiplexer and "Most significant bit"]
Fig. 3. The two architectures of the mod $(2^n + 1)$ Adder [13]
[Figure 4: two cascaded adders and a 1/0 multiplexer with the constant $2^n$; $(n+1)$-bit inputs x and y, output $(x+y) \bmod (2^n+1)$]
Fig. 4. The proposed mod $(2^n + 1)$ Adder
Let us now demonstrate the correctness of Equation (4). Consider two elements $x$ and $y$ of GF($F_t$), $0 \le x, y \le 2^n$. Then

$$0 \le x + y \le 2^{n+1}.$$

We distinguish the following four cases to establish the correctness of our algorithm:
• For $x + y = 2^{n+1}$ (i.e. $x = y = 2^n$), we have $s^1 = 2^{n+1}$ (i.e. $s^1_{n+1} = 1$ and $s^1_i = 0$ for $i = 0, \ldots, n$). Consequently $s^2 = 0 + 2^n - 1$, $s^2_n = 0$, and our algorithm returns $2^n - 1$.
• For $x + y = 2^n$ (e.g. $x = 0$ and $y = 2^n$, or $x = 2^n$ and $y = 0$), we have $s^1_n = 1$, $s^1_{n+1} = 0$ and $s^1_n \vee s^1_{n+1} = 1$, so that $s^2 = 2^n + 2^n - 1 = 2^{n+1} - 1$. In this case $s^2_n = 1$ and the multiplexer selects $2^n$ as the result. This is the only case where $s^2_n = 1$.
• For $2^n < x + y < 2^{n+1}$, we have $s^1 = 0 \cdot 2^{n+1} + s^1_n 2^n + \ldots + s^1_0$. Now, $2^n \bmod (2^n + 1) = (-1 + 2^n + 1) \bmod (2^n + 1) = (-1) \bmod (2^n + 1)$. Since the second adder of Figure 4 returns an addition modulo $2^n$, and $(-1) \bmod 2^n = 2^n - 1$, we obtain $2^n + 2^n - 1 < s^2 = x + y + 2^n - 1 < 2^{n+1} + 2^n - 1$, that is, $2^{n+1} \le s^2 < 3 \cdot 2^n - 1$, which gives $s^2_n = 0$. Our algorithm then returns $(x + y + 2^n - 1) \bmod 2^n$.
• Finally, for $0 \le x + y < 2^n$, we have $s^1_{n+1} = s^1_n = s^2_n = 0$ and $(x + y) \bmod 2^{n+1} = x + y$.
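The four cases can be verified exhaustively for a small $n$ with the Python sketch below. It rests on one reading of Figure 4 and of the case analysis, stated here as an assumption: the second adder adds $2^n - 1$ to the low $n+1$ bits of $s^1$ whenever $s^1_n \vee s^1_{n+1} = 1$, and the multiplexer outputs $2^n$ when $s^2_n = 1$, the low $n$ bits of $s^2$ otherwise. The function name `add_mod_fermat` is hypothetical.

```python
def add_mod_fermat(x, y, n):
    """(x + y) mod (2^n + 1) for 0 <= x, y <= 2^n, modeled on the two-adder scheme."""
    s1 = x + y                                   # first adder, (n+2)-bit sum
    cond = ((s1 >> n) | (s1 >> (n + 1))) & 1     # s1_n OR s1_{n+1}
    low = s1 & ((1 << (n + 1)) - 1)              # low n+1 bits of s1
    s2 = low + (((1 << n) - 1) if cond else 0)   # second adder: add 2^n - 1 if needed
    if (s2 >> n) & 1:                            # s2_n = 1: multiplexer selects 2^n
        return 1 << n
    return s2 & ((1 << n) - 1)                   # otherwise keep the low n bits


# n = 4, modulus 17: the four cases of the proof
assert add_mod_fermat(16, 16, 4) == 15   # x + y = 2^(n+1)
assert add_mod_fermat(16, 0, 4) == 16    # x + y = 2^n
assert add_mod_fermat(12, 9, 4) == 4     # 2^n < x + y < 2^(n+1)
assert add_mod_fermat(3, 5, 4) == 8      # x + y < 2^n
```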
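As is well known, an arithmetic subtracter is usually based on the adder structure. For the modulo $(2^n + 1)$ subtracter, we propose the operator shown in Figure 5. Subtraction modulo $(2^n + 1)$ can be expressed as follows:

$$(x - y) \bmod (2^n + 1) = \begin{cases} 2^n & \text{if } x = 2^n \text{ and } y = 0 \\ (x + \bar{y} + 1 + s_n) \bmod 2^n & \text{otherwise} \end{cases} \tag{5}$$

where $\bar{y}$ is the bitwise complement of the $(n+1)$-bit word $y$ and $s_n$ is the bit of weight $2^n$ of $s = x + \bar{y} + 1$.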
A proof of the correctness of this algorithm is provided in Annex A. Once the different elements of the Butterfly are defined, they can be assembled to obtain the reconfigurable Butterfly. Figure 6 depicts the resulting hardware operator. Switching from one operating mode to another requires changing the Fourier kernel and reconfiguring the connections between operators. Assume that the Butterfly is configured to operate over C and that a calculation over GF($F_t$) is to be performed. To do this, the Butterfly must download the element $\alpha^i$, activate the different logic gates (AND, OR and the multiplexers) and reconfigure the connections between operators as shown in Figure 6. In the next section, the global architecture of the FNT is presented.
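[Figure 5: adder-based subtracter with a 1/0 multiplexer and the constant $2^n$; $(n+1)$-bit inputs x and y, output $(x-y) \bmod (2^n+1)$]
Fig. 5. The proposed mod $(2^n + 1)$ Subtracter
[Figure 6: Butterfly with inputs $a_i$ and $b_j$, outputs $a_i + b_j$ and $a_i - b_j$, 1/0 multiplexers, the constant $2^n$ and the control bits $s_n$ and $s_{n+1}$]
Fig. 6. The architecture of the Butterfly over GF($F_t$)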
IV. THE FNT ARCHITECTURE
In the previous sections, we presented the reconfiguration at a rather low level. The Butterfly constitutes a higher-level parameterized function. Having this parameterized function makes it possible to design a reconfigurable operator in which the Butterfly is the highest-level building block. Figure 7 depicts the global reconfigurable operator. Over C it is called the FFT and over GF($F_t$) it is called the FNT. This architecture has been validated in software: a simple test computing an FFT followed by an IFFT confirmed the validity of the structure.
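A software check of this kind can be sketched for the GF mode as follows. The code is a plain recursive radix-2 decimation-in-time transform, not a model of the hardware operator; the names `fnt` and `ifnt` and the parameters $F_2 = 17$, $N = 16$, $\alpha = 3$ are assumptions chosen for illustration.

```python
def fnt(v, alpha, q):
    """Radix-2 transform over GF(q): len(v) is a power of two and alpha has
    order len(v) modulo q, so that alpha^(N/2) = -1."""
    n = len(v)
    if n == 1:
        return list(v)
    alpha2 = alpha * alpha % q
    even = fnt(v[0::2], alpha2, q)      # transform of even-indexed samples
    odd = fnt(v[1::2], alpha2, q)       # transform of odd-indexed samples
    out = [0] * n
    w = 1                               # w = alpha^k
    for k in range(n // 2):
        t = w * odd[k] % q              # modular multiplier
        out[k] = (even[k] + t) % q      # modular adder
        out[k + n // 2] = (even[k] - t) % q   # modular subtracter
        w = w * alpha % q
    return out


def ifnt(V, alpha, q):
    """Inverse transform: same structure with alpha^(-1), scaled by N^(-1)."""
    n_inv = pow(len(V), -1, q)
    return [n_inv * x % q for x in fnt(V, pow(alpha, -1, q), q)]


# Round trip on a 16-point vector over GF(17) with alpha = 3 (assumed values)
signal = [5, 0, 16, 3, 9, 1, 1, 7, 0, 2, 11, 4, 8, 13, 6, 10]
assert ifnt(fnt(signal, 3, 17), 3, 17) == signal
```

The loop body is exactly one Butterfly per iteration, so a 16-point transform uses four stages of eight Butterflies, as in the flow graph of Figure 7.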
[Figure 7: 16-point signal-flow graph from the 1st stage to the 4th stage, with inputs $s_0, \ldots, s_{15}$, outputs $S_0, \ldots, S_{15}$ and branches labeled with kernel exponents from 0 to 7]
Fig. 7. The architecture of the FNT operator
V. CONCLUSIONS
New modular arithmetic operators have been defined to build a reconfigurable Butterfly. Once the reconfigurable Butterfly has been defined, the FFT common operator is obtained. This operator is intended to be reconfigurable: it operates over C to perform the Fourier transform (e.g. OFDM modulation) and over GF($F_t$) to perform RS channel coding and two main steps of the decoding process by computing the Fermat transform. Current work aims at implementing this operator on FPGAs to test its reconfigurability and its performance in terms of slices and delay.
ANNEX A. PROOF OF THE NEW MODULO $(2^n + 1)$ SUBTRACTER ALGORITHM
Let us demonstrate that the algorithm defined by Equation (5) carries out $(x - y) \bmod (2^n + 1)$ when $0 \le x, y \le 2^n$. First, note that $x$ and $y$ are $(n+1)$-bit words with $0 \le x + y \le 2^{n+1}$, and let $\bar{y}$ denote the bitwise complement of $y$ on $n+1$ bits. We have $2^n - 1 \le \bar{y} \le 2^{n+1} - 1$, which gives, for the two's-complement form $x + \bar{y} + 1$ of $x - y$,

$$2^n - 1 + 1 \le x + \bar{y} + 1 \le 2^n + 2^{n+1} - 1 + 1,$$
$$2^n \le x + \bar{y} + 1 \le 3 \cdot 2^n.$$

We distinguish the following three cases to establish the correctness of our algorithm, where $s = x + \bar{y} + 1$:
• if $x \ge y$, then $x + \bar{y} + 1 \ge 2^{n+1}$, $s_{n+1} = 1$, $s_n = 0$, and the algorithm returns $x + \bar{y} + 1$;
• if $x < y$, then $x + \bar{y} + 1 < 2^{n+1}$, $s_{n+1} = 0$, $s_n = 1$, and the algorithm returns $x + \bar{y} + 1 + 1$;
• if $x = 2^n$ and $y = 0$, then $s_{n+1} = s_n = 1$ and the algorithm returns $2^n$.
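Equation (5) can be modeled bit by bit in Python as follows. The sketch takes the equation literally, assuming that the second operand enters as its $(n+1)$-bit complement and that $s_n$ is bit $n$ of $s = x + \bar{y} + 1$; the function name `sub_mod_fermat` is hypothetical. One caveat applies to this model: for pairs with $y = x + 1$ the exact difference is $2^n$, and the truncation to $n$ bits yields 0, so those pairs are left out of the check below.

```python
def sub_mod_fermat(x, y, n):
    """(x - y) mod (2^n + 1) following Equation (5), for 0 <= x, y <= 2^n."""
    if x == (1 << n) and y == 0:
        return 1 << n                       # special case of Equation (5)
    y_bar = y ^ ((1 << (n + 1)) - 1)        # complement of the (n+1)-bit word y
    s = x + y_bar + 1                       # (n+2)-bit sum
    s_n = (s >> n) & 1                      # equals 1 when x < y
    return (s + s_n) & ((1 << n) - 1)       # keep the low n bits


# n = 4, modulus 17; pairs with y = x + 1 are skipped (see the caveat above)
for x in range(17):
    for y in range(17):
        if y != x + 1:
            assert sub_mod_fermat(x, y, 4) == (x - y) % 17
```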
REFERENCES
[1] J. Palicot, C. Roland, FFT: a basic Function for a Reconfigurable Receiver, ICT'2003, Feb. 2003, Papeete, Tahiti.
[2] J. Mitola, The Software Radio Architecture, IEEE Communications Magazine, May 1995, pp. 26-38.
[3] W. Tuttlebee, Evolution of radio systems into the 21st century, Proc. IEE Int. Conf. on 'Radio receivers and associated systems', 1995.
[4] M.A. Soderstrand et al., Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, New York: IEEE Press, 1986.
[5] M.A. Bayoumi, G.A. Jullien, and W.C. Miller, A Look Up Table VLSI Design Methodology for RNS Structures Used in DSP Applications, IEEE Trans. Circuits and Systems, vol. 34, pp. 604-616, June 1987.
[6] Richard E. Blahut, Algebraic Codes for Data Transmission, Cambridge University Press, 2001.
[7] A. Al Ghouwayel, Y. Louët and J. Palicot, A Reconfigurable Architecture for the FFT Operator in a Software Radio Context, IEEE ISCAS'2006, Greece, May 21-24, 2006.
[8] W. Tuttlebee, Software Defined Radio: Enabling Technologies, Wiley, 2003.
[9] J. Palicot, D. Giri, C. Moy, A Theoretical Approach of Parameterization Design for SDR Systems, Workshop on Software Defined Radio: theory, design and applications, ESSIRC 2005, Grenoble, France.
[10] Y. Ma, A Simplified Architecture for Modulo $(2^n + 1)$ Multiplication, IEEE Transactions on Computers, 47(3):333-337, 1998.
[11] X. Lai, On the Design and Security of Block Ciphers, ETH Series in Information Processing, Hartung-Gorre Verlag Konstanz, 1992.
[12] Jean-Luc Beuchat, Modular Multiplication for FPGA Implementation of the IDEA Block Cipher, Proceedings of the Application-Specific Systems, Architectures, and Processors (ASAP'03).
[13] Jean-Luc Beuchat, Some Modular Adders and Multipliers for Field Programmable Gate Arrays, Proceedings of the 17th International Parallel and Distributed Processing Symposium, IEEE Computer Society, 2003.
[14] R. Zimmermann, Efficient VLSI Implementation of Modulo $(2^n \pm 1)$ Addition and Multiplication, in Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pages 158-167.
[15] Haridimos T. Vergos, Costas Efstathiou, and Dimitris Nikolos, Diminished-One Modulo $2^n + 1$ Adder Design, IEEE Transactions on Computers, Vol. 51, No. 12, Dec. 2002.
