EURASIP Journal on Applied Signal Processing 2002:9, 954–960
© 2002 Hindawi Publishing Corporation
Low-Complexity Versatile Finite Field Multiplier
in Normal Basis
Hua Li
Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge, Alberta, Canada T1K 3M4
Email: huali@cs.uleth.ca
Chang Nian Zhang
Department of Computer Science, TRLabs, University of Regina, Regina, SK, Canada S4S 0A2
Email: zhang@cs.uregina.ca
Received 6 August 2001 and in revised form 30 August 2002
A low-complexity VLSI array for a versatile multiplier in normal basis over $GF(2^n)$ is presented. The finite field parameters can be changed according to the user's requirements, which makes the multiplier reusable in different applications. This increases the flexibility to use the same multiplier for different applications and reduces the user's cost. The proposed multiplier has a regular structure and is very suitable for high-speed VLSI implementation. In addition, the pipeline versatile multiplier can be modified into a low-cost architecture that is feasible in embedded systems and restricted computing environments.
Keywords and phrases: finite field multiplication, Massey-Omura multiplier, normal basis, VLSI, encryption.
1. INTRODUCTION
The finite fields $GF(2^n)$ of characteristic 2 are of great interest for cryptosystems and digital signal processing. The addition operation in $GF(2^n)$ is fast and inexpensive, as it can be realized with n bitwise XOR operations. The multiplication operation, by contrast, is costly in terms of gate count and time delay. There are three main kinds of basis representations of the field elements in $GF(2^n)$: the standard (canonical, polynomial) basis, the dual basis, and the normal basis. Multipliers based on different basis representations have their own benefits and trade-offs. The dual basis multiplier [1] needs the fewest gates, which leads to the smallest area required for VLSI implementation [2]. The normal basis multiplier, for example, the Massey-Omura multiplier [3], is very effective in performing squaring, exponentiation, and inversion. The standard basis multiplier [4, 5, 6, 7] is easier to extend to high-order finite fields than the dual or normal basis multipliers.
Most of the proposed finite field multipliers operate over a fixed field. In other words, a new multiplier is needed whenever a field parameter changes, such as the irreducible polynomial defining the representation of the field elements. This makes the multiplier not reusable. Only a few versatile multipliers [4, 6, 8, 9] have been reported, and all of them are based on the canonical basis. In this paper, we present a new VLSI array for a versatile pipeline multiplier based on the normal basis representation. In the normal basis, squaring is a cost-free cyclic shift, and inversion (the most complicated of the important finite field arithmetic operations) can be computed efficiently by Fermat's theorem, which requires recursive squaring and multiplication [10, 11]. The proposed pipelined versatile multiplier has three main advantages. First, the finite field parameters can be changed according to the application environment, which increases the flexibility to use the same multiplier for different applications. Second, the structure of the multiplier can easily be extended to higher-order finite fields. Third, the basic architecture of the proposed multiplier can be modified into a low-cost multiplier that is very suitable for both embedded systems and wireless devices with restricted hardware resources. Moreover, the structure of the multiplier is modular and simple, has regular interconnections, and is easy to implement in VLSI. The proposed versatile multiplier can be used efficiently in public-key cryptosystems, such as elliptic curve cryptography, and in digital signal processing, for example, in Reed-Solomon encoders/decoders.
The remainder of the paper is organized as follows. Section 2 briefly reviews the normal basis representation and the Massey-Omura multiplier. Section 3 derives the pipeline versatile normal basis multiplier in $GF(2^n)$ and compares it with previous work. Section 4 concludes with a summary of the improved results and a description of application areas.
2. MULTIPLICATION ON $GF(2^n)$
It has been proved that a normal basis of $GF(2^n)$ over $GF(2)$ always exists [12]; it has the form

$$N = \{\beta, \beta^{2}, \beta^{2^2}, \ldots, \beta^{2^{n-1}}\}, \tag{1}$$

where β is a root of the irreducible polynomial P(x) of degree n over GF(2) and the n elements of the set are linearly independent.

We say that β generates the normal basis N, or that β is a normal element of $GF(2^n)$. Every element $a \in GF(2^n)$ can be represented as $a = \sum_{i=0}^{n-1} a_i\beta^{2^i}$, where $a_i \in \{0, 1\}$.
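The following properties [10] of a finite field $GF(2^n)$ are useful in applications.

(1) Squaring is a linear operation; that is, for any two elements a and b in $GF(2^n)$,
$$(a + b)^2 = a^2 + b^2. \tag{2}$$

(2) For any element $a \in GF(2^n)$,
$$a^{2^n} = a. \tag{3}$$

(3) For the normal element β,
$$1 = \beta + \beta^2 + \beta^4 + \cdots + \beta^{2^{n-1}}. \tag{4}$$
By property (2), squaring only permutes the terms of this sum, so the sum lies in GF(2); it is nonzero because the basis elements are linearly independent. This implies that the normal basis representation of 1 is (1, 1, ..., 1).

(4) Squaring an element a in the normal basis representation is a cyclic shift:
$$a^2 = \sum_{i=0}^{n-1} a_i\beta^{2^{i+1}} = \sum_{i=0}^{n-1} a_{i-1}\beta^{2^i} = (a_{n-1}, a_0, \ldots, a_{n-2}) \tag{5}$$
with indices reduced modulo n.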
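To make property (4) concrete, the following Python sketch works in the field of Example 1 below, $P_1(x) = x^5 + x^4 + x^2 + x + 1$ with n = 5. It assumes elements are stored as 5-bit integers in the canonical (polynomial) basis, builds the normal basis $\beta, \beta^2, \beta^4, \beta^8, \beta^{16}$ by repeated squaring, and checks that squaring an element equals a right cyclic shift of its normal basis coefficient vector. The helper names (`gf_mul`, `nb_to_poly`, `rotate_right`) are illustrative, not taken from the paper.

```python
N = 5
P1 = 0b110111  # x^5 + x^4 + x^2 + x + 1, the polynomial of Example 1


def gf_mul(x: int, y: int) -> int:
    """Multiply two GF(2^5) elements stored as bit masks in the canonical basis."""
    result = 0
    while y:
        if y & 1:
            result ^= x          # add (XOR) the current shifted copy of x
        y >>= 1
        x <<= 1
        if x & (1 << N):         # degree reached 5: reduce modulo P1(x)
            x ^= P1
    return result


BETA = 0b00010  # beta, a root of P1(x)
# Normal basis [beta, beta^2, beta^4, beta^8, beta^16] in canonical coordinates.
NORMAL = [BETA]
for _ in range(N - 1):
    NORMAL.append(gf_mul(NORMAL[-1], NORMAL[-1]))


def nb_to_poly(vec):
    """Convert normal-basis coefficients (a_0, ..., a_{n-1}) to a canonical bit mask."""
    value = 0
    for a_i, basis_element in zip(vec, NORMAL):
        if a_i:
            value ^= basis_element
    return value


def rotate_right(vec):
    """(a_0, a_1, ..., a_{n-1}) -> (a_{n-1}, a_0, ..., a_{n-2}), as in (5)."""
    return vec[-1:] + vec[:-1]


# Check property (4) for all 32 elements of the field.
for mask in range(1 << N):
    a = [(mask >> i) & 1 for i in range(N)]
    x = nb_to_poly(a)
    assert gf_mul(x, x) == nb_to_poly(rotate_right(a))
print("squaring is a cyclic shift for all", 1 << N, "elements")
```

The exhaustive loop is only practical for small n; it is meant as a sanity check of (5), not as a model of the hardware.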
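Let a and b be two arbitrary elements of $GF(2^n)$ in a normal basis representation and let $c = a \cdot b$ be their product. We denote $a = \sum_{i=0}^{n-1} a_i\beta^{2^i}$ as a vector $a = (a_0, a_1, \ldots, a_{n-1})$, $b = \sum_{i=0}^{n-1} b_i\beta^{2^i}$ as a vector $b = (b_0, b_1, \ldots, b_{n-1})$, and $c = \sum_{i=0}^{n-1} c_i\beta^{2^i}$ as a vector $c = (c_0, c_1, \ldots, c_{n-1})$. Then the last component $c_{n-1}$ of c is a logic function f of the components of a and b:

$$c_{n-1} = f(a_0, a_1, \ldots, a_{n-1}; b_0, b_1, \ldots, b_{n-1}). \tag{6}$$

Since $c^2 = a^2 \cdot b^2$ and squaring in the normal basis representation is a cyclic shift of the element, we have

$$(c_{n-1}, c_0, c_1, \ldots, c_{n-2}) = (a_{n-1}, a_0, a_1, \ldots, a_{n-2}) \cdot (b_{n-1}, b_0, b_1, \ldots, b_{n-2}). \tag{7}$$

Hence, the last component $c_{n-2}$ of $c^2$ can be obtained by

$$c_{n-2} = f(a_{n-1}, a_0, a_1, \ldots, a_{n-2}; b_{n-1}, b_0, b_1, \ldots, b_{n-2}). \tag{8}$$

By squaring c repeatedly, we get

$$\begin{aligned}
c_{n-1} &= f(a_0, a_1, \ldots, a_{n-1}; b_0, b_1, \ldots, b_{n-1}),\\
c_{n-2} &= f(a_{n-1}, a_0, \ldots, a_{n-2}; b_{n-1}, b_0, \ldots, b_{n-2}),\\
&\ \ \vdots\\
c_0 &= f(a_1, a_2, \ldots, a_{n-1}, a_0; b_1, b_2, \ldots, b_{n-1}, b_0).
\end{aligned} \tag{9}$$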
Equations (9) define the Massey-Omura multiplier in normal basis representation [10]. In the Massey-Omura multiplier, the same logic function f that computes the last component $c_{n-1}$ of the product c can be used to obtain the remaining components $c_{n-2}, c_{n-3}, \ldots, c_0$ sequentially. In a parallel architecture, n identical copies of the logic function f compute all components of the product simultaneously.
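Equations (9) translate directly into a serial procedure: evaluate f, record the result as the next lower product bit, and rotate both inputs (that is, square them). The Python sketch below assumes f is supplied as a callable; for concreteness it uses the n = 5 function of equation (19) from Example 1. The names `massey_omura` and `f19` are illustrative.

```python
def rotate_right(vec):
    """(v_0, ..., v_{n-1}) -> (v_{n-1}, v_0, ..., v_{n-2}): squaring in normal basis."""
    return vec[-1:] + vec[:-1]


def massey_omura(f, a, b):
    """Compute c = ab bit-serially from a single logic function f, following (9).

    c_{n-1} = f(a; b); each later step squares a and b (a cyclic shift),
    which moves the next lower product bit into the position that f computes.
    """
    n = len(a)
    c = [0] * n
    a, b = list(a), list(b)
    for k in range(n - 1, -1, -1):
        c[k] = f(a, b)
        a, b = rotate_right(a), rotate_right(b)
    return c


def f19(a, b):
    """Equation (19): most significant product bit for Example 1 (n = 5)."""
    terms = [(0, 2), (2, 0), (0, 4), (4, 0), (1, 2), (2, 1), (1, 3), (3, 1), (3, 3)]
    bit = 0
    for i, j in terms:
        bit ^= a[i] & b[j]       # GF(2): product is AND, sum is XOR
    return bit


# beta * beta^2 = beta^3 = beta + beta^8
print(massey_omura(f19, [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]))  # [1, 0, 0, 1, 0]
```

The same loop serves any field once f is known; only f (equivalently, one matrix of coefficients) depends on the irreducible polynomial, which is what the versatile design in Section 3 exploits.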
3. A PIPELINE ARCHITECTURE FOR THE SERIAL
VERSATILE NORMAL BASIS MULTIPLIER
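In this section, we derive a pipeline architecture to implement the versatile normal basis multiplier. Let c be the product of a and b:

$$c = ab = \Big(\sum_{i=0}^{n-1} a_i\beta^{2^i}\Big)\Big(\sum_{j=0}^{n-1} b_j\beta^{2^j}\Big) = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i b_j\,\beta^{2^i}\beta^{2^j}. \tag{10}$$

Each cross product $\beta^{2^i}\beta^{2^j}$ is again an element of $GF(2^n)$ and can be written in the normal basis as

$$\beta^{2^i}\beta^{2^j} = \sum_{k=0}^{n-1} \lambda^{(k)}_{i,j}\,\beta^{2^k}, \quad \lambda^{(k)}_{i,j} \in GF(2). \tag{11}$$

Substituting (11) into (10) and collecting the coefficient of $\beta^{2^k}$ gives

$$c_k = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} \lambda^{(k)}_{i,j}\, a_i b_j, \quad 0 \le k \le n-1. \tag{12}$$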
From the above analysis, the key issue in building a versatile normal basis multiplier is obtaining the values of $\lambda^{(k)}_{i,j}$ for different irreducible polynomials. The $n \times n$ matrices $\lambda^{(k)}$ ($0 \le k \le n-1$), whose elements are $\lambda^{(k)}_{i,j}$ ($0 \le i, j \le n-1$), can be obtained if we know the transformation between the elements of the canonical basis and the elements of the normal basis, that is, the normal basis representation of the elements of the canonical basis.
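The transformation mentioned above can be found mechanically for small n. The Python sketch below, assuming the field of Example 1, writes each canonical basis element $\beta^m$ ($0 \le m \le n-1$) in the normal basis by exhaustive search over all $2^n$ coefficient vectors. The search is only an illustration; a practical tool would instead invert the $n \times n$ basis-change matrix over GF(2). All names are hypothetical.

```python
from itertools import product

N = 5
P1 = 0b110111  # x^5 + x^4 + x^2 + x + 1 (Example 1)


def gf_mul(x, y):
    """Canonical-basis multiplication in GF(2^5), elements as bit masks."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & (1 << N):
            x ^= P1
    return result


NORMAL = [0b00010]  # beta
for _ in range(N - 1):
    NORMAL.append(gf_mul(NORMAL[-1], NORMAL[-1]))  # beta^(2^i)


def to_normal(value):
    """Return the normal-basis coefficient list (beta, beta^2, ...) of a bit mask."""
    for coeffs in product((0, 1), repeat=N):
        acc = 0
        for c, basis_element in zip(coeffs, NORMAL):
            if c:
                acc ^= basis_element
        if acc == value:
            return list(coeffs)
    raise ValueError("NORMAL is not a basis")


# The canonical basis elements 1, beta, ..., beta^4 are the masks 1 << m.
for m in range(N):
    print(f"beta^{m} ->", to_normal(1 << m))
```

For Example 1 this prints the five representations listed in (14), for instance `beta^3 -> [1, 0, 0, 1, 0]`, that is, $\beta^3 = \beta + \beta^8$.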
In the following, we define the multiplication table of the normal basis, use the basis element transformation formula to obtain the values of the multiplication table, and then obtain the $n \times n$ matrices $\lambda^{(k)}$. Finally, we illustrate the approach to building the versatile pipeline normal basis multiplier.
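Definition 1. Let $N = \{\beta, \beta^2, \ldots, \beta^{2^{n-1}}\}$ be a normal basis in $GF(2^n)$. Then for any i, j ($0 \le i, j \le n-1$), $\beta^{2^i}\beta^{2^j}$ is a linear combination of $\beta, \beta^2, \ldots, \beta^{2^{n-1}}$ with coefficients in GF(2). In particular,

$$\beta \begin{pmatrix} \beta \\ \beta^{2} \\ \vdots \\ \beta^{2^{n-1}} \end{pmatrix} = T \begin{pmatrix} \beta \\ \beta^{2} \\ \vdots \\ \beta^{2^{n-1}} \end{pmatrix}, \tag{13}$$

where T is an $n \times n$ matrix over GF(2). We call T the multiplication table of the normal basis N. The number of nonzero entries in T is called the complexity of the normal basis N, denoted by $C_N$.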
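Definition 1 can be evaluated numerically. Assuming the field of Example 1, the Python sketch below forms row i of T from the normal basis coordinates of $\beta \cdot \beta^{2^i}$, as in (13), and counts the nonzero entries to obtain $C_N$. The helper names are illustrative, and the coordinate conversion is a brute-force search that is only sensible for small n.

```python
from itertools import product

N = 5
P1 = 0b110111  # x^5 + x^4 + x^2 + x + 1 (Example 1)


def gf_mul(x, y):
    """Canonical-basis multiplication in GF(2^5), elements as bit masks."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & (1 << N):
            x ^= P1
    return result


NORMAL = [0b00010]  # beta, beta^2, beta^4, beta^8, beta^16
for _ in range(N - 1):
    NORMAL.append(gf_mul(NORMAL[-1], NORMAL[-1]))


def to_normal(value):
    """Normal-basis coefficients of a canonical bit mask (exhaustive search)."""
    for coeffs in product((0, 1), repeat=N):
        acc = 0
        for c, e in zip(coeffs, NORMAL):
            if c:
                acc ^= e
        if acc == value:
            return list(coeffs)
    raise ValueError("not a basis")


# Row i of T: beta * beta^(2^i) expressed in the normal basis, as in (13).
T = [to_normal(gf_mul(NORMAL[0], NORMAL[i])) for i in range(N)]
C_N = sum(map(sum, T))  # complexity: number of nonzero entries of T
for row in T:
    print(row)
print("C_N =", C_N)
```

For this basis the sketch gives $C_N = 9 = 2n - 1$, which matches the nine product terms of the logic function (19) derived in Example 1.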
For a given irreducible polynomial whose root generates a normal basis of $GF(2^n)$, the multiplication table T and the matrices $\lambda^{(k)}$ always exist [12]. Once the multiplication table T is obtained, the matrices $\lambda^{(k)}$ can be calculated according to (12). An example is shown below.
Example 1. Let the irreducible polynomial be $P_1(x) = x^5 + x^4 + x^2 + x + 1$ and let β be a root of the polynomial. Then the canonical basis is $\{1, \beta, \beta^2, \beta^3, \beta^4\}$ and the normal basis is $\{\beta, \beta^2, \beta^4, \beta^8, \beta^{16}\}$. We obtain the following normal basis representation of the elements of the canonical basis:

$$1 = \beta + \beta^2 + \beta^4 + \beta^8 + \beta^{16}, \quad \beta = \beta, \quad \beta^2 = \beta^2, \quad \beta^3 = \beta + \beta^8, \quad \beta^4 = \beta^4. \tag{14}$$

The appendix illustrates how to obtain the normal basis representation of $\beta^3$.
Thus any power $\beta^i$ ($i \ge 5$) can first be reduced to its canonical basis representation using $P_1(\beta) = 0$ and then converted to the corresponding normal basis representation with the basis element transformation formula (14). For instance,

$$\beta^{17} = 1 + \beta^2 + \beta^3 = 1 + \beta^2 + (\beta + \beta^8) = \beta^{16} + \beta^4. \tag{15}$$
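Because the change from the canonical basis to the normal basis is linear over GF(2), (14) acts as a lookup table: the normal basis vector of any element is the XOR of the rows selected by its canonical coefficients. The Python sketch below hard-codes (14) and reproduces (15); the table name `CANON_TO_NORMAL` and the function name are illustrative choices.

```python
# Rows of (14): normal-basis coordinates (beta, beta^2, beta^4, beta^8, beta^16)
# of the canonical basis elements 1, beta, beta^2, beta^3, beta^4.
CANON_TO_NORMAL = [
    [1, 1, 1, 1, 1],  # 1
    [1, 0, 0, 0, 0],  # beta
    [0, 1, 0, 0, 0],  # beta^2
    [1, 0, 0, 1, 0],  # beta^3 = beta + beta^8
    [0, 0, 1, 0, 0],  # beta^4
]


def canonical_to_normal(canon):
    """canon[m] is the coefficient of beta^m; returns normal-basis coefficients."""
    result = [0] * 5
    for coefficient, row in zip(canon, CANON_TO_NORMAL):
        if coefficient:
            result = [r ^ x for r, x in zip(result, row)]  # GF(2) vector addition
    return result


# beta^17 = 1 + beta^2 + beta^3 in the canonical basis, see (15).
print(canonical_to_normal([1, 0, 1, 1, 0]))  # [0, 0, 1, 0, 1], i.e. beta^4 + beta^16
```

In hardware terms this is a fixed GF(2) matrix-vector product, so the conversion table only has to be recomputed when the irreducible polynomial changes.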





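Proceeding in the same way for each product $\beta \cdot \beta^{2^i}$ ($0 \le i \le 4$), that is, for $\beta^2$, $\beta^3$, $\beta^5$, $\beta^9$, and $\beta^{17}$, we obtain the multiplication table

$$T = \begin{pmatrix}
0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 1\\
0 & 1 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & 1
\end{pmatrix}. \tag{16}$$

Now let $a = (a_0, a_1, a_2, a_3, a_4)$ and $b = (b_0, b_1, b_2, b_3, b_4)$ be two elements of $GF(2^5)$ in this normal basis.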
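The product of a and b is

$$\begin{aligned}
c = ab &= c_0\beta + c_1\beta^2 + c_2\beta^4 + c_3\beta^8 + c_4\beta^{16}\\
&= (a_0\beta + a_1\beta^2 + a_2\beta^4 + a_3\beta^8 + a_4\beta^{16})\,(b_0\beta + b_1\beta^2 + b_2\beta^4 + b_3\beta^8 + b_4\beta^{16})\\
&= a_0b_0\beta^2 + a_0b_1\beta^3 + a_0b_2\beta^5 + a_0b_3\beta^9 + a_0b_4\beta^{17}\\
&\quad + a_1b_0\beta^3 + a_1b_1\beta^4 + a_1b_2\beta^6 + a_1b_3\beta^{10} + a_1b_4\beta^{18}\\
&\quad + a_2b_0\beta^5 + a_2b_1\beta^6 + a_2b_2\beta^8 + a_2b_3\beta^{12} + a_2b_4\beta^{20}\\
&\quad + a_3b_0\beta^9 + a_3b_1\beta^{10} + a_3b_2\beta^{12} + a_3b_3\beta^{16} + a_3b_4\beta^{24}\\
&\quad + a_4b_0\beta^{17} + a_4b_1\beta^{18} + a_4b_2\beta^{20} + a_4b_3\beta^{24} + a_4b_4\beta^{32}.
\end{aligned} \tag{17}$$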
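Since $\beta^6 = (\beta^3)^2$, $\beta^{10} = (\beta^5)^2$, $\beta^{18} = (\beta^9)^2$, $\beta^{12} = (\beta^6)^2$, $\beta^{20} = (\beta^5)^4$, $\beta^{24} = (\beta^3)^8$, and $\beta^{32} = \beta$, the normal basis representations of these elements are easily obtained by cost-free cyclic shifts of the rows of the multiplication table T. Collecting the coefficients of $\beta^{16}$ in (17) then gives

$$\lambda^{(4)} = \begin{pmatrix}
0 & 0 & 1 & 0 & 1\\
0 & 0 & 1 & 1 & 0\\
1 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 1 & 0\\
1 & 0 & 0 & 0 & 0
\end{pmatrix}. \tag{18}$$

It can be readily seen that the matrices $\lambda^{(k)}$ ($0 \le k \le n-1$) are symmetric, because $\beta^{2^i}\beta^{2^j} = \beta^{2^j}\beta^{2^i}$ implies $\lambda^{(k)}_{i,j} = \lambda^{(k)}_{j,i}$.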
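The row-shifting argument above generalizes. Since $2^i + 2^j \equiv 2^i(1 + 2^{d}) \pmod{2^n - 1}$ with $d = (j - i) \bmod n$, the element $\beta^{2^i}\beta^{2^j}$ equals $(\beta \cdot \beta^{2^d})^{2^i}$, so its normal basis vector is row d of T cyclically shifted right i times. The Python sketch below uses this to build every $\lambda^{(k)}$ from T in (16); the function names are illustrative.

```python
T = [  # multiplication table (16): row d holds beta * beta^(2^d)
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
]
N = len(T)


def rotate_right(vec, times):
    """Apply the normal-basis squaring shift `times` times."""
    times %= len(vec)
    return vec[-times:] + vec[:-times] if times else list(vec)


def lambda_matrices(table):
    """Return lam[k][i][j] = coefficient of beta^(2^k) in beta^(2^i) * beta^(2^j)."""
    n = len(table)
    lam = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # beta^(2^i + 2^j) = (beta^(1 + 2^d))^(2^i) with d = (j - i) mod n
            coords = rotate_right(table[(j - i) % n], i)
            for k in range(n):
                lam[k][i][j] = coords[k]
    return lam


LAM = lambda_matrices(T)
for row in LAM[N - 1]:
    print(row)  # reproduces lambda^(4) in (18)
```

Only shifts and table lookups are involved, which is why the matrices can be prepared cheaply once per irreducible polynomial.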
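From the matrix $\lambda^{(4)}$, we obtain the following logic function, which computes the most significant bit $c_4$ of the product ab:

$$\begin{aligned}
c_4 &= f(a_0, a_1, a_2, a_3, a_4; b_0, b_1, b_2, b_3, b_4) = \sum_{i=0}^{4}\sum_{j=0}^{4} \lambda^{(4)}_{i,j}\, a_i b_j\\
&= a_0b_2 + a_2b_0 + a_0b_4 + a_4b_0 + a_1b_2 + a_2b_1 + a_1b_3 + a_3b_1 + a_3b_3.
\end{aligned} \tag{19}$$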
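Equation (19) is a sum of nine AND terms, one per nonzero entry of $\lambda^{(4)}$. The Python sketch below, an illustration rather than a hardware description, evaluates $c_4$ both from the explicit expression (19) and from the double sum with the matrix (18), and checks that the two agree on all $2^{10}$ input pairs. The function names are hypothetical.

```python
LAMBDA4 = [  # matrix (18)
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def f_explicit(a, b):
    """Equation (19) written out term by term (+ is XOR, products are AND)."""
    return ((a[0] & b[2]) ^ (a[2] & b[0]) ^ (a[0] & b[4]) ^ (a[4] & b[0])
            ^ (a[1] & b[2]) ^ (a[2] & b[1]) ^ (a[1] & b[3]) ^ (a[3] & b[1])
            ^ (a[3] & b[3]))


def f_matrix(lam, a, b):
    """Double sum of lam[i][j] * a_i * b_j over GF(2)."""
    bit = 0
    for i, row in enumerate(lam):
        for j, entry in enumerate(row):
            bit ^= entry & a[i] & b[j]
    return bit


def bits(mask, n=5):
    """Normal-basis coefficient list (a_0, ..., a_{n-1}) from an integer mask."""
    return [(mask >> i) & 1 for i in range(n)]


assert all(f_explicit(bits(x), bits(y)) == f_matrix(LAMBDA4, bits(x), bits(y))
           for x in range(32) for y in range(32))
print("nonzero entries:", sum(map(sum, LAMBDA4)))  # 9, one per AND term in (19)
```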
In the normal basis representation, the logic function $f(a_0, a_1, \ldots, a_{n-1}; b_0, b_1, \ldots, b_{n-1})$ used to obtain the most significant bit $c_{n-1}$ of the product can also be used to obtain the remaining bits $c_{n-2}, c_{n-3}, \ldots, c_0$, provided the inputs of the function are cyclically shifted [10]. Thus we may choose one matrix from the matrices $\lambda^{(k)}$ ($0 \le k \le n-1$) and, because the matrix is symmetric, input only the values of its upper triangle to perform the multiplication.
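Because $\lambda^{(n-1)}$ is symmetric, its $n(n+1)/2$ upper-triangle entries determine it completely: each off-diagonal entry contributes $\lambda_{i,j}(a_i b_j + a_j b_i)$. The Python sketch below packs the upper triangle of (18) into a flat list of configuration bits (a row-major layout assumed here for illustration, not specified by the paper) and evaluates f from that list, checking the result against the full matrix.

```python
N = 5
LAMBDA4 = [  # matrix (18)
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def pack_upper(lam):
    """Row-major upper triangle (i <= j): the configuration bits to load."""
    n = len(lam)
    return [lam[i][j] for i in range(n) for j in range(i, n)]


def f_upper(config, a, b):
    """Evaluate f from upper-triangle bits, using lam[i][j] == lam[j][i]."""
    n = len(a)
    bit, pos = 0, 0
    for i in range(n):
        for j in range(i, n):
            if config[pos]:
                term = a[i] & b[j]
                if i != j:
                    term ^= a[j] & b[i]   # contribution of the mirrored entry lam[j][i]
                bit ^= term
            pos += 1
    return bit


def f_full(lam, a, b):
    """Reference: full double sum over all n^2 entries."""
    bit = 0
    for i in range(len(a)):
        for j in range(len(b)):
            bit ^= lam[i][j] & a[i] & b[j]
    return bit


def bits(mask, n=N):
    return [(mask >> i) & 1 for i in range(n)]


CONFIG = pack_upper(LAMBDA4)
print(len(CONFIG), "configuration bits:", CONFIG)
```

For n = 5 this needs 15 configuration bits instead of 25, which is the saving the upper-triangle input buys at configuration time.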
A VLSI array architecture implementing the versatile $GF(2^n)$ normal basis multiplier is proposed and illustrated in Figures 1 and 2. The basic cells in the structure are 3-input AND gates and 2-input XOR gates. We use the 3-input AND gates to compute $a_i b_j \lambda^{(n-1)}_{i,j}$ in the X-Y dimension and sum the terms $a_i b_j \lambda^{(n-1)}_{i,j}$ with a binary tree of 2-input XOR gates in the Z dimension. The architecture requires $n^2$ 3-input AND gates and $n^2 - 1$ 2-input XOR gates, and the time delay for generating one bit of the product is $T_{AND3} + 2(\log_2 n)T_{XOR}$, where $T_{AND3}$ is the time delay of a 3-input AND gate and $T_{XOR}$ is the time delay of a 2-input XOR gate. All bits of the product are obtained by cyclically shifting the input coefficients of a and b. Because the irreducible polynomial changes far less frequently than the multiplicands, the elements of the matrix $\lambda^{(n-1)}$ can be stored in registers once the irreducible polynomial has been chosen.

[Figure 2: The architecture of the serial versatile normal basis $GF(2^n)$ multiplier. The inputs a and $b_0, \ldots, b_{n-1}$ pass through buffers; each row of 3-input AND gates uses the stored coefficients $\lambda_{i,0}, \ldots, \lambda_{i,n-1}$, and the AND outputs are summed by 2-input XOR gates.]
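The gate counts and the per-bit delay stated above follow from summing $n^2$ AND outputs with a balanced tree of 2-input XOR gates. The Python sketch below, an illustrative model rather than a netlist, reduces the signals level by level and reports the number of XOR gates and the tree depth; the depth is $\lceil \log_2 n^2 \rceil = \lceil 2\log_2 n \rceil$ levels.

```python
def xor_tree(num_inputs):
    """Pairwise-reduce num_inputs signals; return (xor_gates, depth_in_levels)."""
    gates, depth, width = 0, 0, num_inputs
    while width > 1:
        pairs = width // 2
        gates += pairs               # one 2-input XOR per pair at this level
        width = pairs + width % 2    # an odd leftover signal passes to the next level
        depth += 1
    return gates, depth


def array_cost(n):
    """3-input ANDs, 2-input XORs, and XOR levels for one product bit (Figure 2)."""
    xors, levels = xor_tree(n * n)
    return n * n, xors, levels


for n in (5, 8, 163):
    ands, xors, levels = array_cost(n)
    print(f"n={n}: {ands} AND3, {xors} XOR, delay T_AND3 + {levels} T_XOR")
```

Each 2-input XOR removes exactly one signal, so the tree always uses $n^2 - 1$ gates regardless of how it is balanced; balancing only affects the depth.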
[Figure 3: A low-cost architecture of the serial versatile normal basis $GF(2^n)$ multiplier, with input coefficients $a_{n-1} \cdots a_1 a_0$.]

The algorithm for this multiplication can be described as follows.
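Algorithm 1 (versatile normal basis multiplication in $GF(2^n)$).
Input: Coefficients of a, b, and the matrix $\lambda^{(n-1)}$.
Output: c = ab.
Begin
  load matrix $\lambda^{(n-1)}$;
  for k = n − 1 downto 0 do
  begin
    $c_k = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} \lambda^{(n-1)}_{i,j}\, a_i b_j$;
    cyclic shift the coefficients of a and b, i.e., $a \leftarrow (a_{n-1}, a_0, \ldots, a_{n-2})$ and likewise for b;
  end;
End.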
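Algorithm 1 can be checked in software. The Python sketch below assumes the field of Example 1: it loads $\lambda^{(4)}$ from (18), evaluates each bit $c_k$ as a double sum over GF(2), and rotates a and b right after each bit, following (9). It then compares the result with direct canonical-basis multiplication for every pair of field elements. Names such as `algorithm1` are illustrative.

```python
N = 5
P1 = 0b110111  # x^5 + x^4 + x^2 + x + 1
LAMBDA = [  # lambda^(n-1) = lambda^(4) from (18), loaded once per field
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def algorithm1(lam, a, b):
    """Normal-basis product c = ab, one bit per iteration (Algorithm 1)."""
    n = len(a)
    a, b, c = list(a), list(b), [0] * n
    for k in range(n - 1, -1, -1):
        bit = 0
        for i in range(n):          # n^2 three-input ANDs summed by an XOR tree
            for j in range(n):
                bit ^= lam[i][j] & a[i] & b[j]
        c[k] = bit
        # Cyclic shift (squaring): (a_0..a_{n-1}) -> (a_{n-1}, a_0..a_{n-2}).
        a, b = a[-1:] + a[:-1], b[-1:] + b[:-1]
    return c


def gf_mul(x, y):
    """Reference multiplication in the canonical basis."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << N):
            x ^= P1
    return r


NORMAL = [0b00010]  # beta, beta^2, beta^4, beta^8, beta^16 as canonical masks
for _ in range(N - 1):
    NORMAL.append(gf_mul(NORMAL[-1], NORMAL[-1]))


def nb_to_poly(vec):
    """Normal-basis coefficient list -> canonical bit mask."""
    r = 0
    for bit, e in zip(vec, NORMAL):
        if bit:
            r ^= e
    return r


mismatches = 0
for x in range(1 << N):
    for y in range(1 << N):
        a = [(x >> i) & 1 for i in range(N)]
        b = [(y >> i) & 1 for i in range(N)]
        if nb_to_poly(algorithm1(LAMBDA, a, b)) != gf_mul(nb_to_poly(a), nb_to_poly(b)):
            mismatches += 1
print("mismatches:", mismatches)
```

Changing the field only means replacing `LAMBDA` (and, for the reference check, `P1`); the loop itself is unchanged, which mirrors the versatility argument.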
The proposed architecture can be implemented as a pipeline. In the first n clock cycles, the coefficients of a and b are fed sequentially into the buffers. In the following n clock cycles, the bits of the product are obtained by cyclically shifting the registers that store the original coefficients of a and b. Meanwhile, the next two multiplicands can be fed into the buffers during these clock cycles, so the second product can be computed immediately after the first one is finished.
In a restricted computing environment, a single level (one row) of the proposed multiplier (Figure 2) can be used iteratively to obtain the low-cost serial architecture illustrated in Figure 3, which performs the same computation. It can be described by the following algorithm.
Algorithm 2 (low-cost serial versatile normal basis multiplication in $GF(2^n)$).
Input: Coefficients of a, b, and the matrix $\lambda^{(n-1)}$.
Output: c = ab.
Begin
  for k = n − 1 downto 0 do
  begin
    $c_k^{(0)} = 0$;
    for i = 0 to n − 1 do
      $c_k^{(i+1)} = c_k^{(i)} + \sum_{j=0}^{n-1} \lambda^{(n-1)}_{i,j}\, a_i b_j$;
    $c_k = c_k^{(n)}$;
    cyclic shift the coefficients of a and b;
  end;
End.
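Algorithm 2 replaces the $n^2$-gate array with a single row of n AND gates that is reused n times per product bit. The Python sketch below, assuming $\lambda^{(4)}$ from (18), follows the loop structure of Algorithm 2 (one row i is accumulated into $c_k$ per step) and counts the 3-input AND evaluations, showing that one product still costs $n^3$ of them, spread over $n^2$ steps. It is checked against the direct double sum implied by (9) and (12); all names are illustrative.

```python
LAMBDA = [  # lambda^(4) from (18)
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def algorithm2(lam, a, b):
    """Low-cost serial multiplication: one row of n AND gates per step."""
    n = len(a)
    a, b, c = list(a), list(b), [0] * n
    steps = and_ops = 0
    for k in range(n - 1, -1, -1):
        acc = 0                                   # c_k^(0) = 0
        for i in range(n):
            row = 0
            for j in range(n):                    # the n 3-input ANDs of one row
                row ^= lam[i][j] & a[i] & b[j]
                and_ops += 1
            acc ^= row                            # XOR tree plus accumulating XOR
            steps += 1
        c[k] = acc
        a, b = a[-1:] + a[:-1], b[-1:] + b[:-1]   # cyclic shift (squaring)
    return c, steps, and_ops


def direct(lam, a, b):
    """Reference: c_{n-1-s} = sum lam[i][j] a_{i-s} b_{j-s}, from (9) and (12)."""
    n = len(a)
    c = [0] * n
    for s in range(n):
        bit = 0
        for i in range(n):
            for j in range(n):
                bit ^= lam[i][j] & a[(i - s) % n] & b[(j - s) % n]
        c[n - 1 - s] = bit
    return c


def bits(mask, n=5):
    return [(mask >> i) & 1 for i in range(n)]


c, steps, and_ops = algorithm2(LAMBDA, [1, 0, 0, 0, 0], [0, 1, 0, 0, 0])
print(c, steps, and_ops)  # [1, 0, 0, 1, 0] 25 125
```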
The low-cost versatile normal basis multiplier in $GF(2^n)$ requires n 3-input AND gates and n 2-input XOR gates. The time delay for generating one bit of the product is $n(T_{AND3} + (\log_2 n + 1)T_{XOR})$.
The proposed versatile normal basis multipliers have modular structures and regular interconnections, which make them suitable for high-speed or space-restricted VLSI implementations. Table 1 compares the space and time complexity of our new multipliers with previous work. The input ports of the proposed versatile multiplier are almost the same as those of a nonversatile multiplier, since the finite field parameters can be configured into the multiplier through the input ports of the multiplicands (a and b), selected by a one-bit control signal at configuration time. The finite field parameters do not need to be reconfigured while the multiplier is running until the application environment changes. Thus the hardware cost can be greatly reduced compared with a nonversatile multiplier, for which a new multiplier has to be redesigned and implemented whenever the finite field parameters change.
Table 1: Comparison of versatile multipliers with nonversatile multipliers in $GF(2^n)$.

| Multiplier | Type | # XOR gates | # AND gates | Time delay (all n bits) |
|---|---|---|---|---|
| Wang-MOM [10] | Nonversatile | $2n - 2$ | $2n - 1$ | $n(T_{AND} + (\log_2 n + 1)T_{XOR})$ |
| Li-CVM [9] (canonical basis) | Versatile | $2n^2$ | $2n^2$ | $n(T_{AND} + 2T_{XOR})$ |
| Proposed multiplier (Figure 2) | Versatile | $n^2 - 1$ | $n^2$ (3-input) | $n(T_{AND3} + 2(\log_2 n)T_{XOR})$ |
| Proposed low-cost multiplier (Figure 3) | Versatile | $n$ | $n$ (3-input) | $n^2(T_{AND3} + (\log_2 n + 1)T_{XOR})$ |
Moreover, the proposed architecture in $GF(2^n)$ can easily be expanded to the finite field $GF(2^{2n})$. One solution is to use two basic $GF(2^n)$ architectures to implement the multiplication in $GF(2^{2n})$; an alternative is to perform the $GF(2^{2n})$ multiplication serially using only one basic $GF(2^n)$ architecture.
4. CONCLUSION
In this paper, architectures for finite field multiplication based on the normal basis have been proposed. The architectures require simple control signals and have regular local interconnections. As a consequence, they are very suitable for VLSI implementation. The versatility of this VLSI array modular multiplier widens its range of application: the same multiplier can be used in different application environments, such as elliptic curve cryptosystems and Reed-Solomon encoders/decoders. The proposed multiplier can easily be extended to larger n for higher security. Moreover, the structures can be modified to perform fast exponentiation and inversion. We also note that a low-cost, space-efficient serial multiplier can be built, which is feasible in restricted computing environments and embedded systems.
APPENDIX
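Let the irreducible polynomial be $P_1(x) = x^5 + x^4 + x^2 + x + 1$ and let β be a root of the polynomial. We show the procedure for computing the normal basis representations used to build the multiplication table T and the matrix $\lambda^{(4)}$, taking $\beta^3$ as the worked case.

As β is a root of $P_1(x)$,

$$\beta^5 = \beta^4 + \beta^2 + \beta + 1, \tag{A.1}$$

$$\begin{aligned}
\beta^6 = \beta^5\beta &= \beta^5 + \beta^3 + \beta^2 + \beta\\
&= \beta^4 + \beta^2 + \beta + 1 + \beta^3 + \beta^2 + \beta\\
&= \beta^4 + \beta^3 + 1.
\end{aligned} \tag{A.2}$$

Multiplying both sides of (A.2) by $\beta^2$, we get

$$\beta^8 = \beta^6 + \beta^5 + \beta^2. \tag{A.3}$$

From (A.3),

$$\beta^6 = \beta^8 + \beta^5 + \beta^2. \tag{A.4}$$

By property (3) of Section 2,

$$1 = \beta^{16} + \beta^8 + \beta^4 + \beta^2 + \beta. \tag{A.5}$$

Substituting (A.5) into (A.1),

$$\beta^5 = \beta^4 + \beta^2 + \beta + \beta^{16} + \beta^8 + \beta^4 + \beta^2 + \beta = \beta^{16} + \beta^8. \tag{A.6}$$

Substituting (A.6) into (A.4),

$$\beta^6 = \beta^8 + \beta^5 + \beta^2 = \beta^8 + \beta^{16} + \beta^8 + \beta^2 = \beta^{16} + \beta^2. \tag{A.7}$$

From (A.2), we get

$$\beta^3 = \beta^6 + \beta^4 + 1. \tag{A.8}$$

Substituting (A.7) and (A.5) into (A.8),

$$\beta^3 = \beta^{16} + \beta^2 + \beta^4 + \beta^{16} + \beta^8 + \beta^4 + \beta^2 + \beta = \beta^8 + \beta. \tag{A.9}$$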
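The appendix identities can be confirmed numerically. The Python sketch below computes powers of β in the canonical basis of $P_1(x)$ (bit m of an integer is the coefficient of $\beta^m$, and addition in $GF(2^n)$ is XOR) and checks (A.1), (A.2), (A.5), (A.6), (A.7), and (A.9). The helper `beta_power` is illustrative.

```python
N = 5
P1 = 0b110111  # x^5 + x^4 + x^2 + x + 1


def beta_power(e):
    """beta^e as a canonical-basis bit mask, by repeated multiplication by beta."""
    value = 1
    for _ in range(e):
        value <<= 1                 # multiply by beta
        if value & (1 << N):        # replace beta^5 using (A.1)
            value ^= P1
    return value


b = beta_power
checks = {
    "A.1": b(5) == b(4) ^ b(2) ^ b(1) ^ b(0),
    "A.2": b(6) == b(4) ^ b(3) ^ b(0),
    "A.5": b(0) == b(16) ^ b(8) ^ b(4) ^ b(2) ^ b(1),
    "A.6": b(5) == b(16) ^ b(8),
    "A.7": b(6) == b(16) ^ b(2),
    "A.9": b(3) == b(8) ^ b(1),
}
print(checks)
```

Every entry evaluates to `True`, which confirms the hand derivation and, in particular, the representation $\beta^3 = \beta + \beta^8$ used in (14).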
REFERENCES
[1] E. R. Berlekamp, “Bit-serial Reed-Solomon encoders,” IEEE Trans. on Information Theory, vol. 28, no. 6, pp. 869–874, 1982.
[2] I. S. Hsu, T. K. Truong, L. J. Deutsch, and I. S. Reed, “A comparison of VLSI architecture of finite field multipliers using dual, normal, or standard bases,” IEEE Trans. on Computers, vol. 37, no. 6, pp. 735–739, 1988.
[3] J. L. Massey and J. K. Omura, “Computational method and apparatus for finite field arithmetic,” U.S. Patent application, 1981.
[4] B. A. Laws Jr. and C. K. Rushforth, “A cellular-array multiplier for GF(2^m),” IEEE Trans. on Computers, vol. 20, no. 12, pp. 1573–1578, 1971.
[5] P. A. Scott, S. E. Tavares, and L. E. Peppard, “A fast VLSI multiplier for GF(2^m),” IEEE Journal on Selected Areas in Communications, vol. 4, pp. 62–66, January 1986.
[6] L. Song and K. Parhi, “Low-energy digit-serial/parallel finite field multipliers,” Journal of VLSI Signal Processing, vol. 19, no. 2, pp. 149–166, 1998.
[7] S. K. Jain, L. Song, and K. K. Parhi, “Efficient semisystolic architectures for finite-field arithmetic,” IEEE Trans. on VLSI Systems, vol. 6, no. 1, pp. 101–113, 1998.
[8] M. A. Hasan and A. G. Wassal, “VLSI algorithms, architectures and implementation of a versatile GF(2^m) processor,” IEEE Trans. on Computers, vol. 49, no. 10, pp. 1064–1073, 2000.
[9] H. Li and C. N. Zhang, “Efficient cellular automata based versatile modular multiplier for GF(2^m),” to appear in Journal of Information Science and Engineering.
[10] C. C. Wang, T. K. Truong, H. M. Shao, L. J. Deutsch, J. K. Omura, and I. S. Reed, “VLSI architectures for computing multiplications and inverses in GF(2^m),” IEEE Trans. on Computers, vol. 34, no. 8, pp. 709–716, 1985.
[11] G. Feng, “A VLSI architecture for fast inversion in GF(2^m),” IEEE Trans. on Computers, vol. 38, no. 10, pp. 1383–1386, 1989.
[12] A. J. Menezes, Applications of Finite Fields, Kluwer Academic Publishers, Boston, Mass, USA, 1993.
Hua Li received his B.E. and M.S. degrees from Beijing Polytechnic University and Peking University. He is a Ph.D. candidate in the Department of Computer Science, University of Regina. Currently, he works as an assistant professor in the Department of Mathematics and Computer Science, University of Lethbridge, Canada. His research interests include parallel systems, reconfigurable computing, fault-tolerant computing, VLSI design, and information and network security. He is a member of IEEE.

Chang Nian Zhang received his B.S. degree in applied mathematics from University of Science Technology, China, and the Ph.D. degree in computer science and engineering from Southern Methodist University. In 1998, he joined Concordia University as a research assistant professor in the Department of Computer Science. Since 1990, he has been with the Department of Computer Science, University of Regina, Canada. Currently he is a full professor and leads a research group in parallel processing, data security, and neural networks.
