A comparison of VLSI architecture of finite field multipliers using dual, normal or standard basis by Reed, I. S. et al.
TDA Progress Report 42-90
N87-28785
April-June 1987
A Comparison of VLSI Architecture of Finite Field Multipliers
Using Dual, Normal or Standard Basis
I. S. Hsu, T. K. Truong, H. M. Shao, and L. J. Deutsch
Communications Systems Research Section
I. S. Reed
Department of Electrical Engineering
University of Southern California
Three different finite field multipliers are presented: (1) a dual basis multiplier due to
Berlekamp, (2) the Massey-Omura normal basis multiplier, and (3) the Scott-Tavares-
Peppard standard basis multiplier. These algorithms are chosen because each has its own
distinct features which apply most suitably in different areas. Finally, they are implemented
on silicon chips with NMOS technology so that the multiplier most desirable for
VLSI implementations can readily be ascertained.
I. Introduction
The era of the VLSI digital signal processor has arrived and
its impact is evident in many areas of technology. The trend is
to put more and more elements on a single silicon chip in order
to enhance the performance and reliability of the system.
Recently, finite field arithmetic has found widespread
applications. Examples include cryptography, coding theory,
and computer arithmetic. Among the finite field arithmetic
operations, multiplication is the most complex and time consuming.
Already, it is used in a variety of systems, e.g., the
VLSI design of Reed-Solomon coder (Refs. 1 and 2). Hence,
a small, high performance finite field multiplier is urgently
needed. Such a multiplier can also be used as a building block
for the design of many large systems which use finite field
arithmetic.
In this article, three different finite field multipliers are
compared for suitability of VLSI implementation. These
include: (1) the dual basis multiplier due to Berlekamp (Ref. 3),
(2) the Massey-Omura normal basis multiplier (Ref. 4), and
(3) the Scott-Tavares-Peppard standard basis multiplier (Ref. 5).
They are chosen for comparison because each has its own
distinct features which make them suitable for specific appli-
cations.
Different basis representations of field elements are used in
these three multipliers. The dual basis multiplier uses the dual
basis representation for the multiplicand and standard basis for
the multiplier. The product is again in dual basis representa-
tion. The Massey-Omura multiplier uses normal basis represen-
tations for both the multiplicand and multiplier. The Scott-
Tavares-Peppard multiplier uses the standard basis representa-
tions for all field elements. The complexity of basis conversion
https://ntrs.nasa.gov/search.jsp?R=19870019352 2020-03-20T10:15:37+00:00Z
is heavily dependent on the choice of the primitive irreducible
polynomial which generates the field. If the polynomial is
chosen appropriately, the basis conversion is a simple operation.
The algorithms for performing the basis conversions are presented
in Appendixes A and B.
It is concluded in this article that the dual basis multiplier
needs the least number of gates, which in turn leads to the
smallest area required for VLSI implementation. The Massey-
Omura multiplier is very effective in performing operations
such as finding inverse elements or in performing squaring or
exponentiation of a finite field element. The standard basis
multiplier does not require basis conversion; hence it is readily
matched to any input or output system. Also, due to its
regularity and simplicity, the design and expansion to high
order finite fields are easier to realize than for the dual or normal
basis multipliers.
Examples of these three multipliers are given in this article
for the purpose of illustration. Their 8-bit versions are imple-
mented on a silicon chip. The chip layouts are also presented
separately so that their differences can be distinguished with-
out difficulty.
II. The Dual Basis Multiplication Algorithm
Recently, Berlekamp developed a bit-serial multiplier for
use in the design of a Reed-Solomon encoder (Ref. 3). Hsu
et al. (Ref. 1) used Berlekamp's multiplier to design an 8-bit
single chip VLSI (255,223) Reed-Solomon encoder which has
proved to perform well. However, in that design, the multipli-
cand is a fixed finite field constant which is inconvenient if
one desires to change the multiplicand.
In the following, Berlekamp's bit-serial multiplication algorithm
is modified and generalized to allow both the multiplicand
and multiplier to be variable. This revision of Berlekamp's
algorithm is called the dual basis multiplication algorithm in
the rest of this article.
In order to understand the dual basis multiplication algo-
rithm, some mathematical preliminaries are needed. Toward
this end, the mathematical concepts of the "trace" and a
"dual" basis are introduced. For more details and proofs see
Refs. 2, 6, and 7.
Definition 1. The trace of an element $\beta$ belonging to $GF(p^m)$,
the Galois field of $p^m$ elements, is defined as follows:

$$\mathrm{Tr}(\beta) = \sum_{k=0}^{m-1} \beta^{p^k}$$

In particular, for $p = 2$,

$$\mathrm{Tr}(\beta) = \sum_{k=0}^{m-1} \beta^{2^k}$$
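A fast algorithm for computing trace values of elements in
$GF(2^m)$ is presented in Appendix C. The trace has the following
properties, which will not be proved here:
(1) $[\mathrm{Tr}(\beta)]^p = \beta^p + \beta^{p^2} + \cdots + \beta^{p^{m-1}} + \beta^{p^m} = \mathrm{Tr}(\beta)$,
where $\beta \in GF(p^m)$ and $\beta^{p^m} = \beta$. This implies that $\mathrm{Tr}(\beta) \in GF(p)$,
i.e., the trace is in the ground field $GF(p)$.
(2) $\mathrm{Tr}(\beta + \gamma) = \mathrm{Tr}(\beta) + \mathrm{Tr}(\gamma)$, where $\beta, \gamma \in GF(p^m)$.
(3) $\mathrm{Tr}(c\beta) = c\,\mathrm{Tr}(\beta)$, where $c \in GF(p)$.
(4) $\mathrm{Tr}(1) = m \pmod{p}$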
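The trace properties listed above can be checked numerically in a small field. The following is a minimal Python sketch, assuming $GF(2^4)$ generated by $x^4 + x + 1$ and an integer encoding in which bit $i$ holds the coefficient of $\alpha^i$. The helper names `gf_mul` and `trace` are illustrative and do not come from the article.

```python
# Illustrative sketch: elements of GF(2^4) are 4-bit integers,
# bit i = coefficient of alpha^i (an assumed encoding).
M = 4
POLY = 0b10011  # x^4 + x + 1

def gf_mul(a, b, poly=POLY, m=M):
    """Multiply two field elements by shift-and-add with reduction."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:          # degree reached m, so reduce by the field polynomial
            a ^= poly
    return result

def trace(beta, poly=POLY, m=M):
    """Tr(beta) = beta + beta^2 + beta^4 + ... + beta^(2^(m-1))."""
    total, term = 0, beta
    for _ in range(m):
        total ^= term                       # addition in GF(2^m) is XOR
        term = gf_mul(term, term, poly, m)  # next conjugate: square the previous one
    return total

elements = range(1 << M)
traces = [trace(b) for b in elements]

# Property (1): every trace lies in the ground field GF(2).
in_ground_field = all(t in (0, 1) for t in traces)
# Property (2): the trace is additive.
additive = all(trace(a ^ b) == trace(a) ^ trace(b)
               for a in elements for b in elements)
# Property (4): Tr(1) = m mod 2.
trace_of_one = trace(1)
```

In this encoding property (3) is trivial for $p = 2$, since the only scalars are 0 and 1.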
Definition 2. A basis $\{u_k\}$ in $GF(p^m)$ is a set of $m$ linearly
independent elements in $GF(p^m)$.
Definition 3. Two bases $\{u_j\}$ and $\{\lambda_k\}$ are said to be the
dual of one another if

$$\mathrm{Tr}(u_j \lambda_k) = \begin{cases} 1, & \text{if } j = k \\ 0, & \text{if } j \neq k \end{cases}$$

For convenience, the basis $\{u_j\}$ is sometimes called the original
basis, and the basis $\{\lambda_k\}$ is called its dual basis, even though
the concept of duality is symmetric.
Theorem. Every basis has a unique dual basis.
Proof. See Ref. 8.
Corollary 1. Let $\{u_j\}$ be a basis of $GF(p^m)$ and let $\{\lambda_k\}$ be
its dual basis. Then a field element $Z$ can be expressed in the
dual basis $\{\lambda_k\}$ by the expansion

$$Z = \sum_{k=0}^{m-1} z_k \lambda_k$$

where

$$z_k = \mathrm{Tr}(Z u_k)$$

Proof. Let

$$Z = z_0 \lambda_0 + z_1 \lambda_1 + \cdots + z_{m-1} \lambda_{m-1}$$

Multiply both sides by $u_k$ and take the trace. Then by
Definition 3 and the properties of the trace:

$$\mathrm{Tr}(Z u_k) = \mathrm{Tr}\left(\sum_{i=0}^{m-1} z_i (\lambda_i u_k)\right) = z_k$$
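Definition 3 and Corollary 1 can be illustrated with a small computation. The sketch below is an assumption-laden example rather than part of the original article: it works in $GF(2^4)$ with $x^4 + x + 1$, encodes elements as integers with bit $i$ equal to the coefficient of $\alpha^i$, and finds the dual basis by exhaustive search. The function names are hypothetical.

```python
M = 4
POLY = 0b10011  # x^4 + x + 1

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^4)."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return result

def trace(beta):
    """Tr(beta) = sum of beta^(2^k) for k = 0..M-1."""
    total, term = 0, beta
    for _ in range(M):
        total ^= term
        term = gf_mul(term, term)
    return total

def dual_basis(basis):
    """Search for lambda_k with Tr(u_j * lambda_k) = 1 if j == k else 0."""
    dual = []
    for k in range(len(basis)):
        for cand in range(1, 1 << M):
            if all(trace(gf_mul(u, cand)) == int(j == k)
                   for j, u in enumerate(basis)):
                dual.append(cand)
                break
    return dual

def to_dual_coords(z, basis):
    """Corollary 1: z_k = Tr(Z * u_k)."""
    return [trace(gf_mul(z, u)) for u in basis]

def from_dual_coords(coords, dual):
    """Rebuild Z = sum of z_k * lambda_k."""
    value = 0
    for c, lam in zip(coords, dual):
        if c:
            value ^= lam
    return value

standard = [1, 2, 4, 8]          # {1, alpha, alpha^2, alpha^3}
dual = dual_basis(standard)      # elements lambda_0 .. lambda_3 as integers
```

For this field the search returns the integers 9, 4, 2, 1, that is $\{1 + \alpha^3, \alpha^2, \alpha, 1\}$. Expanding any element with `to_dual_coords` and rebuilding it with `from_dual_coords` returns the original element.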
The following corollary is an immediate consequence of
Corollary 1.
Corollary 2. Let $\{u_j\}$ be a basis of $GF(p^m)$ and let $\{\lambda_k\}$ be
its dual basis. The product $W = ZG$ of two fixed elements in
$GF(p^m)$ can be expressed in the dual basis by the expansion

$$W = \sum_{k=0}^{m-1} \mathrm{Tr}(W u_k)\lambda_k = \sum_{k=0}^{m-1} \mathrm{Tr}(ZG u_k)\lambda_k$$

where $\mathrm{Tr}(W u_k)$ is the kth coefficient of the dual basis for the
product of two field elements.
These two corollaries provide a theoretical basis for the
dual basis finite field multiplier. In the following section, a
detailed example is developed to illustrate the dual basis bit-serial
multiplication algorithm.
III. An Example of the Dual Basis
Multiplication Algorithm
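The example is given in $GF(2^4)$ for purposes of illustration;
the extension to more general cases is straightforward.
Let $\alpha$ be a root of the primitive irreducible polynomial
$f(x) = x^4 + x + 1$ over $GF(2)$. Then $\alpha$ satisfies the equation
$\alpha^4 + \alpha + 1 = 0$. It is also true that $\alpha^{15} = 1$. Let the standard
basis be $\{1, \alpha, \alpha^2, \alpha^3\}$ and its dual basis be $\{\lambda_0, \lambda_1, \lambda_2, \lambda_3\}$.
Then, by Definition 3,

$$\mathrm{Tr}(1 \cdot \lambda_0) = 1, \quad \mathrm{Tr}(\alpha \cdot \lambda_1) = 1, \quad \mathrm{Tr}(\alpha^2 \cdot \lambda_2) = 1, \quad \mathrm{Tr}(\alpha^3 \cdot \lambda_3) = 1$$

Let

$$Z = \sum_{k=0}^{3} z_k \lambda_k$$

be represented in dual basis. Also let

$$G = \sum_{k=0}^{3} g_k \alpha^k$$

be represented in standard basis. Then, by Corollary 1,

$$z_k = \mathrm{Tr}(Z\alpha^k)$$

Furthermore, let $W = ZG$ be the product of two elements $Z$
and $G$. If $W$ is represented in dual basis, then

$$W = \sum_{k=0}^{3} w_k \lambda_k$$

where by Corollary 1,

$$w_k = \mathrm{Tr}(W\alpha^k) = \mathrm{Tr}(ZG \cdot \alpha^k)$$

If one defines

$$T^{(k)}(W) = \mathrm{Tr}(ZG \cdot \alpha^k)$$

then

$$T^{(0)}(W) = \mathrm{Tr}(ZG\alpha^0) = \mathrm{Tr}(ZG) = \mathrm{Tr}\big(Z(g_0\alpha^0 + g_1\alpha + g_2\alpha^2 + g_3\alpha^3)\big) = g_0 z_0 + g_1 z_1 + g_2 z_2 + g_3 z_3 \qquad (1)$$

From the definition of $T^{(k)}(W)$, one obtains

$$T^{(k)}(W) = \mathrm{Tr}(\alpha ZG \cdot \alpha^{k-1})$$

Therefore, if $ZG$ is replaced by $\alpha \cdot ZG$ in $T^{(k-1)}$, i.e., $Z$ by
$\alpha \cdot Z$, $T^{(k)}(W)$ can be obtained from $T^{(k-1)}(W)$. Let

$$Y = \alpha Z = y_0\lambda_0 + y_1\lambda_1 + y_2\lambda_2 + y_3\lambda_3$$

where

$$y_m = \mathrm{Tr}(Y\alpha^m) = \mathrm{Tr}(Z\alpha^{m+1})$$

for each $m$. Then $T^{(k)}$ is obtained from $T^{(k-1)}$ by replacing
$z_0$ by $y_0 = \mathrm{Tr}(Z\alpha) = z_1$,
$z_1$ by $y_1 = \mathrm{Tr}(Z\alpha^2) = z_2$,
$z_2$ by $y_2 = \mathrm{Tr}(Z\alpha^3) = z_3$,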
and
$z_3$ by $y_3 = \mathrm{Tr}(Z\alpha^4) = z_0 + z_1$.
To reiterate,

$$W = ZG = \sum_{k=0}^{3} T^{(k)}(W)\lambda_k$$

can be computed as follows:
(1) Initially, for $k = 0$, compute $T^{(0)}(W)$ by Eq. (1).
(2) For $k = 1, 2, 3$, compute $T^{(k)}(W)$ by

$$T^{(k)}(W) = T^{(k-1)}(YG)$$

and

$$Y = \alpha Z = y_0\lambda_0 + y_1\lambda_1 + y_2\lambda_2 + y_3\lambda_3$$

with

$$y_0 = z_1, \quad y_1 = z_2, \quad y_2 = z_3, \quad \text{and} \quad y_3 = z_0 + z_1 = z_f$$

where $z_f = z_0 + z_1$ is the feedback term of the
algorithm.
The above example illustrates the dual basis multiplication
algorithm. The extension to an 8-bit multiplier is straightforward and
will not be included here. The primitive irreducible polynomial
in the 8-bit design is chosen to be

$$f(x) = x^8 + x^4 + x^3 + x^2 + 1$$

In this case, the feedback term is

$$z_f = z_4 + z_3 + z_2 + z_0$$

Figure 1 shows the logic diagram of an 8-bit dual basis
multiplier. The architecture is composed of four blocks. In
Fig. 1, the Z-serial-to-parallel unit performs the serial-to-parallel
operation of the input element with dual basis representation.
Once all 8 bits of this element are stored in the bottom register,
the cyclic operation starts and one bit is fed back from the
feedback logic circuitry. The G-serial-to-parallel unit also performs
the serial-to-parallel operation of the input standard
basis element. Once all 8 bits of this element are stored in the
bottom register, they are latched bitwise so that no further
operations are performed on this element, as required by the
algorithm.
Next, the output bits of these two units are fed into the
AND-generation unit. The output consists of the bitwise
AND-ed terms. These AND-ed terms again are fed into the
XOR-array unit which performs the addition of AND-ed
terms. This is needed since the addition of two elements in
$GF(2)$ is just an exclusive-OR (XOR) operation. The terms
included in this XOR-array are as shown in the following:

$$z_0 g_0 + z_1 g_1 + z_2 g_2 + z_3 g_3 + z_4 g_4 + z_5 g_5 + z_6 g_6 + z_7 g_7$$

The product is then obtained from the output of this XOR-array
bit by bit. Figure 2 shows the layout of this dual basis
multiplier.
IV. VLSI Architecture for the Massey-Omura
Multiplier
Recently, Massey and Omura developed a multiplier which
obtains the product of two elements in the finite field $GF(2^m)$.
In this invention, they utilize a normal basis of the form
$\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ to represent each element in the field, where
$\alpha$ is the root of an irreducible polynomial of degree $m$ over
$GF(2)$. In this basis, each element in the field $GF(2^m)$ can be
represented by $m$ binary digits.
Using the normal basis representation, the squaring of an
element in $GF(2^m)$ is readily shown to be a simple cyclic
shift of its binary digits (Ref. 4). Multiplication of two elements
with a normal basis representation requires the same
logic circuitry for every product digit. Adjacent product digit
circuits differ only in their inputs, which are cyclically shifted
versions of one another (Ref. 4).
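Both statements above (squaring as a cyclic shift, and one shared product function fed with shifted inputs) can be tried out in a small field. The sketch below is only illustrative: it assumes $GF(2^4)$ generated by $x^4 + x^3 + 1$, a polynomial chosen here because its root yields a normal basis, not taken from the article. The product function `f` is derived from the field by brute force instead of the computer search of Ref. 4, and all names are hypothetical.

```python
M = 4
POLY = 0b11001  # x^4 + x^3 + 1 (assumed for illustration)

def gf_mul(a, b):
    """Reference multiplication; bit i of an int is the coefficient of alpha^i."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return result

# Normal basis {alpha, alpha^2, alpha^4, alpha^8}, built by repeated squaring.
basis = [2]
for _ in range(M - 1):
    basis.append(gf_mul(basis[-1], basis[-1]))

def from_normal(coords):
    """Field element with the given normal-basis coordinates."""
    value = 0
    for c, b in zip(coords, basis):
        if c:
            value ^= b
    return value

# Lookup table: field element -> tuple of normal-basis coordinates.
TO_NORMAL = {}
for n in range(1 << M):
    coords = tuple((n >> i) & 1 for i in range(M))
    TO_NORMAL[from_normal(coords)] = coords

# Terms a_i * b_j that contribute to the last product digit c_{M-1}.
TERMS = [(i, j) for i in range(M) for j in range(M)
         if TO_NORMAL[gf_mul(basis[i], basis[j])][M - 1]]

def f(a, b):
    """The single product function shared by every output digit."""
    bit = 0
    for i, j in TERMS:
        bit ^= a[i] & b[j]
    return bit

def rotate(x):
    """Cyclic shift; these are the coordinates of the square of x."""
    return x[-1:] + x[:-1]

def massey_omura_multiply(a, b):
    """Each digit is f applied to cyclically shifted copies of the inputs."""
    a, b = list(a), list(b)
    c = [0] * M
    for k in range(M):
        c[M - 1 - k] = f(a, b)
        a, b = rotate(a), rotate(b)
    return c
```

Because `TERMS` is symmetric in $(i, j)$, the term count mirrors how the number of product terms in $f$ drives the AND-plane size discussed later in this section.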
The conventional method for finding an inverse element in
a finite field uses either table look-up or Euclid's algorithm.
These methods are not easily realized in a VLSI circuit. However,
by using a Massey-Omura multiplier, a recursive, pipeline
inversion circuit can be developed (Ref. 4). The details of the
Massey-Omura multiplier algorithm are not discussed further
here. For a more detailed discussion, see Ref. 4.
The function $f$ as described in Ref. 4 is chosen to be the
following:

$$\begin{aligned} f(a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7;\ b_0, b_1, b_2, b_3, b_4, b_5, b_6, b_7) ={}& a_5b_0 + a_6b_0 + a_3b_1 + a_5b_1 + a_4b_2 + a_5b_2 + a_6b_2 + a_7b_2 \\ &+ a_1b_3 + a_4b_3 + a_2b_4 + a_3b_4 + a_0b_5 + a_1b_5 + a_2b_5 \\ &+ a_6b_5 + a_0b_6 + a_2b_6 + a_5b_6 + a_6b_6 + a_2b_7 \end{aligned}$$

where the primitive irreducible polynomial of this finite field
is

$$g(x) = x^8 + x^7 + x^6 + x + 1$$
There are a variety of different possible expressions for the
function $f$; however, the above one was chosen for the purpose
of illustration.¹ Since each term in the above expression represents
a conducting line in the AND portion of a PLA (programmable
logic array), the fewer terms there
are, the smaller the area needed for VLSI implementation.
Figure 3 shows the block diagram of an 8-bit finite field
multiplier using the Massey-Omura normal basis algorithm.
The overall architecture of this chip is the same as that of the dual
basis multiplier. The differences between these two multipliers
are the following:
(1) The number of terms in the expression of the Massey-
Omura multiplier is twenty-one, while in the dual basis
multiplier it is only eleven. This means a substantial
amount of area is saved in the dual basis multiplier over
the normal basis multiplier.
(2) Both the input serial-to-parallel units are identical in
the Massey-Omura multiplier and no feedback is needed.
On the other hand, in the dual basis multiplier, the
register storing the element with standard basis repre-
sentation does not need to be cyclically shifted. This
field element remains latched in the same position.
Figure 4 shows the layout for the 8-bit Massey-Omura
multiplier.
V. VLSI Architecture for the Standard Basis
Multiplier
The Scott-Tavares-Peppard multiplication algorithm is
serial-in, serial-out, and pipelined in architecture. This algorithm
performs multiplication in $GF(2^m)$ with order $O(m)$ in both
computation time and implementation area, but requires
$m + 1$ time units between the first-in and first-out of computation.
Due to the regularity of this architecture, the expansion
to higher order finite fields needs only replicas of a basic
cell. Furthermore, the irreducible primitive polynomial which
generates the finite field can be changed. This feature makes it
more convenient in use. This algorithm performs the finite
field multiplication with elements represented in standard
basis. As a consequence no basis conversion is needed. This
multiplier can be used for applications such as cryptography
where $m$ is large. The algorithm is advantageous because of
its efficient implementation time and high throughput. The
detailed algorithm will not be discussed further here. For more
details, see Ref. 5.
¹Wang, C. C., "Computer Simulation of Finite Field Multiplications
Based on Massey-Omura's Normal Basis Representation of Field Elements,"
private communication, 1985.
Figure 5 shows the logic diagram of an 8-bit standard basis
multiplier by Scott, Tavares, and Peppard. Inputs to this chip
are A and B, the two elements to be multiplied, and the
irreducible primitive polynomial F. These are fed into the
chip serially. The output is the product element P, which is
shifted out bit-by-bit.
In Fig. 5, A and F are shifted into their respective registers
serially bit-by-bit. Here A is the multiplicand and F is the
primitive irreducible polynomial that generates the finite
field. The multiplier is denoted by B and the product by P.
The register $P_i$ contains the intermediate product. Two control
signals are required. One is derived from the most significant
bit (MSB) of P, and the other from the state of $b_i$, which is
latched with a flip-flop. The left shift is performed by loading
the output of cell CELL-$i$ into the product register of cell
CELL-$(i+1)$. Once the multiplication is completed, the most
significant bits of the product register are transferred to the
output shift register and shifted out serially.
The circuit diagram of the $i$th cell CELL-$i$ is shown in
Fig. 6. Since the ground field is $GF(2)$, additions are performed
by exclusive-OR (XOR) gates. Pass transistors are used
to control the data flow. If a "0" is to be added, the input
line to the XOR gate is grounded; otherwise A and/or F are
passed. The output of the XOR gate is directed to the product
register of the next stage so adding and shifting is done in one
clock cycle. Figure 7 shows the layout of this 8-bit standard
basis finite field multiplier.
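The shift-and-add behavior described above, with one control signal from the MSB of P and one from the current multiplier bit, can be mirrored in a few lines of software. This is a hedged Python sketch of a generic MSB-first standard basis multiplication with a changeable polynomial F. It is not the cell-level pipeline of Ref. 5, and the function name and argument conventions are assumptions.

```python
def standard_basis_multiply(a, b, f_poly, m=8):
    """Multiply a and b in GF(2^m), all operands in standard basis.

    f_poly is the field polynomial F including its x^m term,
    e.g. 0x11D for x^8 + x^4 + x^3 + x^2 + 1.
    """
    mask = (1 << m) - 1
    f_low = f_poly & mask        # F without its leading x^m term
    p = 0                        # intermediate product register
    for i in reversed(range(m)): # multiplier bits enter MSB first
        msb = (p >> (m - 1)) & 1 # control signal 1: MSB of P
        b_i = (b >> i) & 1       # control signal 2: current multiplier bit
        p = (p << 1) & mask      # left shift of the product register
        if msb:
            p ^= f_low           # add F: reduce modulo the field polynomial
        if b_i:
            p ^= a               # add A: accumulate the multiplicand
    return p

# Same routine, two different field polynomials.
product_11d = standard_basis_multiply(0x02, 0x80, 0x11D)
product_11b = standard_basis_multiply(0x57, 0x83, 0x11B)
```

Passing a different `f_poly` changes the field without touching the loop, which reflects the changeable-polynomial feature noted in this section. The second call uses $x^8 + x^4 + x^3 + x + 1$, a polynomial that does not appear in the article and is included only as a second test field.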
VI. Concluding Remarks
Three finite field multipliers are compared here: the
dual basis multiplier, the normal basis multiplier, and the standard
basis multiplier. The dual basis multiplier occupies the smallest
amount of chip area in VLSI implementation if the basis conversion
is not included. Furthermore, since the dual basis multiplier
performs multiplication by taking the inner product of
two elements and then feeding back the sum of certain bits of
one element, it is expected that as the order of the field goes
higher, the dual basis multiplier will outperform the others.
The normal basis multiplier is very effective in performing
operations such as finding the inverse element or in performing
squaring or exponentiation of a finite field element. But its
area grows dramatically as the order of the field goes up. Also, the
function $f$ described in Ref. 4 must be searched for again by
computer whenever the field is changed, and this search is usually
very time consuming. The standard basis multiplier does not require
basis conversion; hence it is readily matched to any input or output
system. Also, due to its regularity and simplicity, the design
and expansion to high order finite fields are easier to realize
than for the dual or normal basis multipliers. The irreducible
primitive polynomial of the field is changeable in the standard
basis multiplier. This distinct feature makes it more convenient
in use.
References
1. Hsu, I. S., Reed, I. S., Truong, T. K., Wang, K., Yeh, C. S., and Deutsch, L. J., "The
VLSI Implementation of a Reed-Solomon Encoder Using Berlekamp's Bit-Serial
Multiplier Algorithm," IEEE Trans. on Computers, Vol. C-33, No. 10, pp. 906-911,
Oct. 1984.
2. Shao, H. M., Truong, T. K., Deutsch, L. J., Yuen, J. H., and Reed, I. S., "A VLSI
Design of a Pipeline Reed-Solomon Decoder," IEEE Trans. on Computers, Vol. C-34,
No. 5, pp. 393-403, May 1985.
3. Berlekamp, E. R., "Bit-Serial Reed-Solomon Encoders," IEEE Trans. Inform. Theory,
Vol. IT-28, No. 6, pp. 869-874, Nov. 1982.
4. Wang, C. C., Truong, T. K., Shao, H. M., Deutsch, L. J., Omura, J. K., and Reed, I. S.,
"VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)," IEEE
Trans. on Computers, Vol. C-34, No. 8, pp. 709-717, Aug. 1985.
5. Scott, P. A., Tavares, S. E., and Peppard, L. E., "A Fast Multiplier for GF(2^m),"
submitted to IEEE Trans. on Computers, 1985.
6. MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error-Correcting Codes,
North-Holland Publishing Company, New York, N.Y., 1978.
7. Perlman, M., and Lee, J. J., "A Comparison of Conventional Reed-Solomon Encoders
and Berlekamp's Architecture," NASA Tech. Brief, No. 3610-81-119, Jet Propulsion
Laboratory, Pasadena, Calif., July 10, 1981.
8. Hsu, I. S., "New VLSI Architectures for Coding and Digital Signal Processing," Ph.D.
Dissertation, Electrical Engineering Dept., University of Southern California, Los
Angeles, Calif., 1985.
Fig. 1. Logic diagram of an 8-bit dual basis finite field multiplier (blocks: Z-serial-to-parallel unit with upper and bottom registers and dual basis input, AND-generation, XOR-array, G-serial-to-parallel unit with upper and bottom registers and standard basis input)
Fig. 2. Layout of an 8-bit dual basis finite field multiplier
Fig. 3. Block diagram of an 8-bit finite field multiplier using Massey-Omura's normal basis algorithm (blocks: DATA-IN1 and DATA-IN2 serial-to-parallel units, AND-generation, exclusive-OR, DATA-OUT)
Fig. 4. Layout of an 8-bit Massey-Omura finite field multiplier
Fig. 5. Logic diagram of an 8-bit standard basis finite field multiplier (blocks: CELL 7 through CELL 0, parallel-to-serial output)
Fig. 6. Circuit diagram of the ith cell as shown in Fig. 5
Fig. 7. Layout of an 8-bit standard basis finite field multiplier
Appendix A
A Method for Converting an Element in Standard Basis to Dual Basis
In this appendix, a method for converting an element represented
in standard basis to its counterpart in dual basis is
described by example. First, let the irreducible primitive polynomial
in $GF(2^8)$ be

$$f(x) = x^8 + x^4 + x^3 + x^2 + 1$$

Then, from the definition of trace, one obtains

$$\mathrm{Tr}(1) = 0,\ \mathrm{Tr}(\alpha) = 0,\ \mathrm{Tr}(\alpha^2) = 0,\ \mathrm{Tr}(\alpha^3) = 0,\ \mathrm{Tr}(\alpha^4) = 0,\ \mathrm{Tr}(\alpha^5) = 1,\ \mathrm{Tr}(\alpha^6) = 0,\ \mathrm{Tr}(\alpha^7) = 0$$

where $\alpha$ satisfies the equation $x^8 + x^4 + x^3 + x^2 + 1 = 0$.
An element $Z$ in standard basis is written as

$$Z = \sum_{k=0}^{7} z_k \alpha^k$$

In dual basis, it is represented as

$$Z = \sum_{k=0}^{7} z'_k \lambda_k$$

where

$$\begin{aligned} z'_k &= \mathrm{Tr}(Z\alpha^k) \\ &= \mathrm{Tr}\big((z_0\alpha^0 + z_1\alpha^1 + z_2\alpha^2 + z_3\alpha^3 + z_4\alpha^4 + z_5\alpha^5 + z_6\alpha^6 + z_7\alpha^7)\alpha^k\big) \\ &= z_0\mathrm{Tr}(\alpha^k) + z_1\mathrm{Tr}(\alpha^{k+1}) + z_2\mathrm{Tr}(\alpha^{k+2}) + z_3\mathrm{Tr}(\alpha^{k+3}) + z_4\mathrm{Tr}(\alpha^{k+4}) + z_5\mathrm{Tr}(\alpha^{k+5}) + z_6\mathrm{Tr}(\alpha^{k+6}) + z_7\mathrm{Tr}(\alpha^{k+7}) \end{aligned}$$

Therefore, once $\mathrm{Tr}(\alpha^k)$, for $0 \le k \le 14$, are known, the basis
conversion from standard to dual can be completed.
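The conversion just described amounts to a table of fifteen trace values and a double loop. The Python sketch below illustrates it for the same polynomial, under the assumption that elements are stored as integers with bit $j$ equal to $z_j$ on input and bit $k$ equal to $z'_k$ on output. The helper names are illustrative only.

```python
M = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return result

def trace(beta):
    """Tr(beta) = beta + beta^2 + ... + beta^(2^7)."""
    total, term = 0, beta
    for _ in range(M):
        total ^= term
        term = gf_mul(term, term)
    return total

# Tr(alpha^k) for 0 <= k <= 14, as required by the appendix.
TRACE_TABLE = []
alpha_power = 1
for _ in range(2 * M - 1):
    TRACE_TABLE.append(trace(alpha_power))
    alpha_power = gf_mul(alpha_power, 2)   # multiply by alpha

def standard_to_dual(z):
    """z'_k = sum over j of z_j * Tr(alpha^(k+j)), evaluated over GF(2)."""
    out = 0
    for k in range(M):
        bit = 0
        for j in range(M):
            bit ^= ((z >> j) & 1) & TRACE_TABLE[k + j]
        out |= bit << k
    return out
```

The first eight entries of `TRACE_TABLE` reproduce the trace values listed in this appendix, with $\mathrm{Tr}(\alpha^5) = 1$ the only nonzero one.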
Appendix B
A Method for Converting an Element in Dual Basis to Standard Basis
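This appendix describes a method for converting an element
represented in dual basis to standard basis. Again, let an element
$Z$ in dual basis be written as

$$Z = \sum_{k=0}^{7} z'_k \lambda_k$$

In standard basis let it be represented as

$$Z = \sum_{k=0}^{7} z_k \alpha^k$$

From the definition of the trace, one obtains

$$\begin{aligned} z_k &= \mathrm{Tr}(Z\lambda_k) \\ &= \mathrm{Tr}\big((z'_0\lambda_0 + z'_1\lambda_1 + z'_2\lambda_2 + z'_3\lambda_3 + z'_4\lambda_4 + z'_5\lambda_5 + z'_6\lambda_6 + z'_7\lambda_7)\lambda_k\big) \\ &= z'_0\mathrm{Tr}(\lambda_0\lambda_k) + z'_1\mathrm{Tr}(\lambda_1\lambda_k) + z'_2\mathrm{Tr}(\lambda_2\lambda_k) + z'_3\mathrm{Tr}(\lambda_3\lambda_k) + z'_4\mathrm{Tr}(\lambda_4\lambda_k) + z'_5\mathrm{Tr}(\lambda_5\lambda_k) + z'_6\mathrm{Tr}(\lambda_6\lambda_k) + z'_7\mathrm{Tr}(\lambda_7\lambda_k) \end{aligned}$$

Hence, if the dual basis is determined and the trace values
of the above calculated, the basis conversion from dual basis to
standard basis can be completed.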
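As a rough illustration of the procedure above, the following Python sketch first determines the dual basis by exhaustive search and then tabulates $\mathrm{Tr}(\lambda_j\lambda_k)$. It assumes the polynomial $x^8 + x^4 + x^3 + x^2 + 1$ from Appendix A and an integer encoding with bit $k$ equal to $z_k$. The search strategy and all names are assumptions made for this example.

```python
M = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return result

def trace(beta):
    """Tr(beta) = beta + beta^2 + ... + beta^(2^7)."""
    total, term = 0, beta
    for _ in range(M):
        total ^= term
        term = gf_mul(term, term)
    return total

STANDARD = [1 << j for j in range(M)]   # 1, alpha, ..., alpha^7

def find_dual_basis():
    """Search for lambda_k with Tr(alpha^j * lambda_k) = 1 if j == k else 0."""
    dual = []
    for k in range(M):
        for cand in range(1, 1 << M):
            if all(trace(gf_mul(STANDARD[j], cand)) == int(j == k)
                   for j in range(M)):
                dual.append(cand)
                break
    return dual

DUAL = find_dual_basis()
# GRAM[k][j] = Tr(lambda_j * lambda_k), the trace values used by the appendix.
GRAM = [[trace(gf_mul(DUAL[j], DUAL[k])) for j in range(M)] for k in range(M)]

def dual_to_standard(z_dual):
    """z_k = sum over j of z'_j * Tr(lambda_j * lambda_k); returns an integer."""
    out = 0
    for k in range(M):
        bit = 0
        for j in range(M):
            bit ^= z_dual[j] & GRAM[k][j]
        out |= bit << k
    return out

def standard_to_dual(z):
    """Dual coordinates z'_k = Tr(Z * alpha^k), used here for a round trip."""
    return [trace(gf_mul(z, u)) for u in STANDARD]
```

Converting any element with `standard_to_dual` and back with `dual_to_standard` should return the starting value.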
Appendix C
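Fast Algorithm for Calculating Trace Values of Elements in GF(2^m)
In this appendix, a fast algorithm for calculating the trace
values of elements in the finite field $GF(2^m)$ is described. From
the definition of the trace one has, for $\beta, \beta^2 \in GF(2^m)$:

$$\mathrm{Tr}(\beta^2) = \sum_{k=0}^{m-1} (\beta^2)^{2^k} = \beta^2 + \beta^{2^2} + \cdots + \beta^{2^m} = \beta + \beta^2 + \cdots + \beta^{2^{m-1}} = \mathrm{Tr}(\beta)$$

Hence, if $\mathrm{Tr}(\beta)$ is obtained, then $\mathrm{Tr}(\beta^2)$ can also be obtained
without calculation.
Since every element in $GF(2^m)$ can be represented by the
elements which compose the basis, e.g., for $GF(2^8)$ with the
basis $\{\alpha^0, \alpha^1, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6, \alpha^7\}$, $\beta$ can be written
as

$$\beta = \beta_0\alpha^0 + \beta_1\alpha^1 + \beta_2\alpha^2 + \beta_3\alpha^3 + \beta_4\alpha^4 + \beta_5\alpha^5 + \beta_6\alpha^6 + \beta_7\alpha^7$$

From the properties of the trace, one has

$$\mathrm{Tr}(\beta) = \beta_0\mathrm{Tr}(\alpha^0) + \beta_1\mathrm{Tr}(\alpha^1) + \beta_2\mathrm{Tr}(\alpha^2) + \beta_3\mathrm{Tr}(\alpha^3) + \beta_4\mathrm{Tr}(\alpha^4) + \beta_5\mathrm{Tr}(\alpha^5) + \beta_6\mathrm{Tr}(\alpha^6) + \beta_7\mathrm{Tr}(\alpha^7)$$

Hence it is only necessary to calculate the trace values of $\alpha^0, \alpha^1,
\alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6, \alpha^7$; the trace of any other element can be obtained easily once it is
represented by the basis elements.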
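In software, the observation above reduces the trace to a masked parity. The Python sketch below illustrates this for $GF(2^8)$ generated by $x^8 + x^4 + x^3 + x^2 + 1$ (the polynomial of Appendix A), with elements stored as integers whose bit $k$ is $\beta_k$. The names `TRACE_MASK` and `fast_trace` are hypothetical.

```python
M = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return result

def trace(beta):
    """Direct evaluation: Tr(beta) = beta + beta^2 + ... + beta^(2^7)."""
    total, term = 0, beta
    for _ in range(M):
        total ^= term
        term = gf_mul(term, term)
    return total

# Bit k of the mask is Tr(alpha^k); only these eight traces are computed directly.
TRACE_MASK = 0
for k in range(M):
    TRACE_MASK |= trace(1 << k) << k

def fast_trace(beta):
    """Tr(beta) = XOR of beta_k * Tr(alpha^k): parity of the masked bits."""
    return bin(beta & TRACE_MASK).count("1") & 1
```

For this polynomial the mask has a single bit set, so the trace of an element is simply its coefficient $\beta_5$.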
