A novel approach for bit-serial AB2 multiplication in finite fields GF(2m)  by Jeon, Jun-Cheol et al.
Computers and Mathematics with Applications 51 (2006) 1103-1112 
www.elsevier.com/locate/camwa 
A Novel Approach for Bit-Serial $AB^2$ Multiplication in Finite Fields $GF(2^m)$
JUN-CHEOL JEON, KEE-WON KIM AND KEE-YOUNG YOO*
Dept. of Computer Engineering at Kyungpook National University
Taegu, Korea, 702-701
yook@knu.ac.kr
(Received October 200~; revised and accepted July 2005) 
Abstract--This paper presents a new inner product $AB^2$ multiplication algorithm and an effective hardware architecture for exponentiation in finite fields $GF(2^m)$. Exponentiation is implemented more efficiently by applying $AB^2$ multiplication repeatedly rather than $AB$ multiplication. Thus, efficient $AB^2$ multiplication algorithms and simple architectures are the key to implementing exponentiation. Accordingly, this paper proposes an efficient inner product multiplication algorithm based on an irreducible all one polynomial (AOP) and a simple architecture, which has the same hardware equipment as Fenn's $AB$ multiplier. The proposed bit-serial multiplication algorithm and architecture are highly regular and simpler than those of previous works. © 2006 Elsevier Ltd. All rights reserved.
Keywords--Public-key cryptosystem, Exponentiation, Modular multiplication, Irreducible all one polynomial, Inner products.
1. INTRODUCTION 
Finite-field arithmetic operations are currently receiving significant attention due to their important and practical applications in the area of computers and communications, such as error-correcting codes [1] and a number of modern public key cryptographic systems [2]. These cryptosystems include the Diffie-Hellman key exchange and ElGamal encryption schemes, which are based on modular exponentiations that are implemented by repeatedly applying modular $AB$ or $AB^2$ multiplications over finite fields as the basic scheme [3-7].
A number of studies have presented efficient architectures and algorithms for multiplication based on irreducible AOPs [8-10]. Koc proposed bit-parallel $AB$ multipliers based on a canonical basis [8]. Liu proposed an $AB^2$ multiplication algorithm using an inner product and a parallel two-dimensional cellular architecture [9]. Since Liu's algorithm generates a disordered sequence of coefficients for the resulting polynomial $AB^2$, it is unsuitable for a bit-serial linear feedback shift register (LFSR) architecture. Fenn proposed two types of bit-serial $AB$ multipliers based on the LFSR architecture [10]. These algorithms and architectures have the features of modularity and low complexity, but they still need improving with regard to time and space.
*Author to whom all correspondence should be addressed.
The authors would like to thank the anonymous referees for their valuable suggestions on how to improve the quality of the manuscript. This work was supported by the Brain Korea 21 Project and the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program.
0898-1221/06/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. Typeset by AMS-TeX
doi: 10.1016/j.camwa.2005.07.019
Thus, the aim of the current paper is to investigate and develop an effective $AB^2$ multiplication algorithm and a simple and regular architecture for the VLSI implementation of exponentiation in $GF(2^m)$, which is the key operation for public key cryptosystems. A one-dimensional linear feedback shift register (LFSR) is well matched to such criteria, and it has already been applied to many areas [10,11].
In this paper, we propose an efficient multiplication algorithm based on an inner product that computes $AB^2$ multiplication, plus a simple hardware architecture. The important innovation of our work is that the proposed bit-serial multiplier has the same hardware requirements and latency as Fenn's multiplier. The time complexity of a modular exponentiation operation using the proposed architecture is 33% lower on average than with Fenn's architecture.
The remainder of this paper is organized as follows. The mathematical background and Fenn's architecture are described in Section 2. In Section 3, the proposed algorithm and architecture are presented. Section 4 presents a discussion, together with a performance comparison between the proposed LFSR $AB^2$ multiplication architecture and the LFSR $AB$ multiplier presented in [10]. Finally, the conclusion is presented in Section 5.
2. MATHEMATICAL BACKGROUND AND REVIEW OF FENN'S ARCHITECTURE
In this section, we introduce the mathematical background on finite fields, the $AB$ multiplication algorithm, and the hardware architecture proposed by Fenn et al. [10].
2.1. Finite Fields
A finite field, or Galois field (GF), is a finite set of elements that is closed under its operations and obeys the commutative, associative, and distributive laws, thereby enabling addition, subtraction, multiplication, and division. Finite fields have received a lot of attention due to their important and practical applications in high bit-rate digital communications. From a hardware implementation point of view, the field $GF(2^m)$ containing $2^m$ elements is of particular interest for computer applications [12]. Although addition in $GF(2^m)$ is bit-independent and relatively straightforward, multiplication in $GF(2^m)$ is more complex and difficult to carry out efficiently. Therefore, the efficient performance of multiplication in $GF(2^m)$ has attracted considerable research interest.
To express the elements of a finite field, three major bases are used: the normal basis, the dual basis, and the polynomial basis. Since a polynomial basis multiplier does not require a basis conversion, it can be readily matched to any input or output system. Also, due to the regularity and simplicity of a polynomial basis multiplier, its design and expansion for high-order finite fields are easier to realize than with dual or normal basis multipliers.
Let $p(x) = x^m + x^{m-1} + \cdots + x + 1$ be an irreducible AOP of degree $m$ over $GF(2)$. An AOP of degree $m$ is irreducible over $GF(2)$, and can therefore be selected for $GF(2^m)$, if and only if $(m + 1)$ is prime and 2 is primitive modulo $(m + 1)$. Then, $\{x^{m-1}, x^{m-2}, \ldots, x, 1\}$ forms the polynomial basis for $GF(2^m)$ and any $A \in GF(2^m)$ can be represented in the polynomial basis as $A = A_{m-1}x^{m-1} + A_{m-2}x^{m-2} + \cdots + A_1 x + A_0$, where $A_i \in GF(2)$ for $i = m-1, \ldots, 2, 1, 0$. One of the AOP properties, $x^{m+1} + 1 = 0$ whenever $p(x) = 0$, has been suggested in [13] to reduce the complexity of field multiplications. It can be computed simply by
\[ p(x) + x\,p(x) = (x^m + x^{m-1} + \cdots + x + 1) + x(x^m + x^{m-1} + \cdots + x + 1) = x^{m+1} + 1 = 0. \]
Let
\[ A = A_{m-1}x^{m-1} + \cdots + A_1 x + A_0 \]
be an element over $GF(2^m)$. Then, $A$ is also represented as $A = a_m x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$, where $a_m = 0$, and is said to be in the extended polynomial basis [13].
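The irreducibility condition and the identity $p(x) + x\,p(x) = x^{m+1} + 1$ can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names are illustrative assumptions, and polynomials over $GF(2)$ are stored as integer bit masks (bit $i$ is the coefficient of $x^i$).

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for the small n used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def order_of_two(p: int) -> int:
    """Multiplicative order of 2 modulo an odd prime p."""
    k, value = 1, 2 % p
    while value != 1:
        value = (value * 2) % p
        k += 1
    return k


def has_irreducible_aop(m: int) -> bool:
    """True if (m + 1) is prime and 2 is primitive modulo (m + 1)."""
    p = m + 1
    return p > 2 and is_prime(p) and order_of_two(p) == p - 1


def aop_times_x_plus_one(m: int) -> int:
    """Return p(x) + x*p(x) for the AOP of degree m, as a bit mask."""
    p = (1 << (m + 1)) - 1      # x^m + x^(m-1) + ... + x + 1
    return p ^ (p << 1)         # addition in GF(2)[x] is bitwise XOR


# Degrees m <= 60 for which the AOP condition above holds.
aop_degrees = [m for m in range(2, 61) if has_irreducible_aop(m)]
```

For example, `has_irreducible_aop(4)` holds because 5 is prime and 2 has order 4 modulo 5, which is the case $GF(2^4)$ used in the tables and figures of this paper.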
Figure 1. AOPM proposed in [10]. (Diagram not reproduced; the serial inputs are $a_0 a_1 a_2 a_3 a_4$ and $b_0 b_1 b_2 b_3 b_4$, and the serial output is $c_0 c_1 c_2 c_3 c_4$.)
Table 1. The timing table for AOPM in $GF(2^4)$. A dash marks a register that has not yet been loaded.
Clock  Ctrl sign.  x0  x1  x2  x3  x4   y0  y1  y2  y3  y4   Output
1      1           -   -   -   -   a4   b4  -   -   -   -    -
2      1           -   -   -   a4  a3   b3  b4  -   -   -    -
3      1           -   -   a4  a3  a2   b2  b3  b4  -   -    -
4      1           -   a4  a3  a2  a1   b1  b2  b3  b4  -    -
5      1           a4  a3  a2  a1  a0   b0  b1  b2  b3  b4   c4
6      0           a3  a2  a1  a0  a4   b0  b1  b2  b3  b4   c3
7      0           a2  a1  a0  a4  a3   b0  b1  b2  b3  b4   c2
8      0           a1  a0  a4  a3  a2   b0  b1  b2  b3  b4   c1
9      0           a0  a4  a3  a2  a1   b0  b1  b2  b3  b4   c0
2.2. Review of Bit-Serial AB Multiplication Architecture Proposed by Fenn et al. 
Let $a, b, c \in GF(2^m)$ be such that $c = ab$, and represent these elements in the extended polynomial basis as
\[ a = \sum_{i=m}^{0} a_i x^i, \quad b = \sum_{i=m}^{0} b_i x^i, \quad \text{and} \quad c = \sum_{i=m}^{0} c_i x^i. \]
Then, the following relationship holds,
\[ \begin{bmatrix} c_m \\ c_{m-1} \\ \vdots \\ c_0 \end{bmatrix} = \begin{bmatrix} a_0 & a_1 & \cdots & a_m \\ a_m & a_0 & \cdots & a_{m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & \cdots & a_0 \end{bmatrix} \begin{bmatrix} b_m \\ b_{m-1} \\ \vdots \\ b_0 \end{bmatrix}. \tag{1} \]
Consider equation (1) and Figure 1. If the registers in Figure 1 are initialized by $x_{m-i} = a_i$, $y_i = b_i$, for $i = 0, 1, \ldots, m$, the first product bit $c_m$ will immediately be available on the output line. The remaining product bits $c_i$ ($i = m-1, \ldots, 1, 0$) can be obtained by clocking the upper shift register a further $m$ times.
The AOPM proposed by Fenn et al. follows a conventional MSB-first multiplication algorithm [10]. Table 1 shows how the values in the registers change every clock cycle in $GF(2^4)$.
The computation of AOPM over $GF(2^m)$ operates as follows. All $x_i$ and $y_i$ registers are initialized during $(m + 1)$ clock cycles. Each operand, $a_i$, $b_i$, is inputted through the $x_m$ or $y_0$ register, respectively. The values of the $x_i$ and $y_i$ registers are circularly shifted once to the left or right, respectively, during the $(m + 1)$ initial clock cycles. Then, the values of the $y_i$ registers are held, while the values of the $x_i$ registers are steadily shifted once to the left during the last $m$ clock cycles. The results are produced from the final input clock cycle to the end of the computational clock cycle. All computations require $(2m + 1)$ clock cycles.
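The register behavior described above can be modeled in software. The sketch below is an illustrative assumption rather than Fenn's circuit: it starts from the loaded state $x_{m-i} = a_i$, $y_i = b_i$ (the serial loading clocks are skipped), forms each output bit as the XOR of the products $x_j y_j$, and rotates the $x$ registers left between outputs. The helper `cyclic_product` is a hypothetical reference that multiplies modulo $x^{m+1} + 1$ directly.

```python
def cyclic_product(a, b):
    """Reference: a*b modulo x^(m+1) + 1 over GF(2); index i is x^i."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] ^= a[i] & b[j]
    return c


def aopm_multiply(a, b):
    """Model of the AOPM data path in the extended polynomial basis.

    a and b are lists of m + 1 bits. The output bits appear in the order
    c_m, c_(m-1), ..., c_0, one per clock.
    """
    n = len(a)
    x = list(a)[::-1]          # loaded state: x[m - i] = a[i]
    y = list(b)                # loaded state: y[i] = b[i], held afterwards
    c = [0] * n
    for k in range(n - 1, -1, -1):
        bit = 0
        for xj, yj in zip(x, y):
            bit ^= xj & yj     # inner product of the two register rows
        c[k] = bit
        x = x[1:] + x[:1]      # circular shift of the x registers to the left
    return c
```

With $m = 4$, `aopm_multiply([0, 0, 0, 0, 1], [0, 1, 0, 0, 0])` multiplies $x^4$ by $x$ and returns the bits of $x^5 = 1$. The result stays in the extended basis with $m + 1$ coefficients.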
3. PROPOSED ALGORITHM AND ARCHITECTURE 
In this section, we propose an efficient bit-serial $AB^2$ multiplication algorithm; thereafter, we present a simple hardware structure based on the LFSR architecture.
3.1. Proposed Inner Product Algorithm
LEMMA 1. Let $B = b_m x^m + \cdots + b_1 x + b_0$ be an element over $GF(2^m)$. Then, $B^2 \bmod p(x)$ can be represented by
\[ B^2 \bmod p(x) = \sum_{i=m}^{0} b_{i2^{-1}} x^i, \tag{2} \]
where $p(x) = x^{m+1} + 1$. Note that the subscript, $k$, of $b_k$ is computed modulo $(m + 1)$ for all computations.
PROOF. The left side of equation (2) is represented by
\[ \begin{aligned} B^2 \bmod p(x) &= (b_m x^m + b_{m-1}x^{m-1} + b_{m-2}x^{m-2} + \cdots + b_2 x^2 + b_1 x + b_0)^2 \bmod p(x) \\ &= b_m x^{2m} + b_{m-1}x^{2(m-1)} + \cdots + b_{m/2+1}x^{m+2} \\ &\quad + b_{m/2}x^m + \cdots + b_2 x^4 + b_1 x^2 + b_0 \bmod p(x). \end{aligned} \tag{3} \]
Equation (3) should be reduced by the modulus $p(x)$, i.e., the $m/2$ terms $x^{2m}, x^{2(m-1)}, \ldots, x^{2(m-(m/2-1))}$ $(= x^{m+2})$ have to be reduced by $p(x)$. Since $x^{2(m-i)} \bmod p(x) = x^{m-2i-1}$ for $0 \le i \le (m/2 - 1)$, this produces
\[ \begin{aligned} B^2 \bmod p(x) &= b_m x^{m-1} + b_{m-1}x^{m-3} + \cdots + b_{m/2+1}x^1 \\ &\quad + b_{m/2}x^m + \cdots + b_2 x^4 + b_1 x^2 + b_0 \bmod p(x). \end{aligned} \tag{4} \]
From equation (4), it should be noted that the first $m/2$ terms have an odd degree of $x$ and the corresponding coefficients of the terms are $b_m, b_{m-1}, b_{m-2}, \ldots,$ and $b_{m-(m/2-1)}$ $(= b_{m/2+1})$. Meanwhile, the remaining $m/2 + 1$ terms have an even degree of $x$ and the corresponding coefficients are $b_{m/2}, b_{m/2-1}, \ldots, b_2, b_1, b_0$.
The right side of equation (2) is represented by
\[ \sum_{i=m}^{0} b_{i2^{-1}}x^i = b_{m\cdot 2^{-1}}x^m + b_{(m-1)\cdot 2^{-1}}x^{m-1} + b_{(m-2)\cdot 2^{-1}}x^{m-2} + \cdots + b_{2\cdot 2^{-1}}x^2 + b_{1\cdot 2^{-1}}x + b_{0\cdot 2^{-1}}. \tag{5} \]
It is now shown that each coefficient of term $x^i$ ($0 \le i \le m$) in equation (4) is equal to that of the corresponding term in equation (5). First, consider the coefficients of the odd degree terms in equation (5). Since $m$ is even, the coefficient of the odd degree term $x^{m-i}$ is $b_{(m-i)\cdot 2^{-1}}$ when $i = 1, 3, 5, \ldots, m-1$. Then, the subscript $(m-i)2^{-1} = (m-i)/2$ is not an integer. Since $(m + 2) \equiv 1 \pmod{m+1}$ and $m^2 \equiv 1 \pmod{m+1}$, this produces $(m-i)/2 \bmod (m+1) = (m(m+2) - i)/2 \bmod (m+1) = (m^2 + 2m - i)/2 \bmod (m+1) = (1 + 2m - i)/2 \bmod (m+1) = m - (i-1)/2$. As a result, we have $b_{(m-i)\cdot 2^{-1} \bmod (m+1)} = b_{m-(i-1)/2}$. Thus, the coefficients of term $x^{m-i}$ for odd $i$ are $b_m, b_{m-1}, b_{m-2}, \ldots,$ and $b_{m/2+1}$ when $i = 1, 3, 5, \ldots, m-1$.
Now, consider the coefficients of the even degree terms in equation (5). The coefficient of the even degree term $x^{m-i}$ when $i = 0, 2, 4, \ldots, m$ is $b_{(m-i)\cdot 2^{-1}}$. Since $(m-i)$ is even for an even $i$, the subscript $(m-i)2^{-1} = (m-i)/2$ is always an integer for an even $i$. Thus, the coefficients $b_{(m-i)/2}$ of term $x^{m-i}$ for even $i$, when $i = 0, 2, 4, \ldots, m$, are $b_{m/2}, b_{m/2-1}, \ldots, b_2, b_1, b_0$.
Therefore, each coefficient of term $x^i$ ($0 \le i \le m$) in equation (5) is equal to that of the corresponding term in equation (4). Thus, it can be concluded that Lemma 1 holds.
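Lemma 1 says that squaring modulo $x^{m+1} + 1$ only permutes the coefficients of $B$. A minimal sketch of this check follows; the function names are assumptions made for illustration, and $2^{-1} \bmod (m+1)$ is taken as $(m+2)/2$, which is valid because $m$ is even.

```python
def square_direct(b):
    """B^2 modulo x^(m+1) + 1 over GF(2): b_j x^j becomes b_j x^(2j mod (m+1))."""
    n = len(b)
    sq = [0] * n
    for j, bj in enumerate(b):
        sq[(2 * j) % n] ^= bj      # cross terms vanish in characteristic 2
    return sq


def square_by_lemma1(b):
    """Coefficient of x^i in B^2 is b_k with k = i * 2^(-1) mod (m + 1)."""
    n = len(b)                     # n = m + 1 is odd
    inv2 = (n + 1) // 2            # 2 * inv2 = n + 1, which is 1 modulo n
    return [b[(i * inv2) % n] for i in range(n)]
```

For $m = 4$, passing the labels `[0, 1, 2, 3, 4]` in place of bits returns `[0, 3, 1, 4, 2]`, i.e., $B^2 = b_2 x^4 + b_4 x^3 + b_1 x^2 + b_3 x + b_0$.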
DEFINITION 1. Let $A = \sum_{i=m}^{0} a_i x^i$ be an element over $GF(2^m)$. $A^{(j)}$ and $A^{(-j)}$ denote element $A$, which is shifted circularly by $j$ bits to the right and left, respectively, i.e.,
\[ A^{(j)} = \sum_{i=m}^{0} a_{i+j}x^i \quad \text{and} \quad A^{(-j)} = \sum_{i=m}^{0} a_{i-j}x^i. \]
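DEFINITION 2. Let $B^2 = \sum_{i=m}^{0} b_{i2^{-1}}x^i$ be an element over $GF(2^m)$. $B^{2(j)}$ and $B^{2(-j)}$ denote element $B^2$, which is shifted circularly by $j$ bits to the right and left, respectively, i.e.,
\[ B^{2(j)} = \sum_{i=m}^{0} b_{i2^{-1}+j}x^i \quad \text{and} \quad B^{2(-j)} = \sum_{i=m}^{0} b_{i2^{-1}-j}x^i. \]
Based on Lemma 1, we have
\[ \begin{aligned} A \cdot B^2 &= \sum_{i=m}^{0} a_i x^i \cdot \sum_{i=m}^{0} b_{i2^{-1}}x^i \bmod p(x) \\ &= \sum_{i=m}^{0} a_i b_{i2^{-1}}x^{2i} \bmod p(x) \\ &= \sum_{i=m}^{0} a_{i2^{-1}} b_{i2^{-1}2^{-1}}x^i. \end{aligned} \]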
LEMMA 2. Let $A = \sum_{i=m}^{0} a_i x^i$ and $B^2 = \sum_{i=m}^{0} b_{i2^{-1}}x^i$ be elements over $GF(2^m)$. The inner product of $A$ and $B^2$ is $A \cdot B^2 = \sum_{i=m}^{0} a_{i2^{-1}} b_{i4^{-1}}x^i$. Obviously, $A \cdot B^2 = A^{(0)} \cdot B^{2(0)}$.
The coefficients of the inner product $A \cdot B^2$ can be computed in order of the decreasing degree of term $x^i$ from the formula of Lemma 2.
DEFINITION 3. The $j$th inner product $A^{(2j)} \cdot B^{2(-j)}$ is defined as
\[ A^{(2j)} \cdot B^{2(-j)} = \sum_{i=m}^{0} a_{(i+2j)2^{-1}} b_{(i2^{-1}-j)2^{-1}}x^i. \]
THEOREM 1. Let $A = a_m x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$ and $B = b_m x^m + b_{m-1}x^{m-1} + \cdots + b_1 x + b_0$ be elements over $GF(2^m)$, which is produced by an irreducible AOP of degree $m$. Then, the product of $A$ and $B^2$ can be represented by the following recurrence equation,
\[ AB^2 = A^{(0)} \cdot B^{2(0)} + A^{(2)} \cdot B^{2(-1)} + \cdots + A^{(2m)} \cdot B^{2(-m)}. \]
Therefore, it turns out that
\[ AB^2 = \sum_{j=0}^{m} A^{(2j)} \cdot B^{2(-j)}. \]
PROOF. The product $AB^2$ of $A$ and $B^2$ can be simplified by using the property $x^{m+1} + 1 = 0$ and Lemma 1 as follows,
\[ \begin{aligned} AB^2 &= \sum_{i=m}^{0} a_i x^i \sum_{i=m}^{0} b_{i2^{-1}}x^i \bmod p(x) \\ &= (a_m x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0) \\ &\quad \times (b_{m/2}x^m + b_m x^{m-1} + \cdots + b_1 x^2 + b_{m/2+1}x + b_0) \bmod p(x) \\ &= d_m x^m + d_{m-1}x^{m-1} + \cdots + d_1 x + d_0, \end{aligned} \tag{6} \]
where $d_k = \sum_{(i+2j) \bmod (m+1) = k} a_i b_j$, and $a_i$ and $b_j$ are the coefficients of $A$ and $B$, respectively.
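Based on Definition 3, we have
\[ \begin{aligned} A^{(2j)} \cdot B^{2(-j)} &= \sum_{i=m}^{0} a_{i+2j}x^i \cdot \sum_{i=m}^{0} b_{i2^{-1}-j}x^i \bmod p(x) \\ &= \sum_{i=m}^{0} a_{(i+2j)2^{-1}} b_{(i2^{-1}-j)2^{-1}}x^i \\ &= \sum_{i=m}^{0} a_{(i+2j)/2} b_{(i/2-j)/2}x^i. \end{aligned} \]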
Thus, the right side of Theorem 1, $\sum_{j=0}^{m} A^{(2j)} \cdot B^{2(-j)}$, is an element over $GF(2^m)$, which is produced by an irreducible AOP $p(x)$ of degree $m$. It can be written as follows,
\[ \sum_{j=0}^{m} A^{(2j)} \cdot B^{2(-j)} = \sum_{j=0}^{m} \sum_{i=m}^{0} a_{(i+2j)/2} b_{(i/2-j)/2}x^i. \tag{7} \]
Let $\sum_{j=0}^{m} A^{(2j)} \cdot B^{2(-j)} = c_m x^m + c_{m-1}x^{m-1} + \cdots + c_1 x + c_0$. Then, from equation (7), the coefficient $c_k$ of term $x^k$ ($0 \le k \le m$) is
\[ \begin{aligned} c_k &= \sum_{j=0}^{m} a_{(k+2j)/2} b_{(k/2-j)/2} \\ &= a_{k/2}b_{k/4} + a_{(k+2)/2}b_{(k/2-1)/2} + \cdots + a_{(k+2j)/2}b_{(k/2-j)/2} + \cdots + a_{(k+2m)/2}b_{(k/2-m)/2}. \end{aligned} \tag{8} \]
To prove that $c_k = d_k$ for $0 \le k \le m$, it must be shown that $(s + 2t) \bmod (m + 1) = k$ for all terms $a_s b_t$ of the coefficient $c_k$ of equation (8), i.e., each term $a_s b_t$ is one among the terms of the coefficient $d_k$ of equation (6). For an arbitrary term $a_{(k+2j)/2}b_{(k/2-j)/2}$, $0 \le j \le m$, of the coefficient $c_k$, we have $s = (k + 2j)/2$ and $t = (k/2 - j)/2$, so that $s + 2t = (k + 2j)/2 + (k/2 - j) \equiv k \pmod{m+1}$. Thus, term $a_{(k+2j)/2}b_{(k/2-j)/2}$ is one among the $(m + 1)$ different terms of the coefficient $d_k$ of equation (6). It is proved that $c_k = d_k$ for $0 \le k \le m$. Therefore, it turns out that $AB^2 = \sum_{j=0}^{m} A^{(2j)} \cdot B^{2(-j)}$. The following shows an example of Theorem 1 when $m = 4$.
EXAMPLE 1. Suppose $m = 4$, $A = a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0$, and $B = b_4 x^4 + b_3 x^3 + b_2 x^2 + b_1 x + b_0$. It will be shown that $AB^2 = \sum_{j=0}^{4} A^{(2j)} \cdot B^{2(-j)}$. As such,
\[ AB^2 = (a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0)(b_2 x^4 + b_4 x^3 + b_1 x^2 + b_3 x + b_0) \bmod (x^5 + 1). \]
For example, consider $c_2$ and $d_2$, the coefficients of $x^2$ ($k = 2$). From the above, it is found that $d_2 = a_4 b_4 + a_3 b_2 + a_2 b_0 + a_1 b_3 + a_0 b_1$. All terms $a_s b_t$ of $d_2$ satisfy $(s + 2t) \bmod 5 = k = 2$.
In addition, for coefficient $c_2$, the following results from equation (8),
\[ \begin{aligned} c_2 &= \sum_{j=0}^{4} a_{(2+2j)/2} b_{(2/2-j)/2} = \sum_{j=0}^{4} a_{(1+j)} b_{(1-j)/2} \\ &= a_1 b_{1/2} + a_2 b_0 + a_3 b_{-1/2} + a_4 b_{-1} + a_5 b_{-3/2} \\ &= a_1 b_3 + a_2 b_0 + a_3 b_2 + a_4 b_4 + a_0 b_1. \end{aligned} \]
Thus, it is shown that $c_k = d_k$ for $k = 2$. In addition, it can be shown that $c_k = d_k$ for all $k$ by using the same method. Figure 2 shows all coefficients, from $c_0$ to $c_4$, for the $AB^2$ multiplication ($m = 4$). Note that $c_k$ is computed by adding all terms in the corresponding column in a decreasing order of subscripts. As shown in Figure 2, $a_i$ is circularly shifted two bits to the right while $b_i$ is circularly shifted two bits to the left ($0 \le i \le 4$).
$A^{(2j)} \cdot B^{2(-j)}$       x^4     x^3     x^2     x^1     x^0
$A^{(0)} \cdot B^{2(-0)}$        a2b1    a4b2    a1b3    a3b4    a0b0
$A^{(2)} \cdot B^{2(-1)}$        a3b3    a0b4    a2b0    a4b1    a1b2
$A^{(4)} \cdot B^{2(-2)}$        a4b0    a1b1    a3b2    a0b3    a2b4
$A^{(6)} \cdot B^{2(-3)}$        a0b2    a2b3    a4b4    a1b0    a3b1
$A^{(8)} \cdot B^{2(-4)}$        a1b4    a3b0    a0b1    a2b2    a4b3
Column sum                       c4      c3      c2      c1      c0
Figure 2. $AB^2$ multiplication using Theorem 1 in $GF(2^4)$.
Figure 3. The proposed bit-serial $AB^2$ inner product multiplier over $GF(2^4)$. (Diagram not reproduced; the serial input is $a_0 a_1 a_2 a_3 a_4$ and the serial output is $c_0 c_1 c_2 c_3 c_4$.)
Since the proposed $AB^2$ algorithm of Theorem 1 needs neither further reduction nor rearrangement of the coefficients $c_i$ to obtain the resulting polynomial $C = AB^2$, the algorithm is better suited to a bit-serial architecture than Liu's algorithm.
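Theorem 1 and Figure 2 can be reproduced with a short script. The sketch below is an illustration under stated assumptions, not the authors' code: `inner_product_term` is a hypothetical helper that evaluates the subscripts of equation (8) with division by 2 interpreted as multiplication by $2^{-1} = (m+2)/2$ modulo $(m+1)$, and `ab2_direct` is a reference that collects the terms $a_s b_t$ with $(s + 2t) \bmod (m+1) = k$ as in equation (6).

```python
def ab2_direct(a, b):
    """Reference for equation (6): d_k sums a_s b_t with s + 2t = k mod (m+1)."""
    n = len(a)
    d = [0] * n
    for s in range(n):
        for t in range(n):
            d[(s + 2 * t) % n] ^= a[s] & b[t]
    return d


def inner_product_term(j, k, n):
    """Subscripts (s, t) of the term of x^k in the j-th inner product."""
    inv2 = (n + 1) // 2                      # 2^(-1) modulo n = m + 1
    s = ((k + 2 * j) * inv2) % n             # (k + 2j)/2
    t = ((k * inv2 - j) * inv2) % n          # (k/2 - j)/2
    return s, t


def ab2_inner_products(a, b):
    """Equation (8): c_k is the sum over j of the j-th inner product terms."""
    n = len(a)
    c = [0] * n
    for j in range(n):
        for k in range(n):
            s, t = inner_product_term(j, k, n)
            c[k] ^= a[s] & b[t]
    return c
```

For $m = 4$, `inner_product_term(0, 4, 5)` gives `(2, 1)`, the entry $a_2 b_1$ in the first row of Figure 2, and the five terms for $k = 2$ are exactly those of $c_2$ in Example 1.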
3.2. Proposed Bit-Serial $AB^2$ Multiplication Architecture
This section presents an effective bit-serial $AB^2$ multiplication architecture based on the algorithm shown in the previous section. Figure 3 shows the proposed bit-serial $AB^2$ inner product multiplier.
As seen in Figure 3, the proposed multiplier that computes $AB^2$ multiplication has nearly the same hardware equipment as Fenn's $AB$ multiplier, shown in Figure 1, except for the wiring. Each $y_i$ ($0 \le i \le 4$) register and the $x_0$ register have a MUX. The input values $a_i$ ($0 \le i \le 4$) are circularly shifted once to the left, while the $b_i$ ($0 \le i \le 4$) values are circularly shifted 2 bits to the right during the $(m + 1)$ initialization clock cycles. Whenever the control signal is one and a rising clock edge occurs, the input values enter each register through a MUX. As soon as all input values have entered the registers, the first result, $c_4$, comes out. Thereafter, the control signal is set to zero. Then, the input values $a_i$ ($0 \le i \le 4$) are circularly shifted once to the left while the $b_i$ ($0 \le i \le 4$) values are held during the last $m$ computational clock cycles. Our multiplier operates according to the following timing sequence.
In order to compute $AB^2$, the multiplier carries out the computation for one column in Figure 2 every clock cycle. The computation over $GF(2^m)$ proceeds as follows. All $x_i$ and $y_i$ registers are initialized during $(m + 1)$ clock cycles. Each operand, $a_i$, $b_i$, is inputted through the $x_m$ or $y_0$ register, respectively, based on the relationships between $A$ and $B$. The values of the $x_i$ registers are circularly shifted once to the left, while the values of the $y_i$ registers are circularly shifted two bits to the right, during the $(m + 1)$ initial clock cycles.
With the last initial clock cycle, the first result is produced. Then, the values of the $y_i$ registers are held, while the values of the $x_i$ registers are steadily shifted once to the left during the last $m$ clock cycles. The results are produced from the final input clock cycle to the end of the computational clock cycle. All computations require $(2m + 1)$ clock cycles, which are composed of $(m + 1)$ initialization clock cycles and $m$ computational clock cycles.
Table 2. The timing table for the proposed inner product multiplier in $GF(2^4)$. A dash marks a register that has not yet been loaded.
Clock  Ctrl sign.  x0  x1  x2  x3  x4   y0  y1  y2  y3  y4   Output
1      1           -   -   -   -   a4   b4  -   -   -   -    -
2      1           -   -   -   a4  a3   b3  -   b4  -   -    -
3      1           -   -   a4  a3  a2   b2  -   b3  -   b4   -
4      1           -   a4  a3  a2  a1   b1  b4  b2  -   b3   -
5      1           a4  a3  a2  a1  a0   b0  b3  b1  b4  b2   c4
6      0           a3  a2  a1  a0  a4   b0  b3  b1  b4  b2   c3
7      0           a2  a1  a0  a4  a3   b0  b3  b1  b4  b2   c2
8      0           a1  a0  a4  a3  a2   b0  b3  b1  b4  b2   c1
9      0           a0  a4  a3  a2  a1   b0  b3  b1  b4  b2   c0
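The timing sequence above can be modeled clock by clock. The following is a sketch under assumptions, not a description of the actual circuit: it assumes MSB-first serial input, a shift of the $y$ registers by two positions per initialization clock (as described for Figure 3), and an output formed as the XOR of the products $x_j y_j$. The names `ab2_bit_serial` and `ab2_reference` are illustrative.

```python
def ab2_reference(a, b):
    """Direct definition: coefficient k collects a_s b_t with s + 2t = k mod (m+1)."""
    n = len(a)
    d = [0] * n
    for s in range(n):
        for t in range(n):
            d[(s + 2 * t) % n] ^= a[s] & b[t]
    return d


def inner(x, y):
    """XOR of the bitwise products of two register rows."""
    bit = 0
    for xj, yj in zip(x, y):
        bit ^= xj & yj
    return bit


def ab2_bit_serial(a, b):
    """Return (c, clocks) for C = A*B^2 in the extended polynomial basis."""
    n = len(a)
    m = n - 1
    x = [0] * n
    y = [0] * n
    clocks = 0
    # Initialization (control signal = 1): operands enter MSB first.
    for i in range(m, -1, -1):
        x = x[1:] + [a[i]]                  # shift left, a_i enters x_m
        moved = [0] * n
        for pos in range(n):
            moved[(pos + 2) % n] = y[pos]   # circular shift, two to the right
        moved[0] = b[i]                     # the MUX feeds b_i into y_0
        y = moved
        clocks += 1
    c = [0] * n
    c[m] = inner(x, y)                      # first result with the last load clock
    # Computation (control signal = 0): y is held, x rotates left.
    for k in range(m - 1, -1, -1):
        x = x[1:] + x[:1]
        clocks += 1
        c[k] = inner(x, y)
    return c, clocks
```

For $m = 4$ the model ends initialization with the $y$ registers holding $b_0, b_3, b_1, b_4, b_2$, as in clock 5 of Table 2, and it uses $2m + 1 = 9$ clocks in total.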
4. COMPARISON AND DISCUSSION 
This section compares the multiplier in [10] and the proposed $AB^2$ multiplier. The proposed algorithm is based on the concept of inner product computation, while the algorithm in [10] follows a conventional MSB-first multiplication algorithm. Multiplication is the key operation for implementing circuits for cryptosystems. This is because the process of encrypting and decrypting a message requires modular exponentiation, which can be decomposed into repeated multiplications.
Exponentiation can be computed by using power-sum operations or $AB^2$ multiplications. A popular algorithm for computing exponentiation is the binary method proposed by Knuth [14]. Let $C$ and $A$ be elements of $GF(2^m)$; the exponentiation of $A$ is then defined as
\[ C = A^E, \quad 0 \le E \le n, \]
where $n = 2^m - 1$. The exponent $E$ is an integer and can be expressed by $E = e_{m-1}2^{m-1} + e_{m-2}2^{m-2} + \cdots + e_1 2^1 + e_0$. There are two ways the exponentiation can be done. The first is the MSB-first exponentiation and the other is the LSB-first exponentiation. Starting from the most significant bit of the exponent, the exponentiation of $A$ can be expressed as
\[ A^E = \left( \cdots \left( \left( A^{e_{m-1}} \right)^2 A^{e_{m-2}} \right)^2 \cdots A^{e_1} \right)^2 A^{e_0}. \tag{9} \]
Since equation (9) is actually composed of a series of squaring and multiplication steps, an algorithm for computing the exponentiation is presented as follows.
[MSB-First Exponentiation Algorithm]
Input  : A, E, p(x)
Output : C = A^E mod p(x)
Step 1   C = 1
Step 2   for i = m-1 to 0
Step 3       C = C^2 mod p(x)
             if (e_i = 1) C = CA mod p(x)
According to the above algorithm, $m$ squarings are always required, and the number of modular multiplications of type $C = CA \bmod p(x)$ is equal to the number of ones in the binary representation of $E$. This is an integer between 0 and $m$. Thus, the total number of modular multiplications is at least $m$ and at most $2m$. Assuming that the bits 0 and 1 are equally likely in $E$, the number of modular multiplications is on average $(m + m/2)$. On the other hand, the $AB^2$ multiplication architecture always operates $m$ times. Therefore, modular exponentiation by an $AB^2$ multiplier can reduce the number of modular multiplications by at least one and at most $m$ compared to an $AB$ multiplier. Thus, the key point is how to design a simple $AB^2$ multiplication architecture.
Table 3. Comparison of the number of multiplications between AB architecture and $AB^2$ architecture for performing modular exponentiation when m = 1024 (unit: number of times).
Circuit      The number of multiplications               Total number of multiplications
AOPM [10]    Squaring: 1024; Multiplication: 1 ~ 1024    1025 ~ 2048; 1536 (average)
Fig. 3       1024                                        1024
Table 3 shows the number of multiplications involved in performing a modular exponentiation operation. Public key cryptography in $GF(2^{1024})$ is sufficient in order to achieve a high level of security.
The proposed architecture, which computes $AB^2$ multiplication, has the same hardware equipment and clock cycle as Fenn's multiplier. Consequently, the exponentiation operation by the proposed multiplier has reduced the number of multiplications by an average of 33%, as compared to Fenn's multiplier.
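The operation counts compared above can be illustrated with a small experiment. The sketch below is an assumption-laden illustration: the paper does not spell out how each step is scheduled on the $AB^2$ multiplier, so `exp_ab2` uses one plausible scheme in which every exponent bit costs a single $AB^2$ operation, $C = A^{e_i} C^2$. Elements are lists of $m + 1$ bits multiplied modulo $x^{m+1} + 1$ (extended basis), and all function names are hypothetical.

```python
def ring_mul(a, b):
    """a*b modulo x^(m+1) + 1 over GF(2); index i is the coefficient of x^i."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                c[(i + j) % n] ^= bj
    return c


def ab2(a, b):
    """One AB^2 operation: a * b^2."""
    return ring_mul(a, ring_mul(b, b))


def exp_square_multiply(A, E, m):
    """MSB-first binary method using AB multiplications; returns (C, count)."""
    C, count = [1] + [0] * m, 0
    for i in range(m - 1, -1, -1):
        C = ring_mul(C, C)              # squaring, always performed
        count += 1
        if (E >> i) & 1:
            C = ring_mul(C, A)          # multiplication, only when e_i = 1
            count += 1
    return C, count


def exp_ab2(A, E, m):
    """MSB-first method with one AB^2 operation per exponent bit."""
    one = [1] + [0] * m
    C, count = one, 0
    for i in range(m - 1, -1, -1):
        operand = A if (E >> i) & 1 else one
        C = ab2(operand, C)             # C = A^(e_i) * C^2
        count += 1
    return C, count
```

For $m = 4$ and all 16 exponents, `exp_ab2` always uses 4 operations, while `exp_square_multiply` uses between 4 and 8 with an average of 6, which is the $m$ versus $(m + m/2)$ comparison behind the 33% figure.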
5. CONCLUSION 
We have proposed an efficient algorithm which carries out $AB^2$ computation in the finite field $GF(2^m)$. Moreover, we have presented a bit-serial LFSR architecture for the effective implementation of the inner product algorithm. For implementing the exponentiation, the proposed architecture has shown a superior efficiency to the existing architecture, reducing the time complexity by 33% as compared to the multiplier proposed in [10]. Since our architecture has the features of regularity and modularity, it can be used as an efficient basic algorithm and architecture for division, inversion, and exponentiation, which are the core operations in public key cryptosystems.
REFERENCES 
1. M. Grangetto, E. Magli and G. Olmo, Robust video transmission over error-prone channels via error correcting arithmetic codes, IEEE Communications Letters 7 (12), 596-598, (2003).
2. R.J. McEliece, Finite Fields for Computer Scientists and Engineers, Kluwer Academic, New York, (1987).
3. W. Diffie and M.E. Hellman, New directions in cryptography, IEEE Trans. on Info. Theory 22, 644-654, (1976).
4. T. El Gamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans. on Info. Theory 31 (4), 469-472, (1985).
5. A.J. Menezes, Elliptic Curve Public Key Cryptosystems, Kluwer Academic Publishers, Boston, MA, (1993).
6. S.W. Wei, VLSI architectures for computing exponentiations, multiplications, multiplicative inverses, and divisions in $GF(2^m)$, IEEE Trans. Circuit and Systems Analog and Digital Signal Processing 44 (10), 847-855, (1997).
7. S.W. Wei, A systolic power-sum circuit for $GF(2^m)$, IEEE Trans. on Comp. 43 (2), 258-262, (1990).
8. C.K. Koc and B. Sunar, Low complexity bit-parallel canonical and normal basis multipliers for a class of finite fields, IEEE Trans. Comp. 47 (3), 353-356, (1998).
9. C.H. Liu, N.F. Huang, and C.Y. Lee, Computation of $AB^2$ multiplier in $GF(2^m)$ using an efficient low-complexity cellular architecture, IEICE Trans. Fundamentals E83-A (12), 2657-2663, (2000).
10. S.T.J. Fenn, M.G. Parker, M. Benaissa and D. Taylor, Bit-serial multiplication in $GF(2^m)$ using irreducible all-one polynomial, IEE Proc. Comput. Digit. Tech. 144 (6), 391-393, (1997).
11. E.R. Berlekamp, Bit-serial Reed-Solomon encoders, IEEE Trans. IT-28 (6), 869-874, (1982).
12. R. Lidl and H. Niederreiter, An Introduction to Finite Fields and Their Applications, CUP, Cambridge, (1986).
13. T. Itoh and S. Tsujii, Structure of parallel multipliers for a class of fields $GF(2^m)$, Information and Computation 83, 21-40, (1989).
14. D.E. Knuth, The Art of Computer Programming. Volume 2: Evaluation of Powers, Second Edition, Addison-Wesley, Reading, MA, (1969).
