Error Detecting Dual Basis Bit Parallel Systolic Multiplication Architecture over GF(2^m) by Singh, Ashutosh Kumar et al.
©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for 
advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, 
or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Error Detecting Dual Basis Bit Parallel Systolic
Multiplication Architecture over GF(2^m)
 A. K. Singh 
CS Dept., School of Engineering 
Curtin University of Technology, Malaysia 
 
Asish Bera 
School of VLSI Technology 
Bengal Engg. & Sc. University, Shibpur, India 
 
H. Rahaman, J.Mathew and D.K.Pradhan 
Computer Science Dept. 
University of Bristol, UK 
 ({hafizur, jimson, pradhan} @cs.bris.ac.uk) 
 
Abstract- This paper presents an error-tolerant, hardware-efficient VLSI
architecture for bit-parallel systolic multiplication over the dual basis,
which can be pipelined. The architecture is well suited to VLSI
implementation because of its regularity, modular structure, and
unidirectional data flow. Both the longest delay path and the area of this
architecture are smaller than those of the bit-parallel systolic
multiplication architectures reported earlier. The architecture is
implemented in Austria Microsystems' 0.35 µm CMOS technology. It can
operate over both the dual basis and the polynomial basis.
Keywords- Finite field, RS codes, bit parallel, systolic, error
correction, VLSI testing.
  
I. INTRODUCTION 
Finite field (also known as Galois field) arithmetic operations over
GF(2^m) find increasing application in public-key cryptography, error
detecting and correcting codes [9], VLSI testing [10], and digital signal
processing [11]. There are different equivalent representations of the
elements of the finite field GF(2^m), e.g. polynomial basis (PB), normal
basis, and dual basis. Dual-basis operators frequently have the lowest
hardware requirements of all available operators [18-19]. The two basic
operations over GF(2^m) are addition and multiplication. Addition over
GF(2^m) is relatively straightforward to implement, requiring at most m
XOR gates. Multiplication is much more expensive in terms of gate count
and clock cycles. Other operations in GF(2^m), such as exponentiation,
division, and inversion, can be performed by repeated multiplications.
Based on the different basis representations, a variety of architectures
for multiplication have been proposed. For high-speed VLSI implementation,
the preferred architecture for a polynomial basis (PB) multiplier is the
systolic array. In this type of architecture, a basic cell is repeated in
an array and signals flow unidirectionally between neighbours. PB systolic
array multipliers in GF(2^m) can be classified into four categories,
namely bit-serial, bit-parallel, hybrid, and digit-serial. The bit-serial
architecture has the minimum area and the minimum throughput among all the
categories; its drawback is latency. Processing one bit of input data per
clock cycle, it is area-efficient and suitable for low-speed applications.
The most widely used bit-serial multiplier is the dual basis Berlekamp
bit-serial multiplier [12], which has a low hardware requirement. PB
bit-serial and bit-parallel systolic multipliers were presented in [8, 13].
A bit-serial dual basis systolic multiplier over GF(2^m) was presented in
[3]; it requires more hardware than the multiplier proposed in [6] and
does not support pipelining. To support pipelining, a modified version
which requires less hardware is presented in [14]. The bit-parallel
multiplier needs the largest area and provides the maximum throughput.
Bit-parallel architecture, capable of processing one whole word of input
data per clock cycle, is ideal for high-speed applications when pipelined
at the bit level. These architectures are typical examples of the
area-speed tradeoff. Mastrovito proposed an algorithm along with its
hardware architecture for PB multiplication [7], known as the Mastrovito
algorithm/multiplier. A formulation for polynomial basis multiplication
and a generalized bit-parallel hardware architecture for special reduction
polynomials were presented in [2]. A testable polynomial basis
bit-parallel multiplier circuit over GF(2^m) was presented in [21].
Although bit-serial dual basis multipliers have been widely employed in
applications such as RS encoders [3], it has been shown in [19] that
employing bit-parallel dual basis multipliers is advantageous,
particularly in more complex circuits such as RS decoders and syndrome
calculators. Bit-parallel dual basis multipliers therefore allow for
reduced complexity constant multipliers.
In this paper, we present a hardware-efficient, fast bit-parallel systolic
architecture with error detecting capability using parity prediction




a) Polynomial Multiplication 
Let GF(N) denote a set of N elements, where N is a power of a prime
number, with two special elements 0 and 1 representing the additive and
multiplicative identities respectively, and two operators, addition '+'
and multiplication '.'. GF(N) is a finite field if it forms a commutative
ring with identity over these two operators in which every nonzero element
has a multiplicative inverse. Finite fields can be generated with
primitive polynomials of the form

$$P(x) = x^m + \sum_{i=0}^{m-1} p_i x^i,$$

where p_i ∈ GF(2) [9]. It is conventional to represent the elements of
GF(2^m) as powers of the primitive element α, where α is a root of P(x),
i.e. P(α) = 0. The set {1, α, …, α^(m-1)} is referred to as the polynomial
basis or standard basis. Each element A ∈ GF(2^m) can be expressed with
respect to the PB as a polynomial of degree at most m - 1 over GF(2), i.e.

$$A(x) = \sum_{i=0}^{m-1} a_i x^i,$$

where a_i ∈ GF(2). Given A, B ∈ GF(2^m), PB multiplication over GF(2^m)
can be defined as C(x) = A(x).B(x) mod P(x). In practice C(x) is obtained
in two steps: polynomial multiplication and modulo reduction.
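The two steps can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes field elements are stored as integers whose bit i holds the coefficient a_i, and it uses P(x) = x^4 + x + 1 only as a sample polynomial. The function names `poly_mul`, `poly_mod` and `pb_mul` are hypothetical.

```python
def poly_mul(a: int, b: int) -> int:
    """Step 1: carry-less product of two GF(2) polynomials (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1: add (XOR) the shifted a
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(c: int, p: int) -> int:
    """Step 2: reduce polynomial c modulo p over GF(2)."""
    deg_p = p.bit_length() - 1
    while c.bit_length() - 1 >= deg_p:
        # cancel the leading term of c with a shifted copy of p
        c ^= p << (c.bit_length() - 1 - deg_p)
    return c


def pb_mul(a: int, b: int, p: int = 0b10011) -> int:
    """C(x) = A(x).B(x) mod P(x); the default p encodes x^4 + x + 1 (an assumed example)."""
    return poly_mod(poly_mul(a, b), p)


if __name__ == "__main__":
    # alpha * alpha^3 = alpha^4 = alpha + 1 when P(alpha) = 0
    print(bin(pb_mul(0b0010, 0b1000)))
```

Keeping the multiplication and the reduction as separate functions mirrors the two-step description; a hardware multiplier would typically interleave them.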
 
b) Dual Basis Multiplication 
Let F_(p^m) denote the set of all linear functions f: GF(p^m) → GF(p). A
well-known linear function is the trace function, which is frequently used
to construct finite field multipliers. Besides the trace function, there
are a number of other linear functions. We follow the definition of the
duality of two bases [16-17] as given below.
Definition: Let {λ_i} and {μ_i} be bases for GF(2^m), let
f: GF(2^m) → GF(2) be a linear function and let β ∈ GF(2^m), β ≠ 0.
Then the bases are said to be dual with respect to f and β if

$$f(\beta \lambda_i \mu_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

In this case {λ_i} is the standard basis and {μ_i} is the dual basis. We
now restate the multiplication algorithm utilized here. This result was
first presented in the context of division [16] but has subsequently
been used to describe finite-field multiplication [15]. Furthermore, as
observed in [1], the following represents a generalized and alternative
representation of the Berlekamp bit-serial multiplier.
---------------------------------------------------------------------------------------------------------
This work was supported in part by Royal Society (UK) Grant.
978-1-4244-2587-7/09/$25.00 ©2009 IEEE
Authorized licensed use limited to: CURTIN UNIVERSITY OF TECHNOLOGY. Downloaded on November 9, 2009 at 22:50 from IEEE Xplore.  Restrictions apply.
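Theorem 1 [14]: Let a, b, c ∈ GF(2^m) such that c = ab. Further, let α
be a root of the defining irreducible polynomial for the field, let β ∈
GF(2^m), f ∈ F_(2^m) and represent a over the polynomial basis by
$a = \sum_{i=0}^{m-1} a_i \alpha^i$. Then

$$
\begin{bmatrix}
f(b\beta) & f(b\beta\alpha) & \cdots & f(b\beta\alpha^{m-1}) \\
f(b\beta\alpha) & f(b\beta\alpha^{2}) & \cdots & f(b\beta\alpha^{m}) \\
\vdots & \vdots & \ddots & \vdots \\
f(b\beta\alpha^{m-1}) & f(b\beta\alpha^{m}) & \cdots & f(b\beta\alpha^{2m-2})
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{bmatrix}
=
\begin{bmatrix} f(c\beta) \\ f(c\beta\alpha) \\ \vdots \\ f(c\beta\alpha^{m-1}) \end{bmatrix}
\tag{1}
$$

that is,

$$
\begin{bmatrix}
b_0 & b_1 & \cdots & b_{m-1} \\
b_1 & b_2 & \cdots & b_m \\
\vdots & \vdots & \ddots & \vdots \\
b_{m-1} & b_m & \cdots & b_{2m-2}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{bmatrix}
=
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{bmatrix}
\tag{2}
$$

where b_k = f(bβα^k) (k = 0, 1, ..., 2m - 2) and c_k = f(cβα^k) (k = 0,
1, ..., m - 1). If f and β are taken as in the preceding definition, c_k
and b_k (k = 0, 1, ..., m - 1) in eqn. (2) are the dual-basis coefficients
of c and b, respectively. Thus to make use of eqn. (2) in a systolic
multiplier one must first generate the values of b_k (k = m, m + 1, ...,
2m - 2).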
$$
b_{m+k} = f(b\beta\alpha^{m+k})
        = f\Big(\sum_{j=0}^{m-1} b\beta\, p_j \alpha^{j+k}\Big)
        = \sum_{j=0}^{m-1} p_j b_{j+k}
\tag{3}
$$

where b_k (k = 0, 1, ..., m - 1) are the dual basis coefficients of b and
α is a root of p(x). Having generated these values of b_k from eqn. (3),
one needs to carry out the matrix multiplication given in eqn. (1). Now
consider the implementation of this multiplication algorithm in the
design of a bit-parallel systolic multiplier.
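As a software cross-check of the algorithm just described, the following Python sketch extends the dual-basis coefficients with eqn. (3), forms the matrix-vector product, and compares the result with an ordinary polynomial-basis product. The concrete choices are assumptions made only for illustration: p(x) = x^4 + x + 1, β = 1, and f taken as "the x^0 coefficient in the polynomial basis" (the identity relies only on f being linear). All function names are hypothetical.

```python
P = 0b10011  # assumed example: p(x) = x^4 + x + 1
M = 4


def pb_mul(a: int, b: int) -> int:
    """Polynomial-basis product in GF(2^M): shift-and-add with reduction by P."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:            # degree reached M: subtract (XOR) p(x)
            a ^= P
    return result


def f(x: int) -> int:
    """Assumed linear function GF(2^M) -> GF(2): the x^0 coefficient."""
    return x & 1


def dual_coeffs(x: int, beta: int = 1, count: int = M) -> list:
    """Return [f(x * beta * alpha^k) for k in range(count)]."""
    out = []
    y = pb_mul(x, beta)
    for _ in range(count):
        out.append(f(y))
        y = pb_mul(y, 0b10)   # multiply by alpha
    return out


def extend_b(b_dual: list) -> list:
    """Eqn. (3): append b_{m+k} = sum_j p_j b_{j+k} for k = 0 .. m-2."""
    b = list(b_dual)
    p_bits = [(P >> j) & 1 for j in range(M)]
    for k in range(M - 1):
        b.append(sum(p_bits[j] & b[j + k] for j in range(M)) & 1)
    return b


def dual_basis_product(a_bits: list, b_dual: list) -> list:
    """Matrix product: c_k = sum_j b_{k+j} a_j over GF(2)."""
    b = extend_b(b_dual)
    return [sum(b[k + j] & a_bits[j] for j in range(M)) & 1 for k in range(M)]


if __name__ == "__main__":
    a, b = 0b0110, 0b1011
    a_bits = [(a >> i) & 1 for i in range(M)]
    # both sides are the coefficients f(c * alpha^k) of c = a * b
    print(dual_basis_product(a_bits, dual_coeffs(b)) == dual_coeffs(pb_mul(a, b)))
```

Note the mixed representation: a enters in the polynomial basis, while b and c are handled through the coefficients b_k and c_k.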
 
III. BIT PARALLEL DUAL BASIS MULTIPLIER
 
a) Proposed Architecture 
Let a, b, c ∈ GF(2^m) such that c = ab, and let {μ_i} be the dual basis to
the polynomial basis for β ∈ GF(2^m) and f ∈ F_(2^m). Represent b over the
dual basis by $b = \sum_{i=0}^{m-1} b_i \mu_i$ and a over the polynomial
basis by $a = \sum_{i=0}^{m-1} a_i \alpha^i$. From eqn. (2) we can derive
the following:

$$
\begin{aligned}
c_0 &= b_0 a_0 + b_1 a_1 + \cdots + b_{m-1} a_{m-1} \\
c_1 &= b_1 a_0 + b_2 a_1 + \cdots + b_m a_{m-1} \\
&\;\;\vdots \\
c_{m-1} &= b_{m-1} a_0 + b_m a_1 + \cdots + b_{2m-2} a_{m-1}
\end{aligned}
$$

where b_(m+k) (k ≥ 0) are given by eqn. (3). From these equations it can
be seen that the m product bits are generated by m identical functions of
the form

$$h(b, a) = b_k a_0 + b_{k+1} a_1 + \cdots + b_{k+m-1} a_{m-1} \tag{4}$$

All that changes in these functions is the value of k.

A bit-parallel dual basis multiplier over GF(2^m) can, therefore, be
constructed using two cells. We introduce cell-1, shown in Fig. 2, to
generate eqn. (3), and cell-2, shown in Fig. 1, to generate eqn. (2). An
example of such a multiplier over GF(2^4) is given below.
Example 1: Let p(x) = x^4 + x + 1 be the defining irreducible polynomial
and let α be a root of p(x). From eqn. (4), we can write

$$h(b, a) = b_k a_0 + b_{k+1} a_1 + b_{k+2} a_2 + b_{k+3} a_3 \tag{5}$$

This equation can be implemented by the circuit shown in Fig. 2.
From p(x) = x^4 + x + 1 and eqns. (3) and (4), we can derive the values of
b_4, b_5, b_6 as follows: b_4 = b_1 + b_0; b_5 = b_2 + b_1; b_6 = b_3 + b_2.
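The values in Example 1 can be reproduced with a short Python sketch. It is illustrative only: `extension_taps`, `extend` and `h` are hypothetical helper names, and the input bit vectors are arbitrary sample values, not data from the paper.

```python
def extension_taps(p_coeffs):
    """For p(x) = x^m + sum_j p_j x^j with p_coeffs = [p_0, ..., p_{m-1}],
    return {m+k: [j+k for each j with p_j = 1]} for k = 0 .. m-2 (eqn. 3)."""
    m = len(p_coeffs)
    return {m + k: [j + k for j in range(m) if p_coeffs[j]] for k in range(m - 1)}


def extend(b_dual, taps):
    """Append b_m .. b_{2m-2}, each the XOR of the listed earlier b values."""
    b = list(b_dual)
    for index in sorted(taps):
        bit = 0
        for i in taps[index]:
            bit ^= b[i]
        b.append(bit)
    return b


def h(b_ext, a, k):
    """Eqn. (4): c_k = b_k a_0 + b_{k+1} a_1 + ... + b_{k+m-1} a_{m-1} over GF(2)."""
    bit = 0
    for j, a_j in enumerate(a):
        bit ^= b_ext[k + j] & a_j
    return bit


if __name__ == "__main__":
    taps = extension_taps([1, 1, 0, 0])   # p(x) = x^4 + x + 1
    print(taps)                           # b_4 from (b_0, b_1), b_5 from (b_1, b_2), ...
    b_ext = extend([1, 0, 1, 1], taps)    # sample dual-basis coefficients b_0..b_3
    a = [0, 1, 1, 0]                      # sample polynomial-basis coefficients a_0..a_3
    print([h(b_ext, a, k) for k in range(4)])
```

Because the taps are derived from the coefficient list, the same helpers can be tried with another defining polynomial by changing only the argument of `extension_taps`.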






































The m^2 cells of Fig. 1 and m cells of Fig. 2 are then combined to form
the full bit-parallel dual basis multiplier for GF(2^4) as shown in Fig. 3.
If $b = \sum_{i=0}^{m-1} b_i \mu_i$ is the dual basis representation of b
and $a = \sum_{i=0}^{m-1} a_i \alpha^i$ is the polynomial basis
representation of a, then the product bits c_i (i = 0, 1, 2, 3) become
available on the output lines. In the architecture, b_4, b_5, b_6 are
generated by the block diagram of Fig. 2
[$b_{m+k} = \sum_{j=0}^{m-1} p_j b_{j+k}$, where k = 0, 1, ..., m - 2].
The partial sum in the matrix multiplication in eqn. (1) is generated by
the block diagram of Fig. 1.

$$
\begin{bmatrix}
b_0 & b_1 & b_2 & b_3 \\
b_1 & b_2 & b_3 & b_4 \\
b_2 & b_3 & b_4 & b_5 \\
b_3 & b_4 & b_5 & b_6
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
=
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}
$$

Fig. 1: Generation of partial products of eqn. 1 (cell signals: b_in, a_in, cc_in, a_out, cc_out).
Fig. 2: Generation of the sum of p_j b_(j+k) (inputs p_0, b_i, p_1, b_(i+1), p_2, b_(i+2), p_3, b_(i+3); output b_(i+4)).
Fig. 3: Arrangement of systolic cells for bit-parallel multiplier for GF(2^4).

In the BP systolic dual basis multiplier design of [14], there exist two
datapaths, one horizontal and the other vertical. The vertical datapath
generates the partial sum in the matrix multiplication of eqn. (1). The
horizontal datapath generates the partial sum of eqn. (3). This design has
a bottleneck that hinders pipelining: the horizontal datapath consists of
an AND-XOR binary tree whose depth is O(m). We modify the horizontal
datapath by replacing the binary tree of depth O(m) with a binary tree of
depth O(log2 m). For this purpose, we introduce a new cell (Fig. 2) to
generate eqn. (3). The complete circuit for the dual basis systolic
multiplier over GF(2^4) is shown in Fig. 3. Latches are introduced in
Fig. 3 to make this architecture suitable for pipelining. There is an
m-clock-cycle delay between 'a' and 'b' entering the multiplier and 'c'
becoming available on the output lines. After the initial delay, results
can be produced continuously, one per clock cycle.
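The effect of replacing a depth-O(m) structure with a depth-O(log2 m) tree can be sketched in Python. This is a software analogy under stated assumptions, not a model of the authors' cell: `xor_chain` and `xor_tree` are hypothetical names, and "depth" counts XOR levels on the longest path.

```python
from math import ceil, log2


def xor_chain(bits):
    """Accumulate the XOR one operand at a time; return (value, depth in XOR levels)."""
    value, depth = bits[0], 0
    for bit in bits[1:]:
        value ^= bit
        depth += 1
    return value, depth


def xor_tree(bits):
    """Reduce pairwise, level by level; return (value, depth in XOR levels)."""
    level, depth = list(bits), 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # odd element passes through to the next level
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth


if __name__ == "__main__":
    p = [1, 1, 0, 0]              # p_0..p_3 of the assumed example x^4 + x + 1
    b = [1, 0, 1, 1]              # sample b_k..b_{k+3}
    terms = [pj & bj for pj, bj in zip(p, b)]   # AND stage of eqn. (3)
    print(xor_chain(terms), xor_tree(terms))
    print(ceil(log2(len(terms))))               # tree depth for m = 4 operands
```

Both reductions return the same bit; only the number of XOR levels on the critical path differs, which is what matters for the stage delay.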
 
b) Hardware and Delay Analysis 
We compare our proposed architecture with the bit-parallel architecture
described in [16]. The architecture presented in [16] requires m^2 cells,
each consisting of two 2-input AND gates and two 2-input XOR gates, for a
total of 2m^2 AND gates and 2m^2 XOR gates. Our proposed design uses two
types of cell. The first cell consists of one AND gate and one XOR gate;
the second cell consists of m AND gates and (m - 1) XOR gates. For an
m-bit multiplier, the proposed architecture consists of m^2 first cells
and m second cells, so that 2m^2 AND gates and (2m^2 - m) XOR gates are
required in total. The overall saving in hardware is m XOR gates.
 
Table 1: Comparison between two bit-parallel systolic multipliers

Properties                  Reference [19]       Presented here
Number of cells             m^2                  cell 1: m^2 & cell 2: m
No. of 2-input AND gates    2m^2                 2m^2
No. of 2-input XOR gates    2m^2                 2m^2 - m
Longest delay path          (2m-1)[DA + DX]      m DA + (log2 m + m - 1) DX
 
Let DA be the delay through a two-input AND gate and DX be the delay
through a two-input XOR gate. The longest delay path of the proposed
multiplier is given in eqn. (6):

$$
\text{Longest delay} = D_A (m + 1 - 1) + D_X (\log_2 m + m - 1)
                     = m D_A + (\log_2 m + m - 1) D_X
\tag{6}
$$

The BP multiplier of [16] has a longest delay path of (2m - 1)[DA + DX],
whereas the proposed multiplier has a longest delay path of
m DA + (log2 m + m - 1) DX. Hence, the proposed dual basis BP multiplier
is hardware efficient and faster.
 
Table 2: Hardware requirements and delays of the dual basis bit-parallel
multiplier (DPM) presented in [16] and the proposed DPM

         DPM [16]                       DPM [proposed]
 m    AND   XOR   Delay             AND   XOR   Delay
 2      8     8    3[DA + DX]         8     6    2DA + 2DX
 3     18    18    5[DA + DX]        18    15    3DA + 3.58DX
 4     32    32    7[DA + DX]        32    28    4DA + 5DX
 5     50    50    9[DA + DX]        50    45    5DA + 6.32DX
 6     72    72   11[DA + DX]        72    66    6DA + 7.58DX
 7     98    98   13[DA + DX]        98    91    7DA + 8.81DX
 8    128   128   15[DA + DX]       128   120    8DA + 10DX
 9    162   162   17[DA + DX]       162   153    9DA + 11.17DX
10    200   200   19[DA + DX]       200   190   10DA + 12.32DX
 
From Table 1 we can conclude that the number of AND gates in this
architecture is the same as in the previous architecture [19], but an
m-bit dual basis systolic multiplier requires m fewer XOR gates. The
longest path delay is also reduced: by (m - 1) AND-gate delays and by
(m - log2 m) XOR-gate delays.
In Table 2, the hardware complexity and delays of the DPM of [19] and of
our proposed DPM architecture are given for GF(2^m) (m = 2, 3, ..., 10).
From Table 2, it can be seen that in every case the hardware complexity
and delays of our proposed DPM architecture are lower than those of the
DPM architecture [19].
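The gate counts and delay expressions quoted above can be tabulated with a few lines of Python. The sketch only evaluates the formulas stated in this section (m^2 cells with two AND and two XOR gates each for the earlier design; m^2 first cells and m second cells for the proposed one); the function names and dictionary keys are hypothetical.

```python
from math import log2


def reference_dpm(m: int) -> dict:
    """Earlier design: m^2 cells, each with two AND and two XOR gates.
    Delay is (2m-1) AND delays plus (2m-1) XOR delays."""
    return {"and": 2 * m * m, "xor": 2 * m * m,
            "delay_and": 2 * m - 1, "delay_xor": 2 * m - 1}


def proposed_dpm(m: int) -> dict:
    """Proposed design: m^2 first cells (1 AND, 1 XOR) and
    m second cells (m AND, m-1 XOR). Delay follows eqn. (6)."""
    return {"and": m * m + m * m,
            "xor": m * m + m * (m - 1),
            "delay_and": m,
            "delay_xor": log2(m) + m - 1}


if __name__ == "__main__":
    for m in range(2, 11):
        ref, new = reference_dpm(m), proposed_dpm(m)
        print(m, ref["and"], ref["xor"], new["and"], new["xor"],
              f'{new["delay_and"]}DA + {new["delay_xor"]:.2f}DX')
```

Evaluating the functions for m = 2 to 10 gives the entries listed in Table 2, which is a quick way to check further values of m.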
 
IV. ERROR DETECTION USING PARITY CHECKING
We use an error-detection scheme with a very high probability of
detecting faults in the bit-parallel systolic multiplier over GF(2^m)
using the dual basis, with some additional outputs, called the check bits,
as shown in Fig. 4. We assume that no interconnections or buses are faulty
and that each test phase with the test circuits is separately
controllable. First, we attach parity bits b_p and a_p to the input
elements and multiply (AND) them:

$$b_p = b_0 \oplus b_1 \oplus b_2 \oplus b_3, \qquad a_p = a_0 \oplus a_1 \oplus a_2 \oplus a_3$$

$$
\begin{aligned}
b_p \cdot a_p &= (b_0 \oplus b_1 \oplus b_2 \oplus b_3) \cdot (a_0 \oplus a_1 \oplus a_2 \oplus a_3) \\
&= (b_0a_0 \oplus b_0a_1 \oplus b_0a_2 \oplus b_0a_3) \oplus (b_1a_0 \oplus b_1a_1 \oplus b_1a_2 \oplus b_1a_3) \\
&\quad \oplus (b_2a_0 \oplus b_2a_1 \oplus b_2a_2 \oplus b_2a_3) \oplus (b_3a_0 \oplus b_3a_1 \oplus b_3a_2 \oplus b_3a_3)
\end{aligned}
$$

Now, from eqn. (2) of the previous architecture, we get

$$
\begin{aligned}
c_0 &= b_0a_0 \oplus b_1a_1 \oplus b_2a_2 \oplus b_3a_3 \\
c_1 &= b_1a_0 \oplus b_2a_1 \oplus b_3a_2 \oplus b_4a_3 \\
c_2 &= b_2a_0 \oplus b_3a_1 \oplus b_4a_2 \oplus b_5a_3 \\
c_3 &= b_3a_0 \oplus b_4a_1 \oplus b_5a_2 \oplus b_6a_3
\end{aligned}
$$

We denote the modulo-2 sum of these multiplier outputs by
r = c_0 ⊕ c_1 ⊕ c_2 ⊕ c_3.
Here, we add some extra lines and gates for testing purposes, which
constitute the feedback lines y_i. Lines b_0, b_1, b_2, b_3 and some XOR
and AND gates are used to make the circuit suitable for testing. The
feedback lines are denoted by (y_1, y_2, y_3, y_4, y_5, y_6). When they
are added modulo 2 to b_p.a_p, some of the terms are eliminated, which
forms the parity check on the output line.
The y_i lines are given as:

$$
\begin{aligned}
y_1 &= b_0a_1 \oplus b_0a_2 \oplus b_0a_3, & y_2 &= b_1a_2 \oplus b_1a_3, & y_3 &= b_2a_3, \\
y_4 &= b_4a_1 \oplus b_5a_2 \oplus b_6a_3, & y_5 &= b_4a_2 \oplus b_5a_3, & y_6 &= b_4a_3
\end{aligned}
$$

The q line is derived from the modulo-2 addition of b_p.a_p and the y_i
lines:

$$
\begin{aligned}
q &= b_p \cdot a_p \oplus y_1 \oplus y_2 \oplus y_3 \oplus y_4 \oplus y_5 \oplus y_6 \\
&= b_0a_0 \oplus b_0a_1 \oplus b_0a_2 \oplus b_0a_3 \oplus b_1a_0 \oplus b_1a_1 \oplus b_1a_2 \oplus b_1a_3 \\
&\quad \oplus b_2a_0 \oplus b_2a_1 \oplus b_2a_2 \oplus b_2a_3 \oplus b_3a_0 \oplus b_3a_1 \oplus b_3a_2 \oplus b_3a_3 \\
&\quad \oplus b_0a_1 \oplus b_0a_2 \oplus b_0a_3 \oplus b_1a_2 \oplus b_1a_3 \oplus b_2a_3 \\
&\quad \oplus b_4a_1 \oplus b_5a_2 \oplus b_6a_3 \oplus b_4a_2 \oplus b_5a_3 \oplus b_4a_3 \\
&= b_0a_0 \oplus b_1a_0 \oplus b_1a_1 \oplus b_2a_0 \oplus b_2a_1 \oplus b_2a_2 \oplus b_3a_0 \oplus b_3a_1 \oplus b_3a_2 \oplus b_3a_3 \\
&\quad \oplus b_4a_1 \oplus b_4a_2 \oplus b_4a_3 \oplus b_5a_2 \oplus b_5a_3 \oplus b_6a_3
\end{aligned}
$$

Now, rearranging, we see that q and r are the same:

$$
\begin{aligned}
q &= (b_0a_0 \oplus b_1a_1 \oplus b_2a_2 \oplus b_3a_3) \oplus (b_1a_0 \oplus b_2a_1 \oplus b_3a_2 \oplus b_4a_3) \\
&\quad \oplus (b_2a_0 \oplus b_3a_1 \oplus b_4a_2 \oplus b_5a_3) \oplus (b_3a_0 \oplus b_4a_1 \oplus b_5a_2 \oplus b_6a_3)
\end{aligned}
$$
 
A parity checking circuit for the bit-parallel systolic multiplier over
GF(2^4) using the dual basis is presented in Fig. 4. If the circuit
operation is correct, then q and r agree and p = r ⊕ q = 0. If a cell in
the circuit is faulty, the fault changes the output lines and is reflected
in the r line; as q remains unaltered, p = 1 and the fault is detected. A
failure in a y_i line is likewise detected by p = 1. Some of the y_i terms
cancel in the parity-checking operation because they appear an even number
of times in the coefficients of the output. The scheme can be improved
further, since the y_i terms are sums of the results of some of the
individual cells: if it is possible to temporarily disconnect those cells
and connect them with some lines to produce the desired feedback lines,
then the extra gates will not be required for the check line q. The
circuit complexity will then be reduced and less time will be required.
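The parity relation can be checked exhaustively in software for the GF(2^4) example. The Python sketch below is a behavioural model only, written under the assumption p(x) = x^4 + x + 1 (so b_4 = b_0 ⊕ b_1, b_5 = b_1 ⊕ b_2, b_6 = b_2 ⊕ b_3); the function names and the `fault_mask` argument, which flips chosen output bits to imitate a fault, are hypothetical.

```python
from itertools import product


def extend_b(b):
    """b_0..b_6 for p(x) = x^4 + x + 1."""
    b0, b1, b2, b3 = b
    return [b0, b1, b2, b3, b0 ^ b1, b1 ^ b2, b2 ^ b3]


def multiplier_outputs(a, b):
    """c_k = b_k a_0 ^ b_{k+1} a_1 ^ b_{k+2} a_2 ^ b_{k+3} a_3 for k = 0..3."""
    be = extend_b(b)
    return [(be[k] & a[0]) ^ (be[k + 1] & a[1]) ^ (be[k + 2] & a[2]) ^ (be[k + 3] & a[3])
            for k in range(4)]


def predicted_parity(a, b):
    """q = b_p.a_p ^ y_1 ^ ... ^ y_6, using the y_i definitions from the text."""
    be = extend_b(b)
    bp = b[0] ^ b[1] ^ b[2] ^ b[3]
    ap = a[0] ^ a[1] ^ a[2] ^ a[3]
    y1 = (be[0] & a[1]) ^ (be[0] & a[2]) ^ (be[0] & a[3])
    y2 = (be[1] & a[2]) ^ (be[1] & a[3])
    y3 = be[2] & a[3]
    y4 = (be[4] & a[1]) ^ (be[5] & a[2]) ^ (be[6] & a[3])
    y5 = (be[4] & a[2]) ^ (be[5] & a[3])
    y6 = be[4] & a[3]
    return (bp & ap) ^ y1 ^ y2 ^ y3 ^ y4 ^ y5 ^ y6


def check(a, b, fault_mask=(0, 0, 0, 0)):
    """Return p = r ^ q; fault_mask flips selected output bits c_k."""
    c = [ck ^ fk for ck, fk in zip(multiplier_outputs(a, b), fault_mask)]
    r = c[0] ^ c[1] ^ c[2] ^ c[3]
    return r ^ predicted_parity(a, b)


if __name__ == "__main__":
    inputs = list(product((0, 1), repeat=4))
    print(all(check(a, b) == 0 for a in inputs for b in inputs))             # fault-free
    print(all(check(a, b, (0, 1, 0, 0)) == 1 for a in inputs for b in inputs))  # c_1 flipped
```

In this model a fault that flips an odd number of output bits raises p, while an even number of flips leaves the output parity unchanged.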
 
DELAY: As the architecture is pipelined, the path delay of each stage is
the same, except for the last stage, which has the maximum path delay. For
an m-bit architecture this is
Td = 2m TXOR + TAND.
In our example in Fig. 1, the path delay is Td = 8 TXOR + TAND.
a) Simulation Result 
We have modeled our proposed architecture in VHDL. The design was
simulated in "ModelSim XE III 6.3c", and the functionality of the
multiplier was checked for different values of m. Physical synthesis and
place and route were done using Magma Design Automation EDA tools, based
on Austria Microsystems 0.35 micron technology. The post-CTS,
post-detailed-route layout of the design for
















Fig. 5: Layout of bit-parallel dual basis systolic multiplier for GF(2^5)
with error checking circuit
 
V. CONCLUSIONS 
The paper presented a fast, error-tolerant dual-basis bit-parallel
systolic multiplier architecture over GF(2^m), which can be pipelined and
which requires less hardware than the multiplier architecture proposed
earlier. The proposed multiplier can operate over both the dual basis and
the polynomial basis, and it has a shorter longest delay path than the
architecture presented earlier. A simple and efficient error detection
procedure using parity checking has been






REFERENCES
1. S. Kumar, T. Wollinger, and C. Paar, "Optimum Digit Serial GF(2^m) Multipliers for Curve-based Cryptography", TC, vol. 55(10), pp. 1306-1311, 2006.
2. S. T. J. Fenn, M. Benaissa, and D. Taylor, "Bit-Serial Dual Basis Systolic Multipliers for GF(2^m)", ISCAS 1995, vol. 3, pp. 2000-2003.
3. C. K. Koc and B. Sunar, "Mastrovito Multiplier for all Trinomial", IEEE Transactions on Computers, vol. 48, no. 5, pp. 522-527, May 1999.
4. M. K. Hasan and V. K. Bhargava, "Division and bit-serial multiplication over GF(q^m)", IEE Proc. E, vol. 139(3), pp. 230-236, May 1992.
5. R. Furness, M. Benaissa, and S. T. J. Fenn, "Generalized Triangular Basis Multipliers for the Design of Reed-Solomon Codes", IEEE Workshop on Signal Processing Systems, 1997, pp. 202-211.
6. E. D. Mastrovito, "VLSI Architectures for Computation in Galois Fields", PhD thesis, Linkoping Univ., Sweden, 1991.
7. C. L. Wang and J. L. Lin, "Systolic Array Implementation of Multipliers for GF(2^m)", IEEE TCAS, vol. 38(7), pp. 796-800, 1991.
8. I. S. Reed and X. Chen, Error-Control Coding for Data Networks, Kluwer Academic, 1999.
9. T. A. Gulliver, M. Serra, and V. K. Bhargava, "The Generation of Primitive Polynomials in GF(2^m) with Independent Roots and Their Application for Power Residue Codes, VLSI Testing and Finite Field Multipliers Using Normal Bases", Int'l J. Electronics, vol. 71, no. 4, pp. 559-576, 1991.
10. R. E. Blahut, Fast Algorithms for Digital Signal Processing, Addison Wesley, 1985.
11. E. R. Berlekamp, "Bit-serial Reed-Solomon encoders", IEEE Trans. Inf. Theory, vol. 28(6), pp. 869-874, 1982.
12. C. S. Yeh, I. S. Reed, and T. K. Truong, "Systolic multipliers for finite fields GF(2^m)", IEEE TC, vol. 33(4), pp. 357-360, 1984.
13. S. T. J. Fenn, M. Benaissa, and D. Taylor, "Dual basis systolic multipliers for GF(2^m)", IEE Comput. Digital Tech., vol. 144, no. 1, January 1997.
14. S. T. J. Fenn, M. Benaissa, and D. Taylor, "GF(2^m) multiplication and division over the dual basis", IEEE TC, vol. 45(3), pp. 319-327, 1996.
15. S. T. J. Fenn, M. Benaissa, and D. Taylor, "Division in GF(2^m)", Electron. Lett., 1993, 28, pp. 2259-2261.
16. C. L. Wang and J. L. Lin, "Systolic array implementation of multiplier for finite fields GF(2^m)", IEEE TCAS, vol. 38(7), pp. 796-800, 1991.
17. S. T. J. Fenn, M. Benaissa, and D. Taylor, "GF(2^m) multiplication and division over the dual basis", IEEE TC, vol. 45(3), pp. 319-327, 1996.
18. I. S. Hsu, T. K. Truong, L. J. Deutsch, and I. S. Reed, "A comparison of VLSI architectures of finite field multipliers using dual, normal or standard bases", IEEE TC, vol. 37(6), pp. 735-737, 1988.
19. C. H. Kim, C. P. Hong, and S. Kwon, "A Digit-Serial Multiplier for Finite Field GF(2^m)", IEEE TVLSI, vol. 13(4), pp. 467-483, Apr. 2005.
20. K. W. Kim, K. J. Lee, and K. Y. Yoo, "A new digit-serial systolic multiplier for finite fields GF(2^m)", ICII 2001, Beijing, vol. 5, pp. 128-133, Nov. 2001.
21. H. Rahaman, J. Mathew, and D. K. Pradhan, "C-testable bit Parallel
























Fig. 4: A parity checking circuit for the bit-parallel systolic multiplication over GF(2^4) using the dual basis.
