Efficient Testable Bit Parallel Multipliers over GF(2^m) with Constant Test Set
 
 
J. Mathew, H. Rahaman and D. K. Pradhan 
Department of Computer Science, University of Bristol, Bristol BS8 1UB, UK 
Email: {hafizur, jimson, pradhan}@cs.bris.ac.uk
 
Abstract – We present a C-testable method for detecting stuck-at (s-a)
faults in polynomial basis (PB) bit parallel multiplier circuits over
GF(2^m). Only 7 tests are needed to achieve 100% fault coverage, independent
of the multiplier size. These 7 tests can be derived directly, without any
ATPG tool. For comparison, ATPG-based test patterns are generated with the
Synopsys® tool.
 
I. INTRODUCTION 
Finite (Galois) field arithmetic over GF(2^m) is widely used in public-key
cryptography, error-correcting codes, VLSI testing and digital signal
processing. The two basic arithmetic operations over GF(2^m) are addition and
multiplication. While addition over GF(2^m) can be implemented with m 2-input
EXOR gates, multiplication is much more complex. Exponentiation, division and
inversion over GF(2^m) can be performed by repeated multiplications. Various
techniques exist for optimizing the design of multipliers over GF(2^m) with
respect to complexity, delay and power. Most of them focus on VLSI
implementation and synthesis, because VLSI implementations of these
multipliers are complicated by complex routing, non-modular architecture and
low testability. Testability is an increasing concern in VLSI design. Here, a
C-testable design of PB multipliers over GF(2^m) is proposed. The design
requires only 7 vectors to detect stuck-at faults in the multiplier circuits.
We have observed that this figure is lower than the number of vectors
generated by industrial ATPG tools such as the Synopsys® tools.
 
II. PRELIMINARIES 
Let GF(N) denote a set of N elements, where N is a power of a prime number,
with two special elements 0 and 1 representing the additive and
multiplicative identities respectively, and two operators, addition ‘+’ and
multiplication ‘.’. GF(N) is a finite field if it forms a commutative ring
with identity over these two operators in which every nonzero element has a
multiplicative inverse. Finite fields GF(2^m) can be generated with
irreducible polynomials of the form

    p(x) = x^m + \sum_{i=0}^{m-1} c_i x^i,  where c_i ∈ GF(2)

[1]. It is conventional to represent the elements of GF(2^m) as powers of the
primitive element α, where α is a root of p(x), i.e. p(α) = 0. The set
{1, α, …, α^{m-1}} is referred to as the polynomial basis. Each element
A ∈ GF(2^m) can be expressed with respect to the PB as a polynomial of degree
at most m-1 over GF(2), i.e.

    A(x) = \sum_{i=0}^{m-1} a_i x^i,  where a_i ∈ GF(2).

Given A, B ∈ GF(2^m), the PB multiplication over GF(2^m) is defined as
C(x) = A(x).B(x) mod p(x). Details can be found in [1, 7]. Several designs of
PB multipliers over GF(2^m) have appeared in [2-6]. An algorithm for PB
multiplication, together with its hardware architecture, known as the
Mastrovito algorithm/multiplier, was proposed in [2]. Based on this algorithm,
a formulation of the PB bit-parallel multiplication architecture for special
reduction polynomials, namely trinomials, equally spaced polynomials (ESPs)
and two classes of pentanomials, was presented in [3]. A testable design with
(2m+4) test vectors was reported in [9]. Consider a multiplier with inputs A
and B, where A = [a0, a1, a2, …, a_{m-1}] and B = [b0, b1, b2, …, b_{m-1}].
The a_i and b_i are the coordinates of A and B respectively, where
0 ≤ i ≤ m-1.
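The definition C(x) = A(x).B(x) mod p(x) can be illustrated with a minimal
software sketch. It is not part of the proposed hardware; the function name
gf2m_mul and the integer encoding (bit i holds the coefficient of x^i) are
assumptions made here for illustration only.

```python
def gf2m_mul(a: int, b: int, p: int, m: int) -> int:
    """Multiply a and b in GF(2^m); bit i of an int is the coefficient of x^i."""
    # Carry-less (GF(2)) polynomial product A(x).B(x), degree at most 2m-2.
    prod = 0
    for i in range(m):
        if (b >> i) & 1:
            prod ^= a << i
    # Reduce modulo p(x), clearing the high-order terms from the top down.
    for deg in range(2 * m - 2, m - 1, -1):
        if (prod >> deg) & 1:
            prod ^= p << (deg - m)
    return prod


P16 = 0b11001  # p(x) = x^4 + x^3 + 1, the polynomial used in the examples

# x . x^3 = x^4, which reduces to x^3 + 1 modulo p(x)
print(bin(gf2m_mul(0b0010, 0b1000, P16, 4)))
```

Addition in the same encoding is a bitwise XOR of the two integers, which
corresponds to the m 2-input EXOR gates mentioned in the introduction.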
 
 
 
 
 
 
 
 
The basic architecture of the PB multiplier over GF(2^m) is shown in Fig. 1.
The structure consists of two parts: the Inner Product (IP) network and the
Q-network. The multiplication outputs are given by c = d + Q^T e, where
(i) the reduction matrix Q, of size (m-1) x m, is obtained as in [2],
(ii) d = L*b, and (iii) e = U*b. L is an m x m lower triangular matrix and U
is an (m-1) x m upper triangular matrix; L and U are derived in [9]. In this
paper we propose a C-testable scheme for detecting s-a faults in multipliers
over GF(2^m). It requires only 7 tests, at the cost of at most 2m^2/3
additional 2-input EXOR gates and two control inputs. A circuit is called
C-testable if it can be tested with a constant number of tests, independent
of its size. The test set is derived readily without the aid of any ATPG
tool.
Example 1: A multiplier structure over GF(16), constructed with the
irreducible polynomial P(x) = x^4 + x^3 + 1, is shown in Fig. 2.
[Fig. 2 labels: inputs a0 to a3 and b0 to b3, IP-network outputs e0 to e2,
multiplier outputs c0 to c3.]
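The relation c = d + Q^T e can be checked numerically for the GF(16) example.
The sketch below assumes that d holds the m low-order coefficients and e the
m-1 high-order coefficients of the unreduced product A(x).B(x), and that row j
of Q holds the coefficients of x^(m+j) mod p(x); the function names are
hypothetical and the matrices L and U of [9] are not constructed explicitly.

```python
def reduction_matrix(p: int, m: int) -> list[list[int]]:
    """Row j holds the coefficients of x^(m+j) mod p(x), for j = 0 .. m-2."""
    rows = []
    r = p ^ (1 << m)          # x^m mod p(x): p(x) without its leading term
    for _ in range(m - 1):
        rows.append([(r >> i) & 1 for i in range(m)])
        r <<= 1               # multiply by x ...
        if (r >> m) & 1:      # ... and reduce if a degree-m term appears
            r ^= p
    return rows


def multiply_via_q(a: int, b: int, p: int, m: int) -> int:
    """Compute c = d + Q^T e over GF(2) from the unreduced product."""
    prod = 0
    for i in range(m):
        if (b >> i) & 1:
            prod ^= a << i
    d = [(prod >> i) & 1 for i in range(m)]            # d_0 .. d_{m-1}
    e = [(prod >> (m + j)) & 1 for j in range(m - 1)]  # e_0 .. e_{m-2}
    q = reduction_matrix(p, m)
    c = 0
    for i in range(m):
        bit = d[i]
        for j in range(m - 1):
            bit ^= q[j][i] & e[j]
        c |= bit << i
    return c


Q16 = reduction_matrix(0b11001, 4)  # P(x) = x^4 + x^3 + 1
```

Under these assumptions the columns of Q16 give c0 = d0 + e0 + e1 + e2,
c1 = d1 + e1 + e2, c2 = d2 + e2 and c3 = d3 + e0 + e1 + e2.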
 
 
III. PROPOSED TECHNIQUE 
The multiplier is a multi-output PPRM (Positive Polarity Reed-Muller)
circuit. The testable design technique for single-output PPRM circuits
presented in [8] is modified in this paper. The EXOR part of the IP network
is augmented with some additional EXOR gates and 2 control lines k1 and k2,
as shown in Fig. 3.
The EXOR-tree part of a single-output AND-EXOR circuit can be tested for all
single s-a faults by four (2m+2)-bit constant tests applied to the inputs of
the circuit. The scheme applies each of the 4 combinations {00, 01, 10, 11}
to the inputs of each 2-input EXOR gate in the tree. In Fig. 3a the inputs of
the last EXOR gate require {00, 11, 01, 10} to generate the output sequence
s: 0011. Thus its two inputs should receive the sequences q: 0110 and
r: 0101. Similarly, q: 0110 and s: 0011 arriving at the two inputs of an EXOR
gate generate the output sequence r: 0101, and r: 0101 and s: 0011 generate
q: 0110. The vectors (q, r, s) therefore satisfy the relations q ⊕ r = s,
q ⊕ s = r, r ⊕ s = q. Hence, by applying any two of the three sequences
(q, r, s) to the inputs of an EXOR gate, the 4 input combinations
{00, 01, 10, 11} are applied to that gate and its output carries the third
sequence. In the EXOR tree of Fig. 3a we assign the sequences q, r, s, q, r,
s, q, r, ... to the inputs of the tree from left to right, repeating the
pattern (q, r, s) until all inputs are assigned. The outputs of the first
level are propagated down to the root (the final output of the tree). Thus
each EXOR gate in the tree receives the desired 4 combinations
{00, 01, 10, 11} at its inputs. The 4 constant tests to be applied to the
inputs of the tree are shown as a matrix Ttree. This matrix has four
(constant) rows and y columns, where y is the number of leaf nodes in the
tree, which in our case equals the number of AND outputs (m^2) in the
multiplier circuit. Read from left to right, the columns of the matrix
correspond to the sequences q, r, s, q, r, s, q, r, and so on. The number of
distinct columns in the matrix is only three (constant), regardless of the
size of the tree.
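The construction of Ttree and the claim that every EXOR gate sees all four
input combinations can be sketched in a few lines. The sketch assumes a
balanced binary EXOR tree whose number of leaves is a power of two, as in the
8-leaf tree of Fig. 3a; the function names are hypothetical, and other tree
shapes may need a different assignment.

```python
Q, R, S = (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1)  # q: 0110, r: 0101, s: 0011


def xor_seq(u, v):
    """Bitwise EXOR of two 4-step sequences."""
    return tuple(x ^ y for x, y in zip(u, v))


def ttree(y):
    """Four constant test rows for y leaves labelled q, r, s, q, r, s, ..."""
    cols = [(Q, R, S)[i % 3] for i in range(y)]
    return [[col[t] for col in cols] for t in range(4)]


def check_balanced_tree(leaves):
    """Return True if every EXOR gate sees all of 00, 01, 10, 11.

    Assumes len(leaves) is a power of two (balanced binary tree).
    """
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for u, v in zip(level[0::2], level[1::2]):
            if set(zip(u, v)) != {(0, 0), (0, 1), (1, 0), (1, 1)}:
                return False
            nxt.append(xor_seq(u, v))
        level = nxt
    return True


for row in ttree(8):
    print(row)
print(check_balanced_tree([Q, R, S, Q, R, S, Q, R]))
```

For 8 leaves the rows printed by ttree(8) coincide with the matrix Ttree
given for leaves A to H.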
Fig. 1: Basic architecture of the Galois multiplier over GF(2^m). [Blocks:
Inner Product Generation (IP) and Q-network; inputs A and B, m bits each;
signals d and e between the blocks; outputs c0 … c_{m-1}.]
Fig. 2: Architecture of the BP multiplier over GF(2^4) with
P(x) = x^4 + x^3 + 1
13th IEEE International On-Line Testing Symposium (IOLTS 2007)
0-7695-2918-6/07 $25.00  © 2007

Since the EXOR tree is embedded in the overall design, the inputs of the tree
are not directly accessible. Each AND output feeds an EXOR input. Hence, by
applying the test {a0 a1 … a_{m-1} b0 b1 … b_{m-1}} = t1: {0,0,…,0,0,0,…,0}
(t2: {1,1,…,1,1,1,…,1}) to the primary inputs of the AND gates, an all-zero
(all-one) vector is produced at the outputs of the AND part. Therefore, if
the test sequence t1, t2, t2, t1 is applied, all the EXOR gates at the first
level of the tree receive the sequence q: 0110. To generate the other two
sequences r: 0101 and s: 0011 we use a control level with a few additional
EXOR gates and 2 control inputs k1 and k2 (Fig. 3b). The original functions
are obtained by setting these control inputs to 0. In this design the AND
outputs are partitioned into 3 groups according to the sequences q, r, s. The
output lines of the 1st group, which receive the sequence q: 0110, pass
directly through the control level to the inputs of the EXOR tree. The AND
outputs of the other two groups (r and s) each pass through an additional
EXOR gate driven by the control input k1 and k2 respectively. By controlling
the value of k1 (k2), the desired sequence r (s) is obtained from q. The
sequences r and s are derived from the relations q ⊕ k1 = r, q ⊕ k2 = s.
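The control sequences follow directly from the two relations q ⊕ k1 = r and
q ⊕ k2 = s. The short sketch below solves them for k1 and k2 and models the
control level as a pass-through for the 1st group and an EXOR with k1 or k2
for the other two groups; the function control_level and the group numbering
0, 1, 2 are illustrative assumptions.

```python
q = [0, 1, 1, 0]
r = [0, 1, 0, 1]
s = [0, 0, 1, 1]

# Solving q XOR k1 = r and q XOR k2 = s for the control sequences.
k1 = [a ^ b for a, b in zip(q, r)]
k2 = [a ^ b for a, b in zip(q, s)]


def control_level(and_out: int, group: int, k1_bit: int, k2_bit: int) -> int:
    """Group 0 passes through; group 1 is XORed with k1, group 2 with k2."""
    if group == 1:
        return and_out ^ k1_bit
    if group == 2:
        return and_out ^ k2_bit
    return and_out


print(k1, k2)
```

The resulting values k1 = 0011 and k2 = 0101 are the k1 and k2 columns of the
test set T1, and with k1 = k2 = 0 every group passes its AND output
unchanged.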
 
 
 
[Fig. 3 labels: (a) EXOR tree with leaf inputs A to H carrying the sequences
q, r, s, q, r, s, q, r; (b) AND plane, control level with inputs k1 and k2
acting on the 1st (a), 2nd (b) and 3rd (c) groups, and EXOR tree.]
 
The PB multiplier is a multi-output PPRM circuit: The above technique applies
to a single-output AND-EXOR circuit. The concept is extended here to the
multi-output multiplier circuit. Testability is achieved by proper
augmentation of the IP-network and Q-network in the following steps.
1. First, assign e_{m-2} = q, e_{m-3} = r, e_{m-4} = s, e_{m-5} = q,
e_{m-6} = r, e_{m-7} = s, e_{m-8} = q, … It is assumed that the outputs e_j
(0 ≤ j ≤ m-2) of the IP-network (from the left) generate the sequences q, r,
s, q, r, s, q, …
2. Then map each d_i (0 ≤ i ≤ m-1) to a sequence such that no two inputs of
any EXOR gate in the Q-network receive the same sequence.
3. Assign sequences to the inputs of the EXOR gates in the IP-network so that
no two inputs of any EXOR gate in the IP-network receive the same sequence.
[Fig. 4 labels: inputs a0 to a3 and b0 to b3, control inputs k1 and k2,
IP-network outputs d0 to d3 and e0 to e2, multiplier outputs c0 to c3.]
 
We assume that the IP-network generates, from the left, the sequences q, r,
s, q, r, s, q, and so on at the e_j (0 ≤ j ≤ m-2). To propagate the e_j's to
the Q-network outputs, the d_i's (0 ≤ i ≤ m-1) are mapped to suitable
sequences from q, r, s. After all the root nodes d_i and e_j of the
IP-network have been assigned, the inputs of each EXOR gate in the IP-network
are driven from the AND outputs with sequences chosen so that no two inputs
of any EXOR gate receive the same sequence. This concept is illustrated in
Example 3.
Example 3: The testable design of the multiplier over GF(2^4) with
P(x) = x^4 + x^3 + 1 of Fig. 2 is shown in Fig. 4. Here, the IP-network has
three e outputs and four d outputs. The Boolean expressions for the
multiplication outputs c_i (0 ≤ i ≤ 3), derived from d, e and the Q matrix,
are c0 = d0 + e0 + e1 + e2; c1 = d1 + e1 + e2; c2 = d2 + e2;
c3 = d3 + e0 + e1 + e2. First, we assign the e_j (0 ≤ j ≤ 2) as e2 = q,
e1 = r, e0 = s. Using step 2, we derive: d0 = q or r, d1 = q or r,
d2 = r or s, d3 = q or r. After the e_j (0 ≤ j ≤ 2) and d_i (0 ≤ i ≤ 3) have
been assigned, every EXOR gate in the IP-network is mapped. If the test
sequence t1, t2, t2, t1 is applied, all the EXOR gates at the first level of
the EXOR tree of the IP-network receive the sequence q: 0110. To generate the
other two sequences r: 0101 and s: 0011, we use a control level with a few
additional EXOR gates and two control inputs k1 and k2. The control level and
the input mapping of the EXOR gates in the IP-network are shown in Fig. 4.
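The admissible sequences for d0 to d3 in Example 3 can be reproduced by a
small search. The sketch assumes one particular gate structure for the
Q-network, with a shared gate e1 + e2 and the pairing (d + e0) + (e1 + e2)
for c0 and c3. This structure is a hypothetical choice made here and is not
taken from Fig. 4; a different structure may admit different assignments.

```python
SEQ = {"q": (0, 1, 1, 0), "r": (0, 1, 0, 1), "s": (0, 0, 1, 1)}
ALL_PAIRS = {(0, 0), (0, 1), (1, 0), (1, 1)}


def gate(u, v):
    """Output of an EXOR gate, or None if it does not see all 4 input pairs."""
    if u is None or v is None or set(zip(u, v)) != ALL_PAIRS:
        return None
    return tuple(x ^ y for x, y in zip(u, v))


# Step 1 of the procedure for m = 4: e2 = q, e1 = r, e0 = s.
e2, e1, e0 = SEQ["q"], SEQ["r"], SEQ["s"]
g = gate(e1, e2)  # assumed shared gate computing e1 + e2

# Assumed Q-network gate structure (hypothetical, not read from Fig. 4).
structure = {
    "d0": lambda d: gate(gate(d, e0), g),  # c0 = (d0 + e0) + (e1 + e2)
    "d1": lambda d: gate(d, g),            # c1 = d1 + (e1 + e2)
    "d2": lambda d: gate(d, e2),           # c2 = d2 + e2
    "d3": lambda d: gate(gate(d, e0), g),  # c3 = (d3 + e0) + (e1 + e2)
}

# Step 2: keep the sequences for which every gate on the path is fully tested.
allowed = {
    name: [label for label, seq in SEQ.items() if out(seq) is not None]
    for name, out in structure.items()
}
print(allowed)
```

With this assumed structure the search agrees with the choices stated in the
example: d0, d1 and d3 may take q or r, and d2 may take r or s.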
Constant test set for detecting s-a faults: If we set k1 = k2 = 0, the output
functions are the same as the original functions.
Testing of the EXOR parts of the IP and Q networks: All single s-a faults in
these parts can be tested by applying just the four vectors of T1 at the
inputs of Fig. 4 and observing the circuit responses at the functional
outputs.
Lemma 1: Any single s-a fault in the EXOR part of the network is tested by
T1.
Proof: Follows from Ttree and the above discussion.
 
 
 
 
AND part, primary inputs, control inputs: The multiplier network is a
multi-output PPRM network. To test an s-a-1 fault at an input line of an AND
gate, we set that line to 0 and the other input of the gate to 1. To test an
s-a-0 fault, we set all the inputs to 1. Hence Lemma 2 follows.
Lemma 2: All single s-a faults in the AND part and at the primary inputs are
testable at the c_i outputs by the test set T2 of length 3.
Proof: The AND part is tested for single s-a-1 faults by the first two
vectors of T2. With the first vector all the AND gates receive the (0, 1)
combination at their inputs, and with the second vector all the AND gates
receive the (1, 0) combination. Any single s-a-1 fault at any input or output
of any AND gate propagates to the functional outputs. The third vector, i.e.
the all-1 vector, detects any single s-a-0 fault at the input or output of
any AND gate of the IP-network. The complete test set T2 thus detects all
single s-a faults at the primary inputs and at the inputs and outputs of the
AND gates of the IP-network.
Lemma 3: Any single s-a fault at the k1 and k2 inputs is testable at the c_i
outputs of the multiplier circuit by T1.
Proof: The first vector of T1 detects any s-a-1 fault at the control inputs
and the last vector of T1 detects any single s-a-0 fault at the control
inputs.
Theorem 1: Any single s-a fault in the proposed multiplier network is
testable by the constant test set T of length 7, where T = T1 ∪ T2.
Proof: Follows from Lemmas 1, 2 and 3.
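For a given m, the seven vectors of T = T1 ∪ T2 can be written down
mechanically from the matrices T1 and T2. The sketch below assumes the row
layout (k1, k2, a0 … a_{m-1}, b0 … b_{m-1}); the function names are
hypothetical. It only builds the vectors and the resulting AND-gate values
and does not perform fault simulation.

```python
def constant_test_set(m: int):
    """Return (T1, T2) as lists of rows (k1, k2, a_0..a_{m-1}, b_0..b_{m-1})."""
    zeros, ones = [0] * m, [1] * m
    t1 = [
        [0, 0] + zeros + zeros,
        [0, 1] + ones + ones,
        [1, 0] + ones + ones,
        [1, 1] + zeros + zeros,
    ]
    t2 = [
        [0, 0] + zeros + ones,   # every AND gate sees (0, 1)
        [0, 0] + ones + zeros,   # every AND gate sees (1, 0)
        [0, 0] + ones + ones,    # every AND gate sees (1, 1)
    ]
    return t1, t2


def and_outputs(row, m):
    """Values of all m*m AND gates a_i & b_j for one test row."""
    a, b = row[2:2 + m], row[2 + m:]
    return [ai & bj for ai in a for bj in b]


t1, t2 = constant_test_set(4)
for row in t1 + t2:
    print(row)
```

The number of rows stays at 7 for any m; only the width 2m+2 of each row
grows with the multiplier size.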
 
IV. EXPERIMENTAL RESULTS 
Table 1 gives the number of tests required by different schemes to achieve
100% coverage of single faults. As our test set is constant, the scheme
eliminates the need for test generation programs. The Synopsys® tool is used
to generate the ATPG-based test patterns. Table 1 shows that both ATPG-based
and algorithmic test generation require more test patterns than the proposed
scheme.
 
Table 1. Number of tests required for achieving 100% fault coverage
Bit-width (m)   ATPG based   Algorithmic   Technique   Proposed
                             Test Set      of [20]     C-testable Tech.
4               12           14            12          7
5               14           17            14          7
6               18           20            16          7
7               20           22            18          7
8               23           24            20          7
9               26           27            22          7
16              38           39            36          7
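As a quick consistency check on Table 1, the entries can be compared with the
two closed-form counts mentioned in the text: the constant 7 of the proposed
scheme and the (2m+4) count quoted in Section II for the design of [9]. The
sketch below only re-enters the table values; the dictionary name and layout
are illustrative assumptions.

```python
# m: (ATPG based, algorithmic test set, "Technique of [20]" column, proposed)
TABLE1 = {
    4: (12, 14, 12, 7),
    5: (14, 17, 14, 7),
    6: (18, 20, 16, 7),
    7: (20, 22, 18, 7),
    8: (23, 24, 20, 7),
    9: (26, 27, 22, 7),
    16: (38, 39, 36, 7),
}

for m, (atpg, algorithmic, technique, proposed) in TABLE1.items():
    # Compare the third column with 2m+4 and report the saving over ATPG.
    print(m, technique == 2 * m + 4, atpg - proposed)
```

For every listed m the third column equals 2m+4, while the proposed count
stays at 7, so the saving over the ATPG-based patterns grows from 5 tests at
m = 4 to 31 tests at m = 16.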
   
V. CONCLUSION 
An easily C-testable scheme for bit parallel PB multipliers over GF(2^m) is
proposed. For an m-bit multiplier circuit, a constant test set of length 7,
which can be determined readily without any ATPG tool, is sufficient to
detect all single stuck-at faults. Because the test set is very short, it
reduces test application time and test power.
 
REFERENCES 
1.  R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their
Applications. Cambridge Univ. Press, 1994.
2.  E.D. Mastrovito, “VLSI Architectures for Computation in Galois
Fields,” PhD thesis, Linkoping Univ., Linkoping, Sweden, 1991.
3.  A. Reyhani-Masoleh and M. A. Hasan, “Low Complexity Bit Parallel
Architectures for Polynomial Basis Multiplication over GF(2^m),”
IEEE Trans. Computers, vol. 53, no. 8, pp. 945-959, 2004.
4.  E.D. Mastrovito, “VLSI Architectures for Computation in Galois
Fields,” PhD thesis, Linkoping Univ., Linkoping, Sweden, 1991.
5.  B. Sunar and C.K. Koc, “Mastrovito Multiplier for All Trinomials,”
IEEE Trans. Computers, vol. 48, no. 5, pp. 522-527, May 1999.
6.  A. Halbutogullari and C.K. Koc, “Mastrovito Multiplier for General
Irreducible Polynomials,” IEEE Trans. Computers, vol. 49, no. 5,
pp. 503-518, May 2000.
7.  D. K. Pradhan, “A Theory of Galois Switching Functions,” IEEE
Trans. Computers, vol. 27, no. 3, pp. 239-248, Mar. 1978.
8.  H. Rahaman, D. K. Das, and B. B. Bhattacharya, “Testable design
of GRM network with EXOR-tree for detecting stuck-at and
bridging faults,” ASPDAC 2004, pp. 224-229.
9.  H. Rahaman, J. Mathew, A. M. Jabir and D. K. Pradhan, “Easily
Testable Implementation for Bit Parallel Multipliers in GF(2^m),”
HLDVT 2006, pp. 48-54.
T1 =
    k1  k2   a0  a1 … a_{m-1}   b0  b1 … b_{m-1}
     0   0    0   0 … 0          0   0 … 0
     0   1    1   1 … 1          1   1 … 1
     1   0    1   1 … 1          1   1 … 1
     1   1    0   0 … 0          0   0 … 0

T2 =
    k1  k2   a0  a1 … a_{m-1}   b0  b1 … b_{m-1}
     0   0    0   0 … 0          1   1 … 1
     0   0    1   1 … 1          0   0 … 0
     0   0    1   1 … 1          1   1 … 1

Fig. 3. (a) Responses in an EXOR-tree   (b) EXOR-tree with a control level

Ttree =
     A   B   C   D   E   F   G   H
     0   0   0   0   0   0   0   0
     1   1   0   1   1   0   1   1
     1   0   1   1   0   1   1   0
     0   1   1   0   1   1   0   1

Fig. 4: Testable Design of GF(16) Multiplier