Abstract-We present a new low-complexity bit-parallel canonical basis multiplier for the field $GF(2^m)$ generated by an all-one-polynomial. The proposed canonical basis multiplier requires $m^2 - 1$ XOR gates and $m^2$ AND gates. We also extend this canonical basis multiplier to obtain a new bit-parallel normal basis multiplier.
INTRODUCTION
The arithmetic operations in the Galois field $GF(2^m)$ have several applications in coding theory, computer algebra, and cryptography [6], [4]. In these applications, time- and area-efficient algorithms and hardware structures are desired for addition, multiplication, squaring, and exponentiation operations. The performance of these operations is closely related to the representation of the field elements. An important advance in this area has been the introduction of the Massey-Omura algorithm [7], which is based on the normal basis representation of the field elements. One advantage of the normal basis is that the squaring of an element is computed by a cyclic shift of the binary representation. Efficient algorithms for the multiplication operation in the canonical basis have also been proposed [5], [3]. The space and time complexities of these bit-parallel canonical basis multipliers are much less than those of the Massey-Omura multiplier.
In this paper, we present an alternative design for multiplication in the canonical basis for the field $GF(2^m)$ generated by an all-one-polynomial (AOP). The time complexity of our design is significantly less than that of similar bit-parallel multiplier designs for the canonical basis [5], [3], [1]. Furthermore, we use the proposed canonical basis multiplier to design a normal basis multiplier whose space and time complexities are nearly the same as those of the modified Massey-Omura multiplier [2] given for the field $GF(2^m)$ with an AOP. Nevertheless, the proposed normal basis multiplier is based on a different construction from the ones already known, and it has certain advantages.
CANONICAL BASIS MULTIPLIER
It is customary to view the field $GF(2^m)$ as an $m$-dimensional vector space defined over the ground field $GF(2)$. We need a set of $m$ linearly independent elements from $GF(2^m)$ in order to represent the elements of $GF(2^m)$. This set serves as the basis of the vector space. A basis of the form $S = \{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$, where $\alpha \in GF(2^m)$ is a root of the generating polynomial of degree $m$, is called a canonical basis. In order to reduce the complexity of the field multiplication, special classes of irreducible polynomials have been suggested [3], [5]. In particular, the AOP $p(x) = 1 + x + x^2 + \cdots + x^m$ is one such polynomial.
where the step function u(t) is defined as
The product C = AB is found by multiplying the matrix Z by the vector B in the ground field GF(2). The Mastrovito algorithm directly computes this product C = ZB.
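The matrix-vector view of the product can be illustrated with a minimal Python sketch. It does not use the closed-form definition of $Z$ from the text. Instead it assumes only that column $j$ of $Z$ holds the canonical-basis coordinates of $A\alpha^j$, and it reduces powers of $\alpha$ with the AOP identities $\alpha^{m+1} = 1$ and $\alpha^m = 1 + \alpha + \cdots + \alpha^{m-1}$. The function names (`aop_reduce`, `product_matrix`, and so on) are hypothetical, and field elements are stored as bit lists with index equal to exponent.

```python
def aop_reduce(poly, m):
    """Reduce a coefficient list (index = exponent) modulo the AOP of degree m."""
    folded = [0] * (m + 1)
    for k, bit in enumerate(poly):
        folded[k % (m + 1)] ^= bit        # alpha^(m+1) = 1
    top = folded[m]                        # alpha^m = 1 + alpha + ... + alpha^(m-1)
    return [c ^ top for c in folded[:m]]


def mul_reference(a, b, m):
    """Schoolbook polynomial product over GF(2), followed by AOP reduction."""
    prod = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            prod[i + j] ^= a[i] & b[j]
    return aop_reduce(prod, m)


def product_matrix(a, m):
    """Build Z column by column: column j is the coordinate vector of A * alpha^j."""
    cols = [aop_reduce([0] * j + list(a), m) for j in range(m)]
    return [[cols[j][i] for j in range(m)] for i in range(m)]


def mat_vec_gf2(z, b):
    """Matrix-vector product over GF(2): AND for products, XOR for sums."""
    out = []
    for row in z:
        acc = 0
        for zij, bj in zip(row, b):
            acc ^= zij & bj
        out.append(acc)
    return out
```

For example, with $m = 4$ the product $\alpha \cdot \alpha^3 = \alpha^4$ reduces to $1 + \alpha + \alpha^2 + \alpha^3$, so `mat_vec_gf2(product_matrix([0, 1, 0, 0], 4), [0, 0, 0, 1])` yields `[1, 1, 1, 1]`.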
We introduce a new canonical basis multiplication algorithm for the field $GF(2^m)$ generated using an AOP by decomposing the matrix $Z$ into the matrices $Z_1$ and $Z_2$ as $Z = Z_1 + Z_2$. The idea of decomposing a matrix has proven to be useful in many similar designs [2]. In order to construct these matrices, we first write the matrix equality (2) for the matrix $Q$ in the field $GF(2^m)$ with an AOP using the identity $x^{m+1} = 1$ as
Using the definition (2) of $Q$ and the definition (1) of $Z$, we construct the product matrix $Z$ for the field $GF(2^m)$ with an AOP as the sum of two matrices $Z_1$ and $Z_2$, which are given as follows:
In order to compute $C = ZB = (Z_1 + Z_2)B$, we first compute $D = Z_1 B$ and $E = Z_2 B$ in parallel and, then, compute the result $C = D + E$.
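The following Python sketch illustrates the split computation $C = D + E$. The entries of $Z_1$ and $Z_2$ used here are not copied from the paper's matrices; they are an assumption, reconstructed from the AOP identities $\alpha^{m+1} = 1$ and $\alpha^m = 1 + \alpha + \cdots + \alpha^{m-1}$. Under that assumption, a term $a_i b_j \alpha^{i+j}$ lands on $\alpha^k$ directly when $i + j = k$ or $i + j = k + m + 1$ (collected in $Z_1$), and lands on every coordinate when $i + j = m$ (collected in $Z_2$). The function names are hypothetical.

```python
def z1_entry(a, k, j, m):
    """Assumed entry (k, j) of Z1: coefficient of b_j contributing only to alpha^k."""
    if j <= k:
        return a[k - j]               # i + j = k
    if j >= k + 2:
        return a[k - j + m + 1]       # i + j = k + m + 1, reduced by alpha^(m+1) = 1
    return 0                          # j = k + 1: no valid index i, so the entry is zero


def z2_row(a, m):
    """Assumed common row of Z2: terms with i + j = m, which spread to all coordinates."""
    return [0] + [a[m - j] for j in range(1, m)]


def mul_decomposed(a, b, m):
    """Compute C = Z1*B + Z2*B = D + E over GF(2)."""
    d = [0] * m
    for k in range(m):
        for j in range(m):
            d[k] ^= z1_entry(a, k, j, m) & b[j]
    e = 0
    for zj, bj in zip(z2_row(a, m), b):
        e ^= zj & bj                  # a single bit, shared by every coordinate of E
    return [dk ^ e for dk in d]


def mul_schoolbook(a, b, m):
    """Independent reference: fold exponents modulo m + 1, then remove alpha^m."""
    folded = [0] * (m + 1)
    for i in range(m):
        for j in range(m):
            folded[(i + j) % (m + 1)] ^= a[i] & b[j]
    return [c ^ folded[m] for c in folded[:m]]
```

With this reconstruction, every row of $Z_1$ except the last contains exactly one zero (at column $k + 1$), and all rows of $Z_2$ are identical, which agrees with the properties the text relies on. Comparing `mul_decomposed` against `mul_schoolbook` over all inputs for a small $m$ is a quick consistency check.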
The product of the last row of $Z_1$ and $B$ is computed using the rightmost U circuit with two additional gates, which take care of the nonzero element of the last row of $Z_1$. The architecture of the canonical basis multiplier is shown in Fig. 1. The module which computes the vector $D = Z_1 B$ consists of $m$ identical U circuits, an AND gate, and an XOR gate. The circuit U computes the inner product of two vectors of length $m - 1$. Since one element in each row of $Z_1$ is zero, except in the last row, the inner product operation needs to be of length $m - 1$. The vector $A$ is shifted according to the place of the zero element in each row of $Z_1$, while the vector $B$ is fed to the $i$th U module by skipping the $i$th bit. The connection diagram of the part of the multiplier computing $D$ is shown in Fig. 2. The basic rewiring modules used in the connection diagram are defined in Fig. 3.
Fig. 3. The rewiring modules used in the connection diagram.
The structure of the module U is very simple. The inner product of two vectors is computed by, first, generating the products in parallel using AND gates and, then, by adding the partial products using a binary XOR tree. In order to generate the products, $m - 1$ AND gates are needed, whereas $m - 2$ XOR gates are used to accumulate the products. The depth of the binary XOR tree is given as $\lceil \log_2(m-1) \rceil$. The total delay of the circuit U is equal to $T_A + \lceil \log_2(m-1) \rceil T_X$, where $T_A$ and $T_X$ are the delays of AND and XOR gates, respectively. The computation of $d_{m-1}$ requires an additional XOR gate delay. Thus, the computation of $D$ requires a total of $T_A + (1 + \lceil \log_2(m-1) \rceil) T_X$ delays.
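As a rough software model of the U module, the sketch below forms the bitwise products and then reduces them level by level in a binary XOR tree, counting gates and tree depth along the way. The function name `u_module` and its return tuple are assumptions made for illustration; the model only mirrors the gate-level description above and says nothing about the actual wiring in Fig. 1.

```python
def u_module(x, y):
    """Inner product over GF(2) of two equal-length, non-empty bit vectors.

    Returns (result_bit, and_gates, xor_gates, xor_tree_depth).
    """
    products = [xi & yi for xi, yi in zip(x, y)]   # one AND gate per position
    and_gates = len(products)
    xor_gates = 0
    depth = 0
    level = products
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])    # one XOR gate per pair
            xor_gates += 1
        if len(level) % 2:
            nxt.append(level[-1])                  # odd element passes through
        level = nxt
        depth += 1
    return level[0], and_gates, xor_gates, depth
```

For inputs of length $m - 1 = 3$ (that is, $m = 4$), the model uses 3 AND gates, 2 XOR gates, and an XOR tree of depth $\lceil \log_2 3 \rceil = 2$, in line with the counts stated in the text.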
In order to compute $E = Z_2 B$, we need a single U module with inputs according to the definition of $Z_2$ given above. Since $Z_2$ has identical rows, the computation of $Z_2 B$ is accomplished by computing the inner product of a row of $Z_2$ and the vector $B$ and, then, replicating this resulting bit $m$ times, i.e., $E = [e, e, \ldots, e]$, where $e$ is repeated $m$ times. After $E = Z_2 B$ is computed, the result $C = Z_1 B + Z_2 B = D + E$ is obtained using $m$ XOR gates, as shown in the bottom part of Fig. 1.
The proposed canonical basis multiplier architecture requires a total of $m^2 - 1$ XOR gates and $m^2$ AND gates.
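The gate totals can be cross-checked by tallying the components described above: $m$ U modules plus one AND and one XOR gate for $D$, one U module for $E$, and $m$ XOR gates for the final addition. The sketch below is such a tally. The function name `gate_counts` is hypothetical, and the XOR-level count it returns is derived here from the stated delay of $D$ plus one level for the final addition $D + E$, rather than quoted from the text.

```python
from math import ceil, log2


def gate_counts(m):
    """Tally AND gates, XOR gates, and XOR levels on the critical path."""
    u_and, u_xor = m - 1, m - 2            # one U module of input length m - 1
    and_gates = m * u_and + 1 + u_and      # D: m U modules + 1 AND; E: 1 U module
    xor_gates = m * u_xor + 1 + u_xor + m  # D: m U modules + 1 XOR; E: 1 U; sum: m XORs
    xor_levels = ceil(log2(m - 1)) + 1 + 1 # U tree, extra XOR for d_(m-1), final D + E
    return and_gates, xor_gates, xor_levels
```

For $m = 4$ the tally gives 16 AND gates and 15 XOR gates, matching $m^2$ and $m^2 - 1$.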
NORMAL BASIS MULTIPLIER
A basis of the form 
For further information, the reader is referred to [6, p. 99]. Since the set (3) is also a basis, it can be used to represent the elements of $GF(2^m)$.
In order to perform a normal basis multiplication, we take the inputs $A$ and $B$ represented in the normal basis, convert them to the shifted canonical basis using the permutation $P$, and then perform a canonical basis multiplication. At the end of this computation, we obtain $F = AB/\beta^2$ represented in the canonical basis as
Note that the values $f_i$ are the outputs of the canonical basis multiplier shown in Fig. 1, and, therefore, we have $f_i = d_i + e$ for $i = 0, 1, \ldots, m - 1$. We then multiply $F$ by $\beta^2$, and obtain $G = F\beta^2$ as
We now need to represent this number in the shifted canonical basis. Since
is added to the coefficients of all the other terms. We can write the final expression as
Thus, we have obtained the representation of the number $G$ in the shifted canonical basis. We now apply the inverse of the permutation $P$ to $G$ and obtain the bits of the number $C$ in the normal basis. The architecture of the normal basis multiplier is given in Fig. 4. It is very similar to that of the canonical basis multiplier. The permutation and inverse permutation operations are implemented by wiring. Therefore, the normal basis multiplier requires exactly the same number of AND and XOR gates as the canonical basis multiplier in Fig. 1. Furthermore, the time complexity of the normal basis multiplier is equal to that of the canonical basis multiplier.
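The overall flow (permute, multiply in the canonical basis, multiply by $\beta^2$, permute back) can be sketched in Python as follows. Several details are assumptions, since the corresponding equations are not shown here: the normal basis is taken as $\{\beta, \beta^2, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\}$ with $\beta$ a root of the AOP, the shifted canonical basis as $\{\beta, \beta^2, \ldots, \beta^m\}$, and the permutation $P$ as the map sending normal-basis position $i$ to the exponent $2^i \bmod (m+1)$. The step from $F$ to $G$ uses $\beta^{m+1} = 1 = \beta + \beta^2 + \cdots + \beta^m$. All function names are hypothetical, and the canonical multiplier is a plain reference routine rather than the gate-level design.

```python
def mul_canonical(a, b, m):
    """Reference canonical-basis product in the AOP field (index = exponent)."""
    folded = [0] * (m + 1)
    for i in range(m):
        for j in range(m):
            folded[(i + j) % (m + 1)] ^= a[i] & b[j]   # beta^(m+1) = 1
    return [c ^ folded[m] for c in folded[:m]]         # remove beta^m


def normal_exponents(m):
    """Assumed permutation P: normal-basis position i maps to exponent 2^i mod (m+1)."""
    exps = [pow(2, i, m + 1) for i in range(m)]
    if sorted(exps) != list(range(1, m + 1)):
        raise ValueError("2 must generate all nonzero residues modulo m + 1")
    return exps


def mul_normal(a, b, m):
    """Multiply two elements given by their normal-basis coordinates."""
    exps = normal_exponents(m)
    a_s, b_s = [0] * m, [0] * m
    for i, e in enumerate(exps):          # apply P; slot t holds the coefficient of beta^(t+1)
        a_s[e - 1] = a[i]
        b_s[e - 1] = b[i]
    f = mul_canonical(a_s, b_s, m)        # F = AB / beta^2 in the canonical basis
    # G = F * beta^2: f[m-1] multiplies beta^(m+1) = beta + ... + beta^m,
    # so it is added to every other coefficient of the shifted basis.
    g = [f[m - 1]] + [f[t - 1] ^ f[m - 1] for t in range(1, m)]
    return [g[e - 1] for e in exps]       # apply the inverse of P
```

Two properties make convenient sanity checks for this sketch: squaring should act as a cyclic shift of the normal-basis coordinates, and the all-ones vector should behave as the multiplicative identity.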
CONCLUSIONS
The time complexity of the proposed canonical basis multiplier is significantly less than that of previously proposed similar multipliers for the field $GF(2^m)$ generated by an AOP. [5], [8]. The XOR and AND complexities of the Mastrovito multiplier for a general trinomial or an AOP are not known. However, the number of XOR gates for a general trinomial is conjectured to be $\geq m^2 - 1$ in [8]. The normal basis multiplier proposed here and the modified Massey-Omura multiplier [2] require the same number of XOR and AND gates, which is about half of the number of gates required by the Massey-Omura multiplier for the field $GF(2^m)$ with an AOP. The design proposed in this paper requires only one more XOR delay than the modified Massey-Omura multiplier. Nevertheless, it is an alternative design, and is based on an entirely different construction. Another advantage is that it is highly modular. Since the proposed normal basis multiplier is based on a canonical basis multiplier, any advances made in canonical basis multiplication using AOPs can be utilized in this design to further reduce the complexity or timing requirements.
