Abstract The Massey-Omura multiplier is a well-known sequential multiplier over finite fields $GF(2^m)$ that performs a multiplication in m clock cycles using the normal basis. In this article, we propose a new architecture for a sequential multiplier using the normal basis. The time complexity of each cycle of the new multiplier is $O(1)$, and the number of inputs is reduced to m+1 instead of the 2m needed by most previous multipliers. These merits make VLSI implementation easier.
I. INTRODUCTION
The design of finite field arithmetic units is mainly applied to cryptography and error-correcting codes [1].
These units include adders, multipliers, power units, exponentiation units, and multiplicative inverters. Among these, the multiplier is critical in terms of time and space complexity. All the other, more complicated units listed above are functionally equivalent to modules that consist of multipliers and power units only. According to our recent research on quadratic residue (QR) codes [2], the dimension of the finite field $GF(2^m)$ is quite high, i.e., the value of m is quite large. Space-limited applications with large m, such as embedded communication systems, therefore need a minimal core. As a result, a fast architecture with low complexity is crucial in the design of multipliers and power units.
The space and time complexity of the above units depends heavily on how the field elements are represented. An element of $GF(2^m)$ is usually represented with respect to one of three popular bases: the polynomial basis, the dual basis, and the normal basis. In the normal basis, squaring an element of $GF(2^m)$ is achieved by a cyclic shift, or by re-wiring without any extra gate; this solves the problem of implementing the power units. Hardware designs of normal basis multipliers can be classified into two types: parallel [3] [4] [5] and sequential [6] [7] [8]. This article aims specifically at a fast sequential multiplier architecture with a minimal core and delay for normal bases.
Yaotsu Chang, Department of Applied Mathematics, I-Shou University, Kaohsiung, Taiwan, R.O.C., ytchang -isu.edu.tw
A sequential normal basis multiplier was invented by Massey and Omura [6]. Later, Wang et al. [3] developed a pipelined version of the Massey-Omura multiplier. The original Massey-Omura multiplier needs m clock cycles for a complete multiplication. As m increases, the time delay also increases; more precisely, the cycle time is proportional to $\log_2 m$. This paper proposes a new construction of a sequential multiplier with the same cycle count. The time complexity of the new architecture is $O(1)$ in each cycle. In addition, more inputs to the core function require more metal layers in a VLSI implementation of the multiplier, and it is well known that more metal layers cause many difficulties in the place-and-route step. The new design reduces the number of inputs, which simplifies the physical hardware design.
II. PRELIMINARIES

A. Normal Basis Representation
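For any positive integer m, a normal basis exists for the finite field $GF(2^m)$ over its base field $GF(2)$ [1]; i.e., there exists an element $\beta \in GF(2^m)$ such that $\{\beta, \beta^2, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\}$ forms a basis over $GF(2)$. This basis is called a normal basis of $GF(2^m)$ over $GF(2)$. Using this basis, any element $A \in GF(2^m)$ can be expressed as
$$A = \sum_{i=0}^{m-1} a_i \beta^{2^i} = a_0\beta + a_1\beta^2 + \cdots + a_{m-1}\beta^{2^{m-1}} \qquad (1)$$
where $a_i \in GF(2)$. Since $\beta^{2^m} = \beta$, squaring A gives
$$A^2 = a_{m-1}\beta + a_0\beta^2 + \cdots + a_{m-2}\beta^{2^{m-1}} \qquad (2)$$
In vector form, one has $A^2 = [a_{m-1}, a_0, \ldots, a_{m-2}]$, which is just a simple right cyclic shift of $A = [a_0, a_1, \ldots, a_{m-1}]$. That is, $A^2$ can be obtained by a right cyclic shift of the coefficients of A. According to (1) and (2), we have the following
$$A^{2^l} = \sum_{i=0}^{m-1} a_{\langle i-l \rangle}\,\beta^{2^i} \qquad (3)$$
where $\langle x \rangle$ denotes the remainder of the integer x when divided by m. Therefore, $A^{2^l}$ is obtained by l right cyclic shifts of A.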
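The shift property can be illustrated with a minimal Python sketch. It assumes an element is stored as a list of its normal basis coefficients $[a_0, \ldots, a_{m-1}]$; the function names `square` and `power_2l` are hypothetical and chosen only for this illustration.

```python
def square(a):
    """Square in normal basis coordinates: one right cyclic shift."""
    return a[-1:] + a[:-1]


def power_2l(a, l):
    """Raise to the power 2^l: l right cyclic shifts, with l reduced modulo m."""
    s = l % len(a)
    return a[-s:] + a[:-s] if s else a[:]


# Example with m = 4: the coefficient a_0 moves to position 1 after one squaring.
a = [1, 0, 0, 0]
print(square(a))       # coordinates of A^2
print(power_2l(a, 4))  # m squarings return the original element
```

No gates are involved: in hardware the same operation is a re-wiring of the register outputs.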
B. Normal Basis Multiplication
In 1982, Massey and Omura invented a new computational method for finite field multiplication in the normal basis. The multiplication can be performed by using the same logic function to compute each component of the product of two elements of $GF(2^m)$ [6]. Let B, C, and D be three elements of $GF(2^m)$ in the normal basis representation, where D is the product of the other two, i.e., $D = B \times C$. From [4], the last component $d_{m-1}$ of D is a logic function of the components of B and C, i.e.
$$d_{m-1} = f(b_0, b_1, \ldots, b_{m-1};\ c_0, c_1, \ldots, c_{m-1}) \qquad (4)$$
Since squaring is a right cyclic shift of an element in the normal basis representation, one has $D^2 = B^2 \times C^2$, and then
$$d_{m-1-k} = f(B^{(k)};\ C^{(k)})$$
where $0 \le k \le m-1$, and $B^{(k)}$ and $C^{(k)}$ are the k-fold right cyclic shifts of B and C [5]. The Massey-Omura multiplier is a sequential multiplier. Its hardware block diagram is shown in Fig. 1.
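The Massey-Omura scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field $GF(2^4)$ with modulus $x^4 + x + 1$, the brute-force search for a normal element, and all function names are choices made for this sketch and are not taken from [6]. The logic function f is built directly from the products of basis elements.

```python
from itertools import product

M, POLY = 4, 0b10011  # assumed example field GF(2^4), modulus x^4 + x + 1


def gf_mul(a, b):
    """Multiply two field elements held as polynomial-basis integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def value(coords, conj):
    """Map normal basis coordinates to the polynomial-basis integer."""
    v = 0
    for bit, g in zip(coords, conj):
        if bit:
            v ^= g
    return v


def find_normal_basis():
    """Search for beta whose conjugates beta^(2^i) are linearly independent."""
    for beta in range(1, 1 << M):
        conj = [beta]
        for _ in range(M - 1):
            conj.append(gf_mul(conj[-1], conj[-1]))
        table = {value(c, conj): c for c in product((0, 1), repeat=M)}
        if len(table) == 1 << M:  # all 2^M combinations are distinct
            return conj, table
    raise ValueError("no normal element found")


CONJ, COORDS = find_normal_basis()
# LAST[i][j]: coefficient of beta^(2^(M-1)) in beta^(2^i) * beta^(2^j)
LAST = [[COORDS[gf_mul(CONJ[i], CONJ[j])][M - 1] for j in range(M)]
        for i in range(M)]


def f(b, c):
    """Fixed logic function giving the last component d_{m-1} of B x C."""
    acc = 0
    for i in range(M):
        for j in range(M):
            acc ^= LAST[i][j] & b[i] & c[j]
    return acc


def rshift(a):
    """Right cyclic shift, i.e. squaring in the normal basis."""
    return a[-1:] + a[:-1]


def massey_omura(b, c):
    """Produce one component per cycle: d_{m-1-k} = f(B^(k), C^(k))."""
    d = [0] * M
    for k in range(M):
        d[M - 1 - k] = f(b, c)
        b, c = rshift(b), rshift(c)
    return d
```

Each call to `massey_omura` evaluates the same function f m times on shifted operands, which mirrors the m clock cycles of the sequential hardware. The result can be checked against `gf_mul` by converting coordinates with `value`.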
III. NEW SEQUENTIAL MULTIPLIER
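A. Formulation
From [9], the matrix M of the multiplier on the normal basis can be used as a reference for the detailed explanation. Here, following [9], the elements of the matrix M are defined as follows:
$$\beta^{2^i}\beta^{2^j} = \sum_{k=0}^{m-1} \lambda_{i,j,k}\,\beta^{2^k} \qquad (9)$$
where $0 \le i, j \le m-1$, and $\lambda_{i,j,k} \in GF(2)$. The following property of $\lambda_{i,j,k}$ is applied.
Lemma 1. For $0 \le i, j, k \le m-1$, $\lambda_{i,j,k} = \lambda_{i-l,\,j-l,\,k-l} = \lambda_{i+l,\,j+l,\,k+l}$, with all indices taken modulo m.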
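Proof. We can square both sides of (9) l times so that
$$\beta^{2^{i+l}}\beta^{2^{j+l}} = \sum_{k=0}^{m-1} \lambda_{i,j,k}\,\beta^{2^{k+l}}$$
Comparing the coefficients with those of (9) written for the indices i+l and j+l gives the lemma.
With $D = 0$ initially, the product $D = B \times C$ is computed by the following algorithm:
Step 1: For l = 0 to m-1 {
Step 2: $D = D^2 + F_0(B, C)$
Step 3: $B = B^2$, $C = C^2$ }
In this algorithm, we use a fixed function $F_0(B, C)$:
$$F_0(B, C) = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} \lambda_{0,i,j}\, b_{i-1}\, c_{m-1}\,\beta^{2^j}$$
where the index i-1 is taken modulo m; that is, $F_0(B, C) = c_{m-1}\,\beta\,B^2$.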
As a result, the sequential multiplier can be implemented in hardware with the same logic function used over m cycles. Also, $c_{m-1}$ in $F_0(B, C)$ is a single fixed signal that does not depend on i or j. The architecture is discussed in the next part.
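The three steps of the algorithm can be sketched in Python as follows. The sketch assumes that the core function has the form $F_0(B, C) = c_{m-1}\,\beta\,B^2$, which is the form under which the recurrence $D = D^2 + F_0(B, C)$ returns $B \times C$ after m iterations; the example field $GF(2^4)$ with modulus $x^4 + x + 1$, the helper routines, and all names are assumptions made for this illustration only.

```python
from itertools import product

M, POLY = 4, 0b10011  # assumed example field GF(2^4), modulus x^4 + x + 1


def gf_mul(a, b):
    """Multiply two field elements held as polynomial-basis integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def value(coords, conj):
    """Map normal basis coordinates to the polynomial-basis integer."""
    v = 0
    for bit, g in zip(coords, conj):
        if bit:
            v ^= g
    return v


def find_normal_basis():
    """Search for beta whose conjugates beta^(2^i) are linearly independent."""
    for beta in range(1, 1 << M):
        conj = [beta]
        for _ in range(M - 1):
            conj.append(gf_mul(conj[-1], conj[-1]))
        table = {value(c, conj): c for c in product((0, 1), repeat=M)}
        if len(table) == 1 << M:
            return conj, table
    raise ValueError("no normal element found")


CONJ, COORDS = find_normal_basis()
# T[i][j]: coefficient of beta^(2^j) in beta * beta^(2^i)
T = [COORDS[gf_mul(CONJ[0], CONJ[i])] for i in range(M)]


def rshift(a):
    """Right cyclic shift, i.e. squaring in the normal basis."""
    return a[-1:] + a[:-1]


def f0(b, c):
    """Assumed core: coordinates of c_{m-1} * beta * B^2."""
    out = [0] * M
    for i in range(M):
        and_gate = b[i - 1] & c[M - 1]  # one AND gate per i, index i-1 mod m
        for j in range(M):
            out[j] ^= T[i][j] & and_gate
    return out


def sequential_multiply(b, c):
    """Steps 1 to 3: D = D^2 + F0(B, C), then B = B^2 and C = C^2."""
    d = [0] * M
    for _ in range(M):
        d = [x ^ y for x, y in zip(rshift(d), f0(b, c))]
        b, c = rshift(b), rshift(c)
    return d
```

Only the single component `c[M - 1]` of C enters `f0`, together with the m components of B, which matches the m+1 inputs claimed for the core. The result can be compared with `gf_mul` after converting coordinates with `value`.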
B. Architecture
The structure of the proposed algorithm can be realized as shown in Fig. 2. The algorithm iterates for m cycles. The core is a fixed function F, a combinational logic circuit that implements $F_0(B, C)$. The inputs to F come from B and C, two shift registers that contain the components $b_k$ and $c_k$, respectively. The output of F is stored in the registers $d_k$, with all $d_k = 0$ initially. In each iteration of the multiplication operation, $c_{m-1}$ is the only component of C used by F. Let $T_A$ and $T_X$ denote the delay of an AND gate and an XOR gate, respectively. In this proposal, the numbers of AND and XOR gates are easily obtained as m and $C_N$, respectively. The delay of the critical path, which depends on $D^2 + F_0(B, C)$, is $T_A + (1 + \lceil \log_2 c \rceil)\,T_X$. For optimal normal bases, c = 2, and the delay of the critical path is $T_A + 2T_X$. The comparison between the proposed multiplier and others [7, 8] on the normal basis is presented in Table 1 and Table 2.
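The delay expression can be evaluated with a short Python sketch. The function names are hypothetical, and the parameter c is simply taken as the positive integer that appears in the formula $T_A + (1 + \lceil \log_2 c \rceil)\,T_X$.

```python
def ceil_log2(n):
    """Exact ceil(log2(n)) for a positive integer n."""
    return (n - 1).bit_length()


def critical_path(c):
    """Return (AND levels, XOR levels) of T_A + (1 + ceil(log2 c)) T_X."""
    return 1, 1 + ceil_log2(c)


def gate_counts(m, c_n):
    """Gate counts of the core as stated in the text: m AND gates, C_N XOR gates."""
    return {"AND": m, "XOR": c_n}


and_levels, xor_levels = critical_path(2)  # optimal normal basis case, c = 2
print(f"delay = {and_levels} T_A + {xor_levels} T_X")
```

For c = 2 this reproduces the delay $T_A + 2T_X$ quoted for optimal normal bases, and the value does not grow with m.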
In Table 2 and Table 3, the design of Agnew et al. and the proposed multiplier have the same gate complexity and critical path delay; however, the place-and-route factor of VLSI design should also be analyzed. The proposed multiplier uses only one wire out of C, whereas the design of Agnew et al. uses two or more. This is an advantage for designing low-resistance wiring in the metal layers of a VLSI chip. Since wire delay is taken into account in current nanometer-scale VLSI design, the proposed multiplier can be faster because it uses only a single $c_k$.
[Tables: comparison of normal basis multipliers. Rows: Massey-Omura [6]; Agnew et al. [7]; A. Reyhani-Masoleh et al. 1 [8]; A. Reyhani-Masoleh et al. 2 [8]; Proposed. Column headings: Multiplier; Delay (general).]
