Abstract. In many cryptographic schemes, the most time-consuming basic arithmetic operation is finite field multiplication, and its hardware implementation may require millions of logic gates. Developing such large finite field multipliers so that they always yield error-free outputs is a complex and costly task. To this end, this paper considers fault-tolerant multiplication in finite fields. It deals with the detection of errors in bit-parallel and bit-serial polynomial basis multipliers over finite fields of characteristic two. Our approach is to partition the multiplier structure into a number of smaller computational units and use the parity prediction technique to detect errors.
Introduction
Among the basic arithmetic operations over finite fields $GF(2^m)$, multiplication is the one that has received the most attention in the literature [7, 4, 11, 9]. This is mainly because a multiplier is much more complex to implement than a finite field adder, and because repeated multiplication can be used to perform other difficult field operations, such as inversion and exponentiation, which are extensively used in cryptographic systems [1, 10].
Finite field multiplication is quite different from its counterparts in integer and floating point number systems. For today's cryptographic applications, the field size can be very large and each input of the multiplier can be 160 to 2048 bits long. Such a multiplier may require millions of logic gates, and it is a challenging task to implement it free of faults. If a multiplier is capable of detecting errors on-line in the presence of certain faults, cryptographic schemes can be operated more reliably. The importance of eliminating errors in cryptographic computations has been pointed out in some recent articles, for example [2, 5]. The presence of faults in cryptosystems can lead to an active attack, and the simplest way to prevent such an attack is to ensure that the computational device verifies the values it computes before sending them out.
In an attempt to detect errors in finite field multipliers, the authors of [3] have considered bit-serial multipliers in $GF(2^m)$ and have presented error detection schemes for four types of multipliers using a parity prediction technique. Their polynomial basis scheme for error detection is applicable only to a special class of fields, namely those defined by irreducible all-one polynomials, which are available for certain values of m only. Additionally, when an all-one polynomial is irreducible, the corresponding m is not a prime. This leads many designers to avoid such a value of m, and the corresponding irreducible all-one polynomial that defines the underlying field, for certain cryptosystems, such as those based on elliptic curve cryptography.
In this paper, we consider $GF(2^m)$ multipliers of both bit-parallel and bit-serial types. The polynomial basis is used for representing the field elements. We investigate error detection techniques for such multipliers and develop parity prediction based error detection schemes for both bit-serial and bit-parallel multipliers. The new schemes can be used with any irreducible binary polynomial defining the field.
Preliminaries

Multiplication Using Polynomial Basis
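Let
$$F(z) = z^m + \sum_{i=0}^{m-1} f_i z^i \qquad (1)$$
be a monic irreducible polynomial over $GF(2)$ of degree m, where $f_i \in GF(2)$ for $0 \le i \le m-1$. Let α be a root of $F(z)$, i.e., $F(\alpha) = 0$. Then any element A of $GF(2^m)$ can be written as
$$A = \sum_{i=0}^{m-1} a_i \alpha^i, \quad a_i \in GF(2),$$
where the $a_i$'s are the coordinates of A w.r.t. the polynomial basis (PB). For convenience, these coordinates will be denoted in vector notation as
$$a = [a_0, a_1, \cdots, a_{m-1}]^T,$$
where T denotes the transposition of a vector. Let C be the product of any two elements A and B of $GF(2^m)$. Then, C can be represented w.r.t. the PB as follows:
$$C = AB \bmod F(\alpha) = \sum_{i=0}^{m-1} b_i X^{(i)}, \qquad (5)$$
where
$$X^{(i)} = \alpha X^{(i-1)} \bmod F(\alpha), \quad 1 \le i \le m-1, \qquad (6)$$
and
$$X^{(0)} = A.$$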
A bit-parallel architecture for $GF(2^m)$ multiplication using (5) is shown in Figure 1. It mainly consists of three types of modules, namely, sum, pass-thru and α modules. The sum module (denoted as a double circle with a plus inside) simply adds two $GF(2^m)$ elements, and it can be realized in hardware using m two-input XOR gates. The pass-thru module (denoted as a double circle with a dot inside) multiplies a $GF(2^m)$ element by a $GF(2)$ element, i.e., if
$$A = \sum_{i=0}^{m-1} a_i \alpha^i \in GF(2^m) \quad \text{and} \quad b \in GF(2)$$
are the two inputs to a pass-thru module, then its output is
$$bA = \sum_{i=0}^{m-1} (b\, a_i)\, \alpha^i.$$
In hardware, each pass-thru module consists of m two-input AND gates. In Figure 1, the third module (i.e., the rectangular α module) multiplies its input, which is an element of $GF(2^m)$, by α and reduces the result modulo $F(\alpha)$. Thus, this module essentially realizes equation (6) in hardware.
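The behaviour of the three module types can be illustrated with a small software model. The following is a minimal sketch, not the hardware design itself: it assumes field elements are stored as Python integers whose bit i holds the coordinate $a_i$, and that $F(z)$ is given as a bit mask that includes the $z^m$ term. The function names `sum_module`, `pass_thru` and `alpha_module` are hypothetical and chosen only for this illustration.

```python
def sum_module(a: int, b: int) -> int:
    """Add two GF(2^m) elements: coordinate-wise XOR (m XOR gates in hardware)."""
    return a ^ b


def pass_thru(a: int, b: int) -> int:
    """Multiply a GF(2^m) element by a single GF(2) bit b (m AND gates in hardware)."""
    return a if b & 1 else 0


def alpha_module(a: int, f: int, m: int) -> int:
    """Multiply a by alpha and reduce modulo F(alpha).

    Assumption of this sketch: f is the bit mask of F(z), including the z^m term.
    """
    x = a << 1               # multiplication by alpha shifts every coordinate up by one
    if (x >> m) & 1:         # an alpha^m term appeared
        x ^= f               # replace alpha^m using F(alpha) = 0
    return x


M = 4
F = 0b10011                  # F(z) = z^4 + z + 1
print(bin(alpha_module(0b1000, F, M)))   # alpha^3 * alpha = alpha + 1, prints 0b11
```

Here the conditional reduction plays the role of the XOR gates inside the α module, while the shift corresponds to simple rewiring.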
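Since α is a root of $F(z)$, we have $F(\alpha) = 0$ and
$$\alpha^m = \sum_{i=0}^{m-1} f_i \alpha^i. \qquad (7)$$
Then multiplication of an arbitrary element $A \in GF(2^m)$ by α gives
$$\alpha A = \sum_{i=0}^{m-1} a_i \alpha^{i+1} = a_{m-1}\alpha^m + \sum_{i=1}^{m-1} a_{i-1} \alpha^i. \qquad (8)$$
Using (7) and (8), one can write
$$X = \alpha A \bmod F(\alpha) = \sum_{i=0}^{m-1} x_i \alpha^i = a_{m-1} f_0 + \sum_{i=1}^{m-1} \left(a_{i-1} + f_i a_{m-1}\right) \alpha^i, \qquad (9)$$
where the $x_i$'s are in $GF(2)$ and are the coordinates of X w.r.t. the PB. For any irreducible polynomial over $GF(2)$, $f_0 = 1$. Thus from (9), we write the coordinates of X as
$$x_i = \begin{cases} a_{m-1}, & i = 0, \\ a_{i-1} + f_i a_{m-1}, & 1 \le i \le m-1. \end{cases} \qquad (10)$$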
If ω is the Hamming weight of the irreducible polynomial $F(z)$, then the realization of (10) requires ω − 2 XOR gates, and so does an α module. Thus, unlike the sum and pass-thru modules, the α module has a space (or circuit) complexity that depends on $F(z)$. The space complexity is minimum when $F(z)$ is a trinomial and maximum when $F(z)$ is an all-one polynomial (AOP). In vector notation, the coordinates of the $GF(2^m)$ product can be calculated by the well-known formulation [7] as
$$c = M b, \qquad (11)$$
where c and b are the coordinate vectors of C and B, respectively, and $M = [x^{(0)}, x^{(1)}, \cdots, x^{(m-1)}]$ is the $m \times m$ binary matrix with entries $m_{i,j} = x^{(j)}_i$, whose j-th column is the coordinate vector of $X^{(j)}$.
Note that in Figure 1, the α array generates the $m_{i,j}$'s, and the other part of the multiplier, which consists of all pass-thru and sum modules, realizes the matrix-vector multiplication of (11).
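The structure described here, an α array followed by pass-thru and sum modules, can be sketched in software as follows. This is an illustrative model under stated assumptions rather than the circuit of Figure 1: elements are Python integers with bit i holding coordinate i, $F(z)$ is a bit mask including the $z^m$ term, and the names `alpha_mul`, `alpha_array` and `pb_multiply` are hypothetical.

```python
def alpha_mul(a: int, f: int, m: int) -> int:
    """One alpha module: multiply by alpha and reduce modulo F(alpha)."""
    x = a << 1
    if (x >> m) & 1:
        x ^= f
    return x


def alpha_array(a: int, f: int, m: int) -> list:
    """Return [X^(0), ..., X^(m-1)] with X^(0) = A and X^(i) = alpha * X^(i-1)."""
    xs = [a]
    for _ in range(m - 1):
        xs.append(alpha_mul(xs[-1], f, m))
    return xs


def pb_multiply(a: int, b: int, f: int, m: int) -> int:
    """Compute C = sum_i b_i X^(i): the columns X^(i) are gated by b_i and XORed."""
    c = 0
    for i, x in enumerate(alpha_array(a, f, m)):
        if (b >> i) & 1:     # pass-thru module: keep X^(i) only when b_i = 1
            c ^= x           # sum module: GF(2^m) addition
    return c


M, F = 4, 0b10011            # GF(2^4) with F(z) = z^4 + z + 1
print(bin(pb_multiply(0b0011, 0b0011, F, M)))   # (alpha + 1)^2 = alpha^2 + 1, prints 0b101
```

The list returned by `alpha_array` corresponds to the columns generated by the α array, and the loop in `pb_multiply` corresponds to the matrix-vector product formed by the pass-thru and sum modules.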
Error Detection Strategy
In the following sections, we investigate error detection schemes for the $GF(2^m)$ multiplication operation that rely on the architecture shown in Figure 1. To this end, the parity prediction method is used. This method is shown in Figure 2. We assume that the parities of A and B (i.e., $p_A$ and $p_B$, respectively) are available or can be reliably pre-computed while loading the coordinates of A and B into the multiplier. We also assume that the PP and PG blocks can be made fault free, or that any fault in them can be detected using a suitable mechanism, since these blocks are simple and/or regular (for example, PP can be as simple as an XOR gate and PG is a modulo 2 adder). In the following sections, we derive the function $\Gamma_{CUT}$ for each of the modules of the multiplier of Figure 1.

For the purpose of this investigation, we consider a $GF(2^m)$ multiplier circuit with a single fault, which keeps the analysis simple. Although various types of multiple faults in the multiplier can be detected, we first consider the single fault case and then show how multiple faults can be detected. The fault is modeled as a stuck-at fault, which appears to be the most common model used for logical faults. In this model, a fault in a logic gate (i.e., XOR, AND, OR, etc.) results in one of its inputs or its output being fixed to either a logic 0 (stuck-at-0, or s-a-0 for short) or a logic 1 (stuck-at-1, or s-a-1) [6].
Parity Predictions of Individual Module
In the following, we obtain the parity prediction functions of the modules of the bit-parallel multiplier of Figure 1. PB multipliers (both bit-parallel and bit-serial) that are capable of detecting errors are considered in the next section.
Parity Prediction in α Module
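Let ω be the Hamming weight of the irreducible polynomial $F(z)$. Then, (1) can be written as
$$F(z) = z^m + \sum_{j=1}^{\omega-2} z^{\rho_j} + 1, \qquad (12)$$
where the $\rho_j$'s are the powers of z in (1) with
$$1 \le \rho_1 < \rho_2 < \cdots < \rho_{\omega-2} \le m-1,$$
and (10) can be written as
$$x_i = \begin{cases} a_{m-1}, & i = 0, \\ a_{i-1} + a_{m-1}, & i \in \{\rho_1, \rho_2, \cdots, \rho_{\omega-2}\}, \\ a_{i-1}, & \text{otherwise.} \end{cases} \qquad (13)$$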
Using (13), a circuit diagram for the α module is shown in Figure 3 . Note that a stuck-at fault in one of the (ω − 2) XOR gates of this module causes at most one error at the output.
For $j \in \{\rho_1, \rho_2, \cdots, \rho_{\omega-2}\}$, assume that the j-th gate in Figure 3 is faulty. Then all the output coordinates, except $x_j$, are error free. If the upper input of the j-th gate is stuck, then the erroneous j-th coordinate is $\dot{x}_j$
where $\bar{x}$ indicates the complement of x. On the other hand, if the lower input is stuck, then $\dot{x}_j$
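Detection of such errors is discussed below. Assume that A and X are the input and output of the α module, respectively. Then we have the following lemma.

Lemma 1. Let $p_A = \sum_{i=0}^{m-1} a_i$ be the parity of the input A. Then the predicted parity of the output X of the α module is
$$\hat{p}_X = p_A + a_{m-1}, \qquad (16)$$
where $a_{m-1}$ is the (m − 1)-th coordinate of input A.

Proof. Using (10), $\hat{p}_X$ can be written as
$$\hat{p}_X = \sum_{i=0}^{m-1} x_i = a_{m-1} + \sum_{i=1}^{m-1} a_{i-1} + a_{m-1} \sum_{i=1}^{m-1} f_i = p_A + a_{m-1} \sum_{i=1}^{m-1} f_i.$$
Since $F(z)$ is irreducible over $GF(2)$, its Hamming weight ω is odd, so $\sum_{i=1}^{m-1} f_i = \omega - 2$ is odd and the sum reduces to 1 in $GF(2)$, which completes the proof.

Using (16), we can obtain the relation between X and A in the fault-free α module as
$$p_X + p_A + a_{m-1} = 0.$$
In the α module, a stuck-at fault in one of its gates can result in an output that differs from X, in which case (16) will not hold. Thus, equation (16) can be used for detecting an error in the output of the α module. A circuit for detecting such errors is shown in Figure 4, where $\hat{e}_\alpha = 0$ indicates that no error has been detected and $\hat{e}_\alpha = 1$ flags the detection of an error. Since (16) is over $GF(2)$, $\hat{e}_\alpha$ detects not only a single error, but also any odd number of errors. By a similar argument, an even number of errors is not detected by $\hat{e}_\alpha$.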
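The parity check for the α module can be illustrated in software. The sketch below assumes that the prediction of Lemma 1 takes the form $\hat{p}_X = p_A + a_{m-1}$, which follows from (10) when the Hamming weight of $F(z)$ is odd. Elements are modelled as Python integers (bit i is coordinate i), errors are modelled as bit flips at the module output rather than as gate-level stuck-at faults, and the names `parity`, `alpha_mul` and `alpha_error_flag` are hypothetical.

```python
def parity(v: int) -> int:
    """Parity (XOR of all coordinates) of a field element stored as an integer."""
    return bin(v).count("1") & 1


def alpha_mul(a: int, f: int, m: int) -> int:
    """Fault-free alpha module: multiply by alpha and reduce modulo F(alpha)."""
    x = a << 1
    if (x >> m) & 1:
        x ^= f
    return x


def alpha_error_flag(a: int, x_observed: int, m: int) -> int:
    """Return 1 if the observed output parity disagrees with p_A + a_{m-1}."""
    predicted = parity(a) ^ ((a >> (m - 1)) & 1)
    return parity(x_observed) ^ predicted


M, F = 4, 0b10011                 # F(z) = z^4 + z + 1
a = 0b1011
x = alpha_mul(a, F, M)
print(alpha_error_flag(a, x, M))            # fault-free output: prints 0
print(alpha_error_flag(a, x ^ 0b0100, M))   # one erroneous coordinate: prints 1
print(alpha_error_flag(a, x ^ 0b0110, M))   # two erroneous coordinates: prints 0
```

The last line shows the limitation noted in the text: an even number of erroneous coordinates leaves the parity unchanged and is not flagged.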
Parity Predictions of Sum and Pass-Thru Modules
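The sum module of Figure 1 adds two elements $A, B \in GF(2^m)$ using m two-input XOR gates. To detect an odd number of errors at its output, the parity of the sum $A + B$ can be predicted from the input parities as
$$\hat{p}_{A+B} = p_A + p_B$$
using one extra XOR gate. Let us denote these m + 1 XOR gates as the new sum module.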
The pass-thru module of Figure 1 multiplies an element A ∈ GF (2 m ) by a single bit b ∈ GF (2) which can be implemented using m two-input AND gates. Let G ∈ GF (2 m ) be the output of such a module with inputs of A and b. Thus, the output of this module G is zero when b = 0 and A when b = 1.
Detection of an odd number of errors is accomplished by using a single parity bit, as for the α and sum modules. Let A and $p_A = \sum_{i=0}^{m-1} a_i$ be the input of the pass-thru module and its parity bit, respectively. Then, the parity bit of the output $G = bA$ is found as
$$p_G = \sum_{i=0}^{m-1} g_i = \sum_{i=0}^{m-1} b\, a_i = b \sum_{i=0}^{m-1} a_i,$$
where the $g_i$'s, $0 \le i \le m-1$, are the coordinates of G. Thus, the predicted parity bit of the output can be expressed as
$$\hat{p}_G = b\, p_A,$$
which requires only one AND gate for its implementation. Let us denote the original pass-thru module together with this AND gate as the new pass-thru module, similar to the new sum module. These new modules are used in the next section.
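The new sum and pass-thru modules can be modelled as functions that carry a predicted parity bit alongside the data. This is a minimal sketch under the assumption that elements are Python integers (bit i is coordinate i); the names `new_sum` and `new_pass_thru` are hypothetical.

```python
def parity(v: int) -> int:
    """Parity (XOR of all coordinates) of a field element stored as an integer."""
    return bin(v).count("1") & 1


def new_sum(a: int, p_a: int, b: int, p_b: int):
    """Return (A + B, predicted parity); the prediction costs one extra XOR gate."""
    return a ^ b, p_a ^ p_b


def new_pass_thru(a: int, p_a: int, b: int):
    """Return (b * A, predicted parity); the prediction costs one extra AND gate."""
    return (a if b else 0), (p_a & b)


a, b_elem = 0b1011, 0b0110
s, p_s = new_sum(a, parity(a), b_elem, parity(b_elem))
g, p_g = new_pass_thru(a, parity(a), 1)

print(parity(s) == p_s)             # fault-free sum: prints True
print(parity(g) == p_g)             # fault-free pass-thru: prints True
print(parity(s ^ 0b0001) == p_s)    # one erroneous output bit: prints False
```

A mismatch between the actual and predicted parity, as in the last line, is what the comparison circuit would flag as an error.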
Error Detections in Polynomial Basis Multipliers
The discussion in the previous section deals with the parity prediction functions of the individual modules of the multiplier of Figure 1. Using these parity functions, we now attempt to detect errors in the entire multiplier.
Bit-Parallel PB Multiplier
Let us generalize (1) 
w.r.t. the polynomial basis, respectively. Then $\hat{p}$
Thus, the parity bit of the output of the polynomial basis multiplier can be predicted using the following theorem. 
Theorem 1. Let C be the product of two arbitrary elements
A proof of the above theorem is not included here for lack of space. Note that the theorem is not restricted to any particular irreducible polynomial. When $F(z)$ is an all-one polynomial, the expression for $\hat{p}_C$ obtained from Theorem 1 matches the corresponding result reported in [3].
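A parity prediction for the whole product can be sketched in software. The code below does not reproduce the exact statement of Theorem 1; it assumes only two facts: that the parity of $X^{(j+1)}$ equals the parity of $X^{(j)}$ plus the coordinate $x^{(j)}_{m-1}$ (Lemma 1 applied to each α module), and that the parity of $C = \sum_j b_j X^{(j)}$ is $\sum_j b_j\, p_{X^{(j)}}$ by linearity. Elements are Python integers (bit i is coordinate i), and all function names are hypothetical.

```python
def parity(v: int) -> int:
    return bin(v).count("1") & 1


def alpha_mul(a: int, f: int, m: int) -> int:
    x = a << 1
    if (x >> m) & 1:
        x ^= f
    return x


def pb_multiply(a: int, b: int, f: int, m: int) -> int:
    """Reference product C = sum_j b_j X^(j) with X^(0) = A."""
    c, x = 0, a
    for j in range(m):
        if (b >> j) & 1:
            c ^= x
        x = alpha_mul(x, f, m)
    return c


def predicted_product_parity(a: int, b: int, f: int, m: int) -> int:
    """Predict the parity of C without forming C itself."""
    p_x = parity(a)                        # parity of X^(0) = A
    x = a
    p_c = 0
    for j in range(m):
        p_c ^= ((b >> j) & 1) & p_x        # contribution b_j * p_{X^(j)}
        p_x ^= (x >> (m - 1)) & 1          # Lemma 1 step: add x^(j)_{m-1}
        x = alpha_mul(x, f, m)             # only needed to read the next top coordinate
    return p_c


M, F = 4, 0b10011                          # F(z) = z^4 + z + 1
ok = all(predicted_product_parity(a, b, F, M) == parity(pb_multiply(a, b, F, M))
         for a in range(1 << M) for b in range(1 << M))
print(ok)                                  # prints True
```

The exhaustive comparison over $GF(2^4)$ only confirms that the prediction agrees with the actual output parity in the fault-free case; it says nothing about hardware cost.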
To detect one error (in general, any odd number of errors) at the output of the multiplier, equation (21) can be realized by replacing all the α, pass-thru and sum modules in Figure 1 with the α module and the new pass-thru and sum modules, as shown in Figure 5 (these three new modules are shaded in the figure to distinguish them from the old ones). The bus width of this multiplier is m + 1. Since the output of any gate of the shaded pass-thru and sum modules in Figure 5 is connected to only one gate, a single stuck-at fault at any gate of these modules changes only one coordinate of the multiplier output. Therefore, a circuit that compares the actual parity $p_C$ with the predicted parity $\hat{p}_C$, shown at the end of the figure, is capable of detecting any single fault in the shaded sum and pass-thru modules of Figure 5. It is also clear that any single fault in any XOR gate of the parity generation circuit for $p_C$, or in the very last XOR gate, can be detected by $\hat{e}$.

This circuit, however, cannot detect a single stuck-at fault in any of the α modules other than the rightmost one, because such a fault is likely to change more than one bit of the multiplier output, and the resulting errors go undetected whenever an even number of output bits are changed. To overcome this problem, the following method is proposed. For detecting a single fault in the entire multiplier, one can change the α array (all α modules, excluding the XOR gates for parity prediction of the $\hat{p}_{X^{(j)}}$'s) in such a way that each $X^{(i)}$ is computed directly from A, i.e., $X^{(i)} = \alpha^i A$, as shown in Figure 6. This makes the output of any gate inside the new α array connected to only one gate. In Figure 6, the outputs of the $\alpha^i$ modules, $X^{(i)}$, $1 \le i \le m-1$, are found directly from A. The coordinates of the $X^{(i)}$'s are obtained using the following matrix equation
$$x^{(i)} = G^i a, \qquad (22)$$
where $x^{(i)}$ is a vector whose entries are the coordinates of $X^{(i)}$ defined by (6) and
$$G = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & f_1 \\ 0 & 1 & \cdots & 0 & f_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & f_{m-1} \end{bmatrix}$$
is the α-multiplication matrix. Using (22), the $\alpha^i$ module in Figure 6 is realized with XOR gates according to the $G^i$ matrix. As a result, a single stuck-at fault at any logic gate in the multiplier, except in the XOR gates for parity prediction of the $\hat{p}_{X^{(j)}}$'s, can affect at most one bit of the output, so that it is detected using the parity prediction of (21).
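The idea of computing each $X^{(i)}$ directly from A through a power of the α-multiplication matrix can be illustrated as follows. This sketch assumes that G is the $m \times m$ binary matrix implied by (10), i.e., ones on the subdiagonal and the coefficients $f_0, \cdots, f_{m-1}$ in the last column, and that coordinate vectors are Python lists ordered from coordinate 0 upwards. The helper names `g_matrix`, `mat_mul` and `mat_vec` are hypothetical.

```python
def g_matrix(f_coeffs):
    """Build G from [f_0, ..., f_{m-1}] so that x = G a realizes X = alpha * A."""
    m = len(f_coeffs)
    g = [[0] * m for _ in range(m)]
    for i in range(1, m):
        g[i][i - 1] = 1              # x_i receives a_{i-1}
    for i in range(m):
        g[i][m - 1] = f_coeffs[i]    # x_i receives f_i * a_{m-1}
    return g


def mat_mul(p, q):
    """Matrix product over GF(2)."""
    n = len(p)
    return [[sum(p[i][k] & q[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def mat_vec(p, v):
    """Matrix-vector product over GF(2)."""
    return [sum(pij & vj for pij, vj in zip(row, v)) & 1 for row in p]


f_coeffs = [1, 1, 0, 0]              # F(z) = z^4 + z + 1
G = g_matrix(f_coeffs)
G2 = mat_mul(G, G)
a = [1, 1, 0, 0]                     # A = 1 + alpha

print(mat_vec(G, a))                 # alpha * A = alpha + alpha^2, prints [0, 1, 1, 0]
print(mat_vec(G2, a))                # alpha^2 * A = alpha^2 + alpha^3, prints [0, 0, 1, 1]
```

In such a model, each row of $G^i$ lists the coordinates of A that are XORed to form one output coordinate, so a row of weight w corresponds to w − 1 two-input XOR gates that are not shared with any other output.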
Note that $x^{(k)}_{m-1}$ used in (20) is a function of A and should be realized separately; it can be calculated by using Proposition 4.1 of [7] as follows . . .
By substituting (23) into (21), one can realize $\hat{p}_{X^{(j)}}$ as a function of the coordinates of A using XOR gates. Thus any single stuck-at fault in the entire new multiplier results in at most one error and can be detected.
Bit-Serial PB Multiplier
The PB multiplier of Figure 1 can be realized in a bit-serial fashion as shown in Figure 7. Let $X(n)$ and $Y(n)$ denote the contents of the X and Y registers, respectively, at the nth clock cycle, $1 \le n \le m$. Suppose the X register is initialized by A, i.e., $X(0) = A$; then the content of this register at the nth clock cycle is
$$X(n) = \alpha X(n-1) \bmod F(\alpha) = X^{(n)},$$
where $X^{(n)}$ is defined in (6). Also, suppose that the register Y is cleared at the initial step, i.e., $Y(0) = 0$. Then
$$Y(n) = Y(n-1) + b_{n-1} X(n-1),$$
and, by (5), $Y(m) = C$ after m clock cycles.

In order to detect errors in the bit-serial multiplier of Figure 7, we check the contents of the two registers in every clock cycle. Consider Figure 7 before the triggering of the nth clock cycle, when the input and output of the X register are $X(n)$ and $X(n-1)$, respectively. Using Lemma 1, we have
$$\hat{p}_{X(n)} = p_{X(n-1)} + x_{m-1}(n-1),$$
where $x_i(n-1) \in GF(2)$ is the ith coordinate of $X(n-1)$. In order to compare $\hat{p}_{X(n)}$ with the actual value of $p_{X(n)}$, we store $\hat{p}_{X(n)}$ in a 1-bit register $D_X$ as shown in Figure 8. Then, after the nth clock cycle, $X(n)$ appears at the output of the X register, and the actual value of $p_{X(n)}$ is evaluated and compared with the value of $D_X$, i.e., $\hat{p}_{X(n)}$, using the last XOR gate of Figure 8. A similar expression can be obtained for the Y register. Since $Y(n) = Y(n-1) + b_{n-1} X(n-1)$, the predicted parity is
$$\hat{p}_{Y(n)} = p_{Y(n-1)} + b_{n-1}\, p_{X(n-1)},$$
which can be implemented and compared with the actual value of $p_{Y(n)}$ as shown in Figure 8. As a result, after the first clock cycle, both $\hat{e}_{CX}$ and $\hat{e}_{CY}$ should be 0 during the next m clock cycles if there are no single errors.
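The per-cycle checking of the two registers can be illustrated with a small behavioural simulation. The sketch below rests on several assumptions that are not taken from Figures 7 and 8: the register updates are modelled as $X(n) = \alpha X(n-1)$ and $Y(n) = Y(n-1) + b_{n-1} X(n-1)$, the predicted parities are formed from the register contents before each clock edge, and a fault is modelled as a single bit flip in the value loaded into a register rather than as a gate-level stuck-at fault. All names, including the `fault` tuple, are hypothetical.

```python
def parity(v: int) -> int:
    return bin(v).count("1") & 1


def alpha_mul(a: int, f: int, m: int) -> int:
    x = a << 1
    if (x >> m) & 1:
        x ^= f
    return x


def bit_serial_multiply(a, b, f, m, fault=None):
    """Simulate m clock cycles; return (Y(m), [(e_CX, e_CY) for each cycle]).

    fault: optional (cycle, register, bit) with register "X" or "Y"; the chosen
    bit of the value loaded into that register at that cycle is flipped.
    """
    x, y = a, 0                                   # X(0) = A, Y(0) = 0
    flags = []
    for n in range(1, m + 1):
        b_bit = (b >> (n - 1)) & 1
        # Predicted parities, stored before the clock edge (registers D_X, D_Y).
        d_x = parity(x) ^ ((x >> (m - 1)) & 1)
        d_y = parity(y) ^ (b_bit & parity(x))
        # Next-state logic of the two registers.
        x_next = alpha_mul(x, f, m)
        y_next = y ^ (x if b_bit else 0)
        if fault is not None and fault[0] == n:
            if fault[1] == "X":
                x_next ^= 1 << fault[2]
            else:
                y_next ^= 1 << fault[2]
        x, y = x_next, y_next
        # Compare actual parities after the clock edge with the stored predictions.
        flags.append((parity(x) ^ d_x, parity(y) ^ d_y))
    return y, flags


M, F = 4, 0b10011                                 # F(z) = z^4 + z + 1
c, flags = bit_serial_multiply(0b0011, 0b0011, F, M)
print(bin(c), flags)      # prints 0b101 [(0, 0), (0, 0), (0, 0), (0, 0)]

_, flags = bit_serial_multiply(0b0011, 0b0011, F, M, fault=("X" and 2, "X", 1))
print(flags[1])           # error flagged on the X register in cycle 2: prints (1, 0)
```

Because the comparison happens in every cycle, the bit flip is flagged in the cycle where it occurs, before it can spread into several coordinates of the final result.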
Conclusions and Future Work
In this article, we have considered the detection of errors in polynomial basis multipliers. We have used a multiplier structure in which a single stuck-at fault causes only an odd number of errors at the output. Toward the detection of this type of error, the necessary theoretical results have been presented. Compared to the previously published results [3], the work presented here is quite generic in the sense that it can be applied to any irreducible polynomial defining the field. The parity prediction method of [3] applies only to bit-serial multipliers and is based on predicting the output parity after the final clock cycle and then comparing it with the actual parity. Although this reduces the overhead cost, its probability of error detection is only about 50% or less. This is because a single fault in their bit-serial multiplier produces multiple errors after m clock cycles; the number of effective errors resulting from the single fault is either odd or even, and only an odd number of errors can be detected. The proposed circuit in Figure 8 overcomes this problem by comparing the predicted parities of the storage registers with the actual ones at every clock cycle. Although it costs extra hardware, the probability of error detection of our bit-serial multiplier is about 100%. This result has been verified using a simulation program for a prototype multiplier with $F(z) = z^4 + z + 1$. Using VHDL, we have injected single faults at different nodes of the bit-serial multiplier for all values of A and B. The probability reaches unity as m increases.
The proposed error detection schemes are not limited to the multiplier architectures discussed in this article. They can be easily extended and applied to other GF (2 m ) multipliers. For example, we have considered the bit-serial multiplier introduced by Peterson [7] and have made it capable of detecting single faults. Furthermore, although our discussions have centered around bit-parallel and bit-serial multipliers over GF (2 m ), by combining the error detection schemes for serial and parallel multipliers, one can develop an error detection scheme for hybrid multipliers over composite fields [8] .
More research is needed to reduce the overhead cost of the proposed multiplier. For example, hardware implementation of the architecture shown in Figure 6 appears to be expensive. Currently we are trying to develop an architecture that can alleviate this problem.
