Abstract-Efficient hardware implementations of arithmetic operations in the Galois field $GF(2^m)$ are highly desirable for several applications, such as coding theory, computer algebra and cryptography. Among these operations, multiplication is of special interest because it is considered the most important building block. High-speed algorithms and hardware architectures for computing multiplication are therefore in high demand. In this paper, bit-parallel polynomial basis multipliers over the binary field $GF(2^m)$ generated using type II irreducible pentanomials are considered. The multiplier presented here has the lowest time complexity known to date for similar multipliers based on this type of irreducible pentanomial.
I. INTRODUCTION

Binary Galois field arithmetic is a widely studied subject due to its use in several important applications.
$GF(2^m)$ arithmetic only requires AND and XOR gates for its implementation. XOR-based logic functions have been studied since the 1960s [1] due to their use in coding theory [2], digital signal processing, cryptography and telecommunication circuits. These applications frequently require efficient very large scale integration (VLSI) implementations of high-speed multipliers [3]-[9]. For this reason, several bit-parallel polynomial basis (PB) multipliers have been proposed. Polynomial basis is the most widely used, although normal [10] or dual [11] bases can also be considered. The complexity of the multiplier depends on the irreducible polynomial selected to generate the finite field. For hardware implementation of multiplication, low Hamming weight irreducible polynomials, such as trinomials and pentanomials, are usually used. For irreducible trinomials, multipliers with low area and time complexities can be implemented [12]-[14]. Unfortunately, there are 468 values of $m$ in the interval [2, 1024] such that irreducible trinomials of degree $m$ do not exist. For each of the other values of $m$ in the same range, where no such irreducible trinomial exists, an irreducible pentanomial can be found. Thus, the design of multipliers using irreducible pentanomials is needed. Polynomial basis multiplication requires a polynomial multiplication followed by a modular reduction. An efficient bit-parallel multiplier was proposed by Mastrovito [15], in which a product matrix is introduced to combine the above two steps. The entries in this matrix can be computed efficiently by sharing common items, known as subexpression sharing [16]. Mastrovito multipliers using special irreducible pentanomials have been widely studied due to their low-complexity implementations [12], [13], [17], [18]. All these works exploit subexpression sharing in order to find efficient architectures. Other methods use the divide-and-conquer approach for polynomial multiplication in order to reduce the complexity of the multiplier [19], [21]. In [9], a new PB multiplication method was used. This method is based on the introduction of a product matrix that can be decomposed as a sum of matrices depending on the selected irreducible polynomial. Matrix decomposition was already used in similar multiplication approaches that exploit subexpression sharing [12], [13], [17], [18]. The method in [9] introduced the functions and given by the raw sum of terms and , where are the coefficients of two elements and , respectively. The coefficients of the product of two field elements can be computed as the sum of those functions. One of the problems of the above method is related to the monolithic construction of the and functions. For example, for the functions and are defined. The sum of these two functions would result in a 3-level (i.e., depth 3) binary tree of XOR gates. However, the sum of involves the addition of only four product terms ( , , and ), so it could be done with a 2-level complete binary tree of XOR gates.
Manuscript received August 05, 2015; revised October 06, 2015; accepted October 31, 2015. Date of publication December 18, 2015; date of current version February 15, 2016. This work was supported by the Spanish Government under Research Grants CICYT TIN2008-00508 and TIN2012-32180. This paper was recommended by Associate Editor S. Ghosh.
The author is with the Department of Computer Architecture and Systems Engineering, Faculty of Physics, Complutense University, 28040 Madrid, Spain (e-mail: jluimana@ucm.es).
Digital Object Identifier 10.1109/TCSI.2015.2500419
In this work, a new bit-parallel PB multiplier is presented by considering the functions and as a sum of and terms, respectively, in such a way that and for a given finite field , where and . The terms and represent the addition of products and therefore can be implemented as a -level complete binary tree of XOR gates. In this way, the addition of terms and with the same superscript would result in a -level complete binary tree. If the sum of the functions and is performed by grouping the additions of terms with the same -level and , then the number of XOR levels needed to compute the product coefficients can be reduced. Furthermore, the coefficients and are given by the binary representations of the subindex for and of the value for , respectively. In this contribution, the new multiplication approach is applied to type II irreducible pentanomials [18] , with . These pentanomials are important because they are abundant (there are 597 values of $m$ in the interval [5, 1000] for which this type of irreducible pentanomial of degree $m$ exists) and because all five binary fields recommended by NIST for ECDSA, i.e., $GF(2^{163})$, $GF(2^{233})$, $GF(2^{283})$, $GF(2^{409})$ and $GF(2^{571})$, can be constructed using such irreducible polynomials.
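The two-step scheme mentioned in the introduction (a polynomial multiplication followed by a modular reduction) can be illustrated with a short software model. This is only a bit-level reference sketch, not the bit-parallel architecture proposed in this paper. It assumes that a type II pentanomial has the form $y^m + y^{n+2} + y^{n+1} + y^n + 1$, as in [18]; the function name `pb_multiply` and the encoding of field elements as Python integers (bit $i$ holds the coefficient of $y^i$) are choices made for this illustration only.

```python
def pb_multiply(a, b, m, n):
    """Multiply two elements of GF(2^m) in polynomial basis.

    Assumption: the field is generated by the type II pentanomial
    f(y) = y^m + y^(n+2) + y^(n+1) + y^n + 1, and a, b are integers
    below 2^m whose bit i is the coefficient of y^i.
    """
    f = (1 << m) | (1 << (n + 2)) | (1 << (n + 1)) | (1 << n) | 1

    # Step 1: polynomial (carry-less) multiplication over GF(2).
    prod = 0
    for i in range(m):
        if (b >> i) & 1:
            prod ^= a << i

    # Step 2: modular reduction, clearing degrees 2m-2 down to m.
    for deg in range(2 * m - 2, m - 1, -1):
        if (prod >> deg) & 1:
            prod ^= f << (deg - m)
    return prod


# Small illustration with m = 8, n = 2, i.e. y^8 + y^4 + y^3 + y^2 + 1.
print(hex(pb_multiply(0x80, 0x02, 8, 2)))  # 0x1d
```

A hardware multiplier evaluates all product coefficients in parallel with AND and XOR gates; the loops above only serve as a functional reference against which such a design could be checked.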
The paper is organized as follows. Notation and mathematical background are presented in Section II, where PB multiplication for type II irreducible pentanomials given in [9] is also reviewed. The new multiplication approach is presented in Section III, where an example of multiplication and the complexity analysis are also given. In Section IV, the proposed multiplier is compared with other similar multipliers. Finally, concluding remarks are made in Section V.
II. NOTATION AND PRELIMINARIES
Let and be their coefficient vectors, respectively. Using the method given in [9], the product can be computed as , where is the vector of reversed coefficients of and is the product or Mastrovito matrix that depends on and on the coefficients of . In order to compute the coefficients, a new notation was given in [9]. These coefficients consist of sums of products (SOP) given by the inner products of and . An inner product can be represented by the permutation given by the subscripts of the coefficients of and , respectively, in the SOP. From this permutation, 1-cycles and 2-cycles , can be found and associated with the terms and , respectively. For example, the SOP can be represented by the cycles (0,4)(1,3)(2). In [9], the sum of the and terms represented by the 1-cycles and 2-cycles was carried out by the functions and . These functions are implemented as binary trees of 2-input XOR gates with a lower level of 2-input AND gates (corresponding to the products of the coefficients of and ). The product can be computed as the sum of these functions. The expression for with , is [9]:
where only appears for odd. The expression for with is as follows:
where the term only appears for ( and even) or for ( and odd). was studied in [9] . The coefficients of the PB product were given as the sum of and terms, as shown in Table I, where . In this table, the coefficients have been divided into seven sections (named from to ), depending on the number of and terms in the sums. The first section (from to ) has 5 terms; section with , and has 4, 7 and 6 terms, respectively; section ( to ) has 8 terms; sections ( , ) and ( , ) have 7 and 6 terms, respectively; section ( to ) has 5 terms; and section has 4 terms. From (2), it can be observed that the term is given by the addition of terms and the term (if it exists). Therefore, performs the sum of the maximum number of terms among ones and it presents the highest delay. As this term appears in the coefficient that is included in section with the maximum number of terms, then is the coefficient with the highest delay of the multiplier. In the following section, a new scheme for multiplication is given.
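The cycle notation reviewed in this section can be made concrete with a small sketch. It assumes that the SOP represented by the cycles (0,4)(1,3)(2) is $a_0b_4 + a_1b_3 + a_2b_2 + a_3b_1 + a_4b_0$, that a 2-cycle $(i,j)$ stands for the pair of products $a_ib_j + a_jb_i$ and that a 1-cycle $(k)$ stands for $a_kb_k$. The helper name `cycles` and its arguments are hypothetical and are not taken from [9].

```python
def cycles(index_sum, lo, hi):
    """Split the SOP of all a_k * b_(index_sum - k), lo <= k <= hi, into cycles.

    Returns (two_cycles, one_cycles): a 2-cycle (i, j) stands for
    a_i*b_j + a_j*b_i and a 1-cycle k stands for a_k*b_k.
    The subscript range [lo, hi] is assumed symmetric about index_sum / 2.
    """
    two_cycles = [(k, index_sum - k)
                  for k in range(lo, hi + 1)
                  if k < index_sum - k <= hi]
    half = index_sum // 2
    one_cycles = [half] if index_sum % 2 == 0 and lo <= half <= hi else []
    return two_cycles, one_cycles


# Five products whose subscripts add up to 4.
print(cycles(4, 0, 4))  # ([(0, 4), (1, 3)], [2])
```

Under this reading, a 1-cycle exists only when the SOP has an odd number of products, which agrees with the parity conditions stated after (1) and (2).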
III. NEW MULTIPLIER FOR TYPE II IRREDUCIBLE PENTANOMIALS
The functions and presented in (1), (2) are given by a raw sum of terms and . The coefficients of the product of two field elements represented in PB can be computed as the sum of that functions, as given in Table I . One of the problems of the above method is related with the monolithic construction of the and functions. For example, for the functions and are defined. The sum of these two functions , where the terms in brackets point out that they must be added (XOR) previously to the XOR with the other terms, would result in a 3-level (with depth 3) binary tree of XOR gates. However, the sum of involves the addition of four product terms ( , , and ) so it could be done with a 2-level complete binary tree of XOR gates if the involved additions could be performed in a separate way, i.e., if the product could be first added with the term and then perform the addition with in the form . In this paper, a new bit-parallel PB multiplier is presented by considering the functions and as a sum of and terms, respectively, in such a way that and for a given finite field , where and . The initial terms and represent the addition of products and therefore can be implemented as a -level complete binary tree of XOR gates. In this way, the addition of two terms and with the same superscript would result in a new XOR in the level (i.e., a new -level term) that represents a -level complete binary tree. If the sum of the functions and is performed by grouping the additions of terms with the same -level and , starting with the lower levels, then the number of XOR levels needed to compute the product coefficients can be reduced. In this way, the 0-level initial terms and should be first added in pairs to give rise to a new XOR in the level 1 (i.e., a new 1-level binary tree term), that in turn should be added with other 1-level term to give rise to a new 2-level complete binary tree and so on. 
If there is only one -level term (or there is an unpaired -level term), then it should be added to the term in the level immediately above, i.e., a -level term, in order to obtain a new -level tree. If no such -level term exists, then it should be added to a -level term, and so on.
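The pairing rule described above can be simulated to compare XOR depths. The sketch below is a simplified model under stated assumptions: a function with $p$ product terms is split into complete binary trees according to the binary representation of $p$, an unpaired term is simply promoted to the next level (so $N$ terms in one level yield $\lceil N/2 \rceil$ terms in the next), and the monolithic alternative first builds each function as its own tree and then adds the finished functions. The names `grouped_depth` and `monolithic_depth` are hypothetical.

```python
import heapq


def initial_levels(num_products):
    """Levels j holding an initial term: bit j of num_products is 1."""
    return [j for j in range(num_products.bit_length())
            if (num_products >> j) & 1]


def grouped_depth(product_counts):
    """XOR depth when terms of equal level are added in pairs, bottom up."""
    per_level = {}
    for p in product_counts:
        for j in initial_levels(p):
            per_level[j] = per_level.get(j, 0) + 1
    top = max(per_level)
    level, terms = 0, per_level.get(0, 0)
    while level < top or terms > 1:
        # Pairs create new terms one level up; an unpaired term is promoted.
        terms = (terms + 1) // 2 + per_level.get(level + 1, 0)
        level += 1
    return level


def monolithic_depth(product_counts):
    """XOR depth when every function is a finished tree before being added."""
    depths = [(p - 1).bit_length() for p in product_counts]  # ceil(log2(p))
    heapq.heapify(depths)
    while len(depths) > 1:
        d1, d2 = heapq.heappop(depths), heapq.heappop(depths)
        heapq.heappush(depths, max(d1, d2) + 1)
    return depths[0]


# A function with 3 products added to a function with 1 product.
print(monolithic_depth([3, 1]), grouped_depth([3, 1]))  # 3 2
```

The printed pair mirrors the situation discussed in this section: four product terms need a 3-level tree when the functions are kept monolithic, but only a 2-level tree when the terms are grouped by level.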
From (1), (2), the computations of the initial terms and of and are given in Algorithm 1 and Algorithm 2, respectively, where the term has been used. In these algorithms, the condition in the inner for loop determines if the or terms have an initial term or at level . This condition will be further explained in Section III-B.
Algorithm 2 Computation of initial
terms of . A characteristic of the previous representation is that the coefficients and are given by the binary representations of the subindex for and of the value for , respectively. This can be deduced from the expressions of and defined in (1) and (2). For example, from (1) it can be observed that is given by the sum of product terms . As any number can be given as a sum of powers of 2, then can also be given as a sum of powers of 2 of product terms . The addition of products was previously denoted as . Therefore, in the notation , the coefficients correspond with the binary representation of . A similar reasoning can be done for considering that is given by the sum of product terms . Furthermore, in order to reduce the number of XORs needed for the computation of the product, common terms appearing in several coefficients can also be shared. These common terms correspond to the addition of consecutive and terms, i.e., and , that lead to the addition of terms and , respectively, for different levels determined by the binary representations of the subindex (for ) and (for ). The addition of any pair of terms in level creates a new term in level . From Table I , it can be noted that for the coefficients of the multiplier only common additions can be found. Using the binary representations of , it can be observed that for even , the common sums are for , while that for odd , the common terms are for . The occurrence of these common groups in the coefficients of the product is studied in Appendix B.
The algorithm for the computation of the new proposed multiplication is given in Algorithm 3. In the first for loop, the common terms to be shared are created, where refers to that for even , the subindex ranges from 0 to , while that for odd , ranges from 1 to . For each coefficient in Table I , the outer for loop processes (for each level ) the initial and terms, creating new -level terms and sharing common terms (if any). The while loop processes terms from level to a level with only two terms, in such a way that the maximum level will be for the given coefficient. The execution of the algorithm will provide the coefficients of the product. The above new method of multiplication is clarified with the following example. 
Algorithm 3
A. Multiplication Example over
Let us consider the product of two elements and in generated by the type II irreducible pentanomial . The and functions can be computed using (1), (2) and are given in Table II has not the term because it is associated with the binary coefficient 0. The above representation is given in the third column in Table II , where the binary coefficients and associated with the terms and , respectively, are given. It can be noted that the binary coefficients ordered in the form for correspond with the binary representation of the subindex while that the binary coefficients for correspond with the binary representation of . The fourth column in Table II includes these binary representations. For example, the term corresponds with the binary vector (1101) while that corresponds with (1011) that is the binary representation of the value (in this example with ). It can be observed that the and terms given in Table II can also be computed using Algorithm 1 and Algorithm 2.
In order to reduce the space complexity of the multiplier, common terms appearing in several coefficients can also be shared. It can be observed in Table II 
, and . The sum of the and terms that appear in the coefficients of the product must be done using the above groups in order to optimize the implementation.
The coefficients of the product are given in Table III using  Table I for this  irreducible pentanomial. The previous TABLE II   AND   FUNCTIONS FOR and groups found in the product coefficients are shadowed in Table III . It can be observed that only one term appears in each coefficient, so only the groups are used. The group appears in three coefficients ( , and ) and the groups , , , and appear in two coefficients. This means that only one of each of the above groups must be implemented and therefore the other occurrences of the groups must not be implemented. The number of XOR gates that can be reduced is given by the number of terms in each group. Using Table II it can be observed that the group involves the sum of two terms and and therefore it requires 2 XOR gates. In the same way, the groups , , , and require 2, 1, 2, 1, and 1 XOR gates, respectively. Furthermore, as appears in three coefficients, then the number of XOR gates that can be reduced will be 2 times the number of XOR gates required, i.e.,
. Therefore, the number of XOR gates that will be reduced for the above groups will be, respectively, . In Section III-B, general expressions will be given in order to compute the number of XOR gates that can be reduced due to the groups.
Using the and terms given in Table II for and , respectively, the coefficients of the product are shown in Table IV . In this table, the sum of terms is accomplished using the rule previously given, i.e., the sum of the functions and is performed by grouping the sums of terms with the same -level and , starting with the lower levels. In this way, the 0-level initial terms and should be first added in pairs to give rise to new 1-level binary trees (1-level terms), that in turn should be added in pairs with other 1-level terms to give rise to new 2-level complete binary trees and so on. If there is only one -level term (or there is an unpaired -level term), then it should be added with an immediately above -level term in order to have a Table IV is represented by means of parenthesis. To reduce the number of XORs needed for the computation of the product, common terms appearing in several coefficients can also be shared. This is represented with shadowed boxes that correspond with the previously stated groups.
In order to illustrate the method, the implementation of the coefficient is given in Fig. 1 . This coefficient requires the addition of 8 terms, so it is the most complex coefficient and determine the maximum delay of the multiplier (in fact, the coefficient of a multiplier given by Type II pentanomials is the most complex one, so it is used to determine the maximum delay complexity). In this figure, the and terms are represented by filled circles. These circles correspond to the initial and terms given in Table II in such a way that TABLE IV  COEFFICIENTS OF THE PRODUCT FOR  WITH   AND   TERMS   ,  ,  ,  ,  ,  ,  and . Vertical dashed lines represent the level of XOR binary trees. For example, the term in line 2 represent the 2-level binary tree . Circles enclosed within ellipses represent the terms of the corresponding and functions. For example, the function is given by the three initial terms , and . Furthermore, the gray color XOR trees represent the group that can be shared in several coefficients (in this case, the group correspond with the additions and ). It can be observed in Fig. 1 that the addition of terms follows the rule previously given, starting with 0-level terms and ascending in the construction of binary XOR trees. For example, the addition of the initial 0-terms and gives rise to a new 1-level term (a new XOR in level 1) that in turn is added with the initial 1-term to give rise to a new 2-level XOR term and so on. In this example, can be constructed with a 6-level binary XOR tree so the delay complexity of the multiplier is given by , where and represent the delay of 2-input AND and XOR gates, respectively. The delay corresponds to the 0-level products of the coefficients of and . It must be noted that the best delay complexity for this multiplier given by other similar methods in the literature is . The space complexity (number of AND and XOR gates) can also be computed. The number of AND gates is given by all the different products , with . 
This number can be computed using (1), (2) and for this example is given as 196 AND gates (see also Table II) . It is proved in Appendix B that the number of AND gates for a multiplier is . The XOR gates can be computed as the sum of XOR gates in the initial and terms (as given in Table II ) plus the number of new XOR gates generated in the coefficients (as given in Table IV ) minus the number of XOR gates due to the groups shared among coefficients. The and terms perform the XOR of product terms, therefore the number of XORs is . In this example, there are 7 , , and terms each. Therefore the number of XOR gates in the initial terms will be . There are also 7 terms and 6 , and terms each, so the number of XOR gates in the initial terms is . The number of new XOR gates generated in the coefficients for the sum of and terms can be found in Table IV and in this case is 134 XOR. Finally the number of XORs due to the groups shared among coefficients were previously computed and it was found to be 10 XOR. Therefore the total number of XOR of this multiplier is .
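The gate counts of this example can be cross-checked numerically. The sketch below assumes $m = 14$ (inferred here from the 196 AND gates), and assumes that the function with subindex $i$ in (1) sums $i$ products for $1 \le i \le m$ while the function with subindex $i$ in (2) sums $m-1-i$ products for $0 \le i \le m-2$, in line with the binary representations described above. The figures 134 and 10 are taken directly from the text; the final total is plain arithmetic on the quoted numbers, and all function names are hypothetical.

```python
def initial_term_counts(m):
    """Count initial terms per level for the two families of functions.

    Assumptions: the first family has i products (1 <= i <= m) and the
    second has m-1-i products (0 <= i <= m-2); a function has an initial
    term in level j when bit j of its number of products is 1.
    """
    width = m.bit_length()
    s_counts = [0] * width
    t_counts = [0] * width
    for i in range(1, m + 1):
        for j in range(width):
            s_counts[j] += (i >> j) & 1
    for i in range(m - 1):
        for j in range(width):
            t_counts[j] += ((m - 1 - i) >> j) & 1
    return s_counts, t_counts


def tree_xors(counts):
    """A j-level initial term adds 2^j products, so it needs 2^j - 1 XORs."""
    return sum(c * ((1 << j) - 1) for j, c in enumerate(counts))


def and_gates(m):
    """One AND gate per product term in both families."""
    return sum(range(1, m + 1)) + sum(range(1, m))


m = 14
s_counts, t_counts = initial_term_counts(m)
xor_s, xor_t = tree_xors(s_counts), tree_xors(t_counts)
total_xor = xor_s + xor_t + 134 - 10  # 134 and 10 are quoted from the text
print(s_counts, t_counts)                      # [7, 7, 7, 7] [7, 6, 6, 6]
print(and_gates(m), xor_s, xor_t, total_xor)   # 196 77 66 267
```

The per-level counts agree with the numbers of initial terms quoted in this example, which supports the assumed value of $m$.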
B. Complexity Analysis
General expressions for time and space complexities for the multiplier are given in this subsection.
1) Time Complexity:
The coefficients in Table I have been divided into seven sections, depending on the number of and terms in the sums. The first section (from to ) has 5 terms; section with , and has 4, 7 and 6 terms, respectively; section ( to ) has 8 terms; sections and have 7 and 6 terms, respectively; section ( to ) has 5 terms; and finally section has 4 terms. It can be observed in Table II that the term has the highest complexity among terms. The coefficient , that is included in section with the maximum number of terms (8) , includes this complex term . Therefore, is the most complex coefficient of a multiplier given by Type II irreducible pentanomials and it will be used to determine the highest delay of the multiplier.
In order to do that, the complexity of the and terms must be determined. Their complexity depends on the number of the initial and terms they have. These terms can be represented as and for a given finite field , where and . Therefore, the coefficients , determine if the corresponding , appear in and , respectively. As previously proved, the coefficients and are given by the binary representations of the subindex for and by the value for , respectively. Therefore the study of the number of terms in can be reduced to the study of terms in using the equivalence (only in relation to the number of terms) . For the most complex coefficient , this equivalence results in that , , , , , and , where , so will be equivalent to . The binary representation of , , , , , , and must then be determined. The binary configuration of a number can be given by the expression (3) The value determines if the binary representation of has a 1 in the position with weight in such a way that if has an even value, then has a 0 while that if has an odd value, then has a 1. A 1 in the position with weight for the binary representation of will represent that the terms , have a term , that is the sum of product terms and that is implemented by means of a binary XOR tree with depth (an -level binary tree).
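The parity test described above for the binary configuration of a number can be written in a few lines. This sketch assumes that the quantity whose parity is examined is the floor of the number divided by the weight $2^j$; the helper names are hypothetical. The two values used, 13 and 11, are the ones whose binary vectors (1101) and (1011) are quoted in Section III-A.

```python
def bit(value, j):
    """Bit of weight 2^j: floor(value / 2^j) is even -> 0, odd -> 1."""
    return (value // 2 ** j) % 2


def binary_vector(value, width):
    """Bits ordered from the highest weight down to weight 2^0."""
    return "".join(str(bit(value, j)) for j in reversed(range(width)))


print(binary_vector(13, 4))  # 1101
print(binary_vector(11, 4))  # 1011
```

A 1 in position $j$ indicates that the corresponding function has an initial term that adds $2^j$ products, i.e., a $j$-level binary XOR tree.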
In order to compute the depth of the binary tree of XOR gates in given by the coefficient , the number of total terms in the -level must first be determined. The initial levels for a given are . For a given level , the number of new XOR terms that will result in level due to the addition in pairs of the -level terms is given by -. For example, in Fig. 1 there are seven 0-level terms whose sum gives rise to the four 1-level XOR terms besides the term , that can also be considered as a 1-level term (in order to be added to and result in a new 2-level XOR term).
Let be the number of initial terms and in level . This number will be given by the terms in the previously computed equivalent expression (in relation to the number of terms)
. In order to do that, the binary representation of , , , , , , , and must then be determined. Using (3), the value 2 determines if the term has an initial term in the position with weight , i.e., in level . Representing as , then the number of initial terms and in level (i.e., ) can be computed as follows: (4) It must be noted that in (4) the fourth addend corresponds with the real term, while that the rest of addends correspond with the equivalence previously given . Using (4), the number of initial terms and for the coefficient given in the example in Section III-A can be computed. The number of initial terms in level 2, for example, will be corresponding with the initial terms , , , , , respectively, that are represented in Fig. 1 as filled (black and gray) circles.
If denote the number of initial and terms in levels , respectively, then the total number of terms in the -level (denoted by ) will be the addition of the initial terms in that level plus the terms created due to the addition of terms in lower levels. In Section A of Appendix A, it is proved that these terms created in level due to the addition of terms in levels is given by the expression:
Therefore, the total number of terms in the -level will be the sum of plus the expression in (5). In Appendix A it is proved that this addition is:
The sum in pairs of the terms determined in (6) will determine the final level reached to compute the coefficient. Therefore the number of XOR levels needed to compute this coefficient will be . Finally, the highest delay of the multiplier based on type II pentanomials given by the coefficient is:
In order to compare this time complexity with other multipliers found in the literature, in Section B of Appendix A the following upper bound is derived for the XOR delay of the multiplier: (8)
2) Area Complexity:
In order to determine the area complexity of the PB multiplier given in Table I, the number of AND and XOR gates of the and terms must be known. In this work, these terms have been considered as a sum of and terms, in such a way that and for a given finite field , where and . In Table I, the coefficients of the product are given as sums of and terms where their corresponding components and are considered as individual terms when performing the sum. It can be observed that the terms, , appear only once, while the terms, , appear several times. One way to determine the number of AND and XOR gates of the and terms is to count the number of AND and XOR gates given by the sum of terms and in (1), (2). In this way, we compute the total number of AND gates of the multiplier, the XORs of the and terms, the XORs needed for the sum of all the terms of and the XORs needed for one sum of the terms of , i.e., we count the XORs due to the contribution of all the terms and of one occurrence of the terms. If a term appears times in the additions given for the coefficients in Table I, then the other occurrences are taken into account by computing the number of XORs needed for the sum of the terms of and multiplying it by . This must be done for each , . To determine the area complexity of the PB multiplier, the number of XOR gates needed for the sum of the and terms in the product coefficients of Table I and the number of shared groups that appear in the product coefficients should also be computed. The XOR gates given by these shared groups must be subtracted from the XOR gates previously computed. Therefore, the following figures must be computed to obtain the XORs of the multiplier:
The number of XOR gates given by and in (1), (2) .
The number of XOR gates needed for the sum of the and terms in the product coefficients. For each , the number of times that appears in Table I and the number of XORs needed for the sum of the terms for must be determined. Then the XOR gates given by must be computed. The number of XOR gates given by the shared groups that appear in the product coefficients. The XOR gates of the multiplier will be . In Appendix B the following values have been computed:
• The total number of AND gates of the multiplier is .
• The number of XOR gates given by and in (1), (2) is .
• The number of XOR gates needed for the sum of the and terms in the product coefficients is . • The number of XOR gates can be computed by , where the number of XOR gates needed for the sum of the terms of is given by:
where is the Hamming Weight of and where is given as: (10) • The number of XOR gates given by the shared groups that appear in the product coefficients is: (11) where represents the limit of the summatory for even represents the limit for odd represents the Hamming Weight of to be computed for even and the Hamming Weight of for odd . Therefore, the XOR gates of the multiplier given by the addition will be (12) A more compact expression for (12) could not be found. The functions and could be computed for any value of using Maple. In Table VII the values of these functions for are given. Using Table VII , it can be observed that for the example given in Section III-A with , , the values and . In this example, the values and can also be computed. Applying the above values to (12) we have gates, matching the result given in Section III-A.
IV. COMPARISON WITH OTHER PB MULTIPLIERS
In Table V the theoretical complexities obtained by the approach here proposed are compared with the best results known to date for bit-parallel polynomial basis multipliers over generated by type II irreducible pentanomials. In (8) it was proved that . It can also be observed that , where is the best XOR delay found in the literature for this type of bit-parallel multipliers [9] . Simulations have been done using Maple that have proved that the delay of our multiplier is less than or equal to the delay in [9] , i.e.,
. From the simulation results, it was found that for the 593 different values of in the interval for which an irreducible type II pentanomial exists, the proposed multiplier has the smallest delay in 465 different values of the field size . More specifically, among the type II irreducible pentanomials existent in , there are 477 and 1162 different combinations of for which the proposed multiplier has equal and less delay, respectively, than the multiplier in [9] . With respect to area complexity, it was found that the proposed multiplier presents equal number of AND gates in comparison with the other similar multipliers existing in the literature (except for the approach presented in [21] ) and a higher number of XOR gates in comparison with the other multipliers. This increased number of XOR gates is due to the separation of the monolithic functions into the corresponding terms in order to achieve a reduced delay for multiplication.
In Table VI, the complexities of bit-parallel polynomial basis multipliers using type II irreducible pentanomials for the five finite fields with recommended by NIST for ECDSA are presented. From Table VI, it can be observed that the multiplier proposed here presents the lowest delay except for , where it matches the best delay given in [9].
TABLE VII COMPUTED VALUES FOR and
V. CONCLUSIONS
High-speed algorithms and hardware architectures for computing multiplication are in high demand in several applications, such as coding theory, computer algebra and cryptography. In this paper, a new bit-parallel polynomial basis multiplier for type II irreducible pentanomials with reduced time complexity has been presented. The coefficients of the multiplier are computed as a sum of and functions given by the addition of product terms of the coefficients of the two operands to be multiplied. In the new approach proposed here, the sum of products in the and functions is separated into sums of product terms (corresponding to the initial and terms) that can be implemented as binary trees of XOR gates with depth . The sum in pairs of binary trees with the same depth, starting with the lower levels, leads to a reduction of the time complexity of the multiplier. In this paper, a complete multiplication example has been presented. The theoretical complexity analysis has shown that the proposed bit-parallel multiplier presents the lowest delay among the best results known to date for similar polynomial basis multipliers based on irreducible pentanomials. Simulations have shown that, for the 593 different values of $m$ in the interval for which an irreducible type II pentanomial exists, the proposed multiplier has the smallest delay for 465 different values of the field size $m$. Furthermore, for the five binary fields recommended by NIST for ECDSA, i.e., $GF(2^{163})$, $GF(2^{233})$, $GF(2^{283})$, $GF(2^{409})$ and $GF(2^{571})$, the multiplier proposed here presents the lowest delay except for , where it matches the best delay given in the literature.
APPENDIX A TIME COMPLEXITY
A. Total Number of Terms in -level
Let denote the number of initial and terms in levels , respectively. As previously stated, for a given level , the number of new XOR terms that will result in level due to the addition in pairs of the -level terms is given by . Starting in level 0, then the new terms created in level 1 due to the sum in pairs of the initial terms in level 0, , will be . The total number of terms in level 1, denoted by , will now be . Using the property of modulo operation for integer, then we have that . Next the new terms created in level 2 due to the sum in pairs of the terms in level 1, , will be . Using the property of modulo operation for positive integers and arbitrary real number , then the new XOR terms created in level 2 will be . The total number of terms in level 2, denoted by , will now be . Proceeding in the same way we will have that the new XOR terms created in level due to the sum in pairs of the terms in level , , will be , that is (5) . Finally, the total number of terms in the -level will be the sum of plus the expression in (5) , that is: (13) Now (4) can be used to simplify (13 Using (14)- (16), the numerator of (13) can be simplified as follows:
The term in (17) can be computed using the definition previously given. According to that definition, it can be observed that is the sum of eight floor functions of eight quotients with the same denominator and where the numerators are integers smaller than . However, the value of is always greater than , so the quotients are less than unity and all the floor functions are always zero. Therefore, the term and (17) will be . Using this result and applying it to (13) , it follows (18) , that matches (6).
(18)
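The level-by-level counting used in this derivation can be checked numerically. The following Python sketch is illustrative only: the names `initial`, `totals_by_recurrence` and `totals_closed_form` are hypothetical, and it assumes (as the floor-function properties invoked above suggest) that adding the N terms of a level in pairs contributes floor(N/2) new terms to the next level. Under that assumption, the nested floors collapse into a single floor of a weighted sum, because floor(floor(x)/2) = floor(x/2).

```python
def totals_by_recurrence(initial):
    """Total number of terms per level when each level is added in pairs.

    initial[j] is the (assumed) number of initial terms placed at level j.
    """
    totals = []
    carried = 0  # new terms produced by pairing the previous level
    for n in initial:
        total = n + carried
        totals.append(total)
        carried = total // 2  # floor(N / 2) pairs move up one level
    return totals


def totals_closed_form(initial):
    """Same totals with one floor per level: floor(sum_k 2^k * n_k / 2^j)."""
    totals = []
    weighted = 0
    for j, n in enumerate(initial):
        weighted += n << j            # add 2^j * n_j to the running sum
        totals.append(weighted >> j)  # floor division by 2^j
    return totals


if __name__ == "__main__":
    example = [5, 3, 0, 1]  # arbitrary illustrative counts, not from the paper
    print(totals_by_recurrence(example))
    print(totals_closed_form(example))
```

Both functions return the same list for any non-negative counts, which is the kind of simplification that leads from the recurrence to a closed expression such as (18).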
B. Upper Bound for Delay
The delay of the multiplier given in (7) is the following:
For type II irreducible pentanomials, , so for even we have that while for odd we have . The following operations can be done:
• Even . We have . Substituting this expression in the quotient in and using the fact that , we have and therefore . Finally we will have that (20)
• Odd . We have and then we get the same results as in the previous case for even .
Using the result given in (20), we have and, using the property , we finally have that the XOR delay of the multiplier can be upper bounded as follows, matching (8):
APPENDIX B AREA COMPLEXITY
The XOR gates of the multiplier will be . These quantities are determined as follows:
The functions and as given in (1), (2) are implemented as binary trees of 2-input XOR gates with a lower level of 2-input AND gates (corresponding to the products). The number of AND and XOR gates for are and , while for they are and , respectively [9]. The total contribution of and to the space complexity is AND and XOR gates [9]. Therefore, the total number of AND gates of the multiplier is and the number of XOR gates given by and in (1), (2) is . The coefficients in Table I have been divided into seven sections (from to ). In Section II, the number of and terms in the sums for each section was given. Taking these numbers into account, the XOR gates in the product coefficients are as follows [9]:
for section ; in ; in ; 12 and 10 in and , respectively; in ; and finally 3 in . Then the number of XOR gates needed for the sum of the and terms in the product coefficients is . Note that this number corresponds to the general case in which all the sections from to appear in Table I. There are some special cases for which the above complexity can be reduced. However, the number of such cases is negligible and they are not considered.
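As a rough illustration of how such gate counts arise, the sketch below gives the cost of a single function that is the XOR of k two-operand products, built from 2-input gates as a balanced binary tree. The function name `sum_of_products_cost` and the parameter `k` are hypothetical; the actual number of products in each function of (1), (2) is given by the expressions in [9] and is not reproduced here.

```python
def sum_of_products_cost(k):
    """Gate cost of XOR-ing k products with 2-input gates (balanced tree).

    Returns (and_gates, xor_gates, xor_depth).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    and_gates = k                     # one AND gate per product term
    xor_gates = k - 1                 # joining k terms takes k - 1 XOR gates
    xor_depth = (k - 1).bit_length()  # equals ceil(log2(k)) for k >= 1
    return and_gates, xor_gates, xor_depth


if __name__ == "__main__":
    for k in (1, 2, 8, 13):
        print(k, sum_of_products_cost(k))
```

The XOR count depends only on the number of terms, whereas the depth depends on how the tree is balanced; this is why the delay and area analyses are carried out separately.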
In order to compute the number of XOR gates, we must first determine the number of times each appears in Table I. It can be found that for the general case in which all the sections from to exist, there are terms that appear 4 times, terms ( and ) that appear 7 times and the term appearing 6 times. As previously stated, one occurrence of the terms is already included in , so we must compute the XOR gates due to appearing 3 times, and appearing 6 times and appearing 5 times. If we define and , where is the number of XORs needed for the sum of the terms for , then we can write that the number of XOR gates is given by . Using the equivalence (only in relation to the number of terms) and denoting where is the number of XORs needed for the sum of the terms for , we can compute the number of XOR gates using . The number of XOR gates for the sum of the terms for can be computed using the number of 1's in the binary representation of . For example, in Table II is given in the form and therefore 2 XOR gates are needed to perform the additions of the terms. The binary representation of the subindex 13 in this case is (1101), i.e., with three 1's. Therefore the number of XOR gates is the number of 1's in the binary representation of 13 minus 1. Using (3) and the definition of operator , the Hamming weight of , can be computed as . Therefore the number of XOR gates needed for the sum of the terms for will be this Hamming weight minus 1:
which matches (9). Using (22), can then be computed:
which matches (10). The number of XOR gates given by the shared groups that appear in the product coefficients can be computed in a similar way as done in Section III-A. It can be found that for even , the group appears in three coefficients ( , and ), while for odd , the group appears in the same coefficients. On the other hand, for even , the groups appear in two coefficients, while for odd , the groups also appear in two coefficients, in both cases excluding the previous groups for even or odd. This means that only one instance of each of the above groups must be implemented, and the other occurrences must not be counted. Note that, of the above groups, the term with highest subindex gives the number of XOR gates to be shared. For example, using Table II it can be observed that for (with three terms) and (two terms), the group involves the sum of two terms and and therefore it requires 2 XOR gates. In order to compute the number of XORs, we must use the equivalence (only in relation to the number of terms)
. Then the previous group for even corresponds with , the group for odd corresponds with , the groups for even correspond with and the groups for odd correspond with . Furthermore, using the equivalence, the term with lowest subindex gives the number of XOR gates to be shared. The number of XOR gates represented by the above shared groups is then given by the number of 1's (Hamming weight) in the binary representation of for even or of for odd , plus the Hamming weight of for even or of for odd . Therefore the number of XOR gates given by the shared groups that appear in the product coefficients is computed by (24), which matches (11). In (24), represents the limit of the summation for even , represents the limit for odd , represents the Hamming weight of to be computed for even and the Hamming weight of for odd .
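The Hamming-weight rule used throughout this appendix can be illustrated with a short Python sketch. It assumes, as the example with subindex 13 suggests, that each 1 in the binary representation of the subindex corresponds to one term to be added, so that joining the terms takes the Hamming weight minus 1 XOR gates. The helper names `tree_sizes` and `xors_to_join` are hypothetical, and reading each bit as a power-of-two group is an interpretation rather than a statement taken from the paper.

```python
def tree_sizes(k):
    """Powers of two whose sum is k, largest first (one per 1 bit of k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [1 << i for i in range(k.bit_length() - 1, -1, -1) if (k >> i) & 1]


def xors_to_join(k):
    """XOR gates needed to add one term per 1 bit of k: Hamming weight - 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return bin(k).count("1") - 1


if __name__ == "__main__":
    k = 13                  # the subindex used in the example from Table II
    print(bin(k))           # 0b1101
    print(tree_sizes(k))    # [8, 4, 1]
    print(xors_to_join(k))  # 2
```

For the subindex 13 this reproduces the count quoted above: three 1's in (1101), hence 2 XOR gates.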
