High-speed polynomial basis multipliers over GF(2^m) for special pentanomials by Imaña Pascual, José Luis
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 63, NO. 1, JANUARY 2016 1
High-Speed Polynomial Basis Multipliers Over GF(2^m)
for Special Pentanomials
José L. Imaña
Abstract—Efﬁcient hardware implementations of arithmetic
operations in the Galois field $GF(2^m)$ are highly desirable for
several applications, such as coding theory, computer algebra and
cryptography. Among these operations, multiplication is of special
interest because it is considered the most important building
block. Therefore, high-speed algorithms and hardware architec-
tures for computing multiplication are highly required. In this
paper, bit-parallel polynomial basis multipliers over the binary
field $GF(2^m)$ generated using type II irreducible pentanomials
are considered. The multiplier presented here has the lowest time
complexity known to date for similar multipliers based on this
type of irreducible pentanomial.
Index Terms—Bit-parallel multipliers, finite field, $GF(2^m)$,
irreducible pentanomials, polynomial basis.
I. INTRODUCTION
BINARY GALOIS field arithmetic is a widely studied
subject due to its use in several important applications.
$GF(2^m)$ arithmetic only requires AND and XOR gates for
its implementation. XOR-based logic functions have been
studied since the 1960s [1] due to their use in coding theory [2],
digital signal processing, cryptography and telecommunication
circuits. These applications frequently require efficient very
large scale integration (VLSI) implementations of high-speed
$GF(2^m)$ multipliers [3]–[9]. For this reason, several $GF(2^m)$
bit-parallel polynomial basis (PB) multipliers have been proposed.
The polynomial basis is the most widely used, although normal [10]
or dual [11] bases can also be considered. The complexity of the
multiplier depends on the generating irreducible polynomial
selected for the finite field. For hardware implementation
of $GF(2^m)$ multiplication, low Hamming weight irreducible
polynomials, such as trinomials and pentanomials, are usually
used. For irreducible trinomials, multipliers with low area and
time complexities can be implemented [12]–[14]. Unfortunately,
there are 468 values of $m$ in the interval [2,1024] such
that irreducible trinomials of degree $m$ do not exist. For each
of these values of $m$, for which no
irreducible trinomial exists, an irreducible pentanomial can be
found. Thus, the design of multipliers using irreducible pentanomials
is needed. Polynomial basis multiplication requires a
polynomial multiplication followed by a modular reduction. An
efficient bit-parallel multiplier was proposed by Mastrovito [15]
Manuscript received August 05, 2015; revised October 06, 2015; accepted
October 31, 2015. This work was supported by the Spanish Government under
Research Grants CICYT TIN2008-00508 and TIN2012-32180. This paper was
recommended by Associate Editor S. Ghosh.
The author is with the Department of Computer Architecture and Systems
Engineering, Faculty of Physics, Complutense University, 28040 Madrid, Spain
(e-mail: jluimana@ucm.es).
Digital Object Identiﬁer 10.1109/TCSI.2015.2500419
in which a product matrix is introduced to combine the above
two steps together. The entries in this matrix can be computed
efﬁciently by sharing common items, known as subexpression
sharing [16]. Mastrovito multipliers using special irreducible
pentanomials have been widely studied due to their low-com-
plexity implementations [12], [13], [17], [18]. All these works
exploit subexpression sharing in order to ﬁnd efﬁcient architec-
tures. Other methods use the divide-and-conquer approach for
polynomial multiplication in order to reduce the complexity of
the multiplier [19], [21]. In [9], a new PB multiplication method
was used. This method is based on the introduction of a product
matrix that can be decomposed as a sum of matrices depending
on the selected irreducible polynomial. Matrix decomposition
was already used in similar $GF(2^m)$ multiplication approaches
that exploit subexpression sharing [12], [13], [17], [18]. The
method in [9] introduced the functions $S_i$ and $T_i$, given by the
raw sum of terms $x_k = a_k b_k$ and $z_i^j = a_i b_j + a_j b_i$, where
$a_i$ and $b_i$ are the coefficients of two $GF(2^m)$ elements
$A$ and $B$, respectively. The coefficients of the product of two field
elements can be computed as the sum of those functions. One of
the problems of the above method is related to the monolithic
construction of the $S_i$ and $T_i$ functions. For example, for
the functions
and are deﬁned. The sum of these two
functions would
result in a 3-level (with depth 3) binary tree of XOR gates.
However, the sum of involves the addition of four
product terms ( , , and ) and it could be done
with a 2-level complete binary tree of XOR gates.
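The depth argument above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: the function names and the split into 3 and 1 product terms are assumptions chosen only so that the total is four product terms. It compares the XOR depth obtained when two separately built trees are joined by one extra XOR with the depth of a single complete tree over all product terms.

```python
def balanced_depth(num_terms: int) -> int:
    """XOR levels of a complete binary tree adding `num_terms` inputs.

    Equals ceil(log2(num_terms)) for num_terms >= 1, computed on integers.
    """
    return (num_terms - 1).bit_length()


def monolithic_depth(terms_a: int, terms_b: int) -> int:
    """Depth when two separately built XOR trees are joined by one more XOR."""
    return 1 + max(balanced_depth(terms_a), balanced_depth(terms_b))


# Hypothetical split: one function with 3 product terms, another with 1.
print(monolithic_depth(3, 1))  # 3 XOR levels
print(balanced_depth(3 + 1))   # 2 XOR levels
```

The gap between the two values is the saving that the level-by-level grouping of terms tries to recover.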
In this work, a new bit-parallel PB multiplier is presented by
considering the functions $S_i$ and $T_i$ as sums of $S_i^j$ and $T_i^j$ terms,
respectively, in such a way that $S_i = \sum_j s_j S_i^j$
and $T_i = \sum_j t_j T_i^j$ for a given finite field
$GF(2^m)$, where $s_j, t_j \in \{0, 1\}$. The terms $S_i^j$
and $T_i^j$ represent the addition of $2^j$ products and therefore
can be implemented as a $j$-level complete binary tree of
XOR gates. In this way, the addition of terms $S_i^j$ and $T_i^j$ with
the same superscript $j$ would result in a $(j+1)$-level complete
binary tree. If the sum of the functions $S_i$ and $T_i$ is performed
by grouping the additions of terms $S_i^j$ and $T_i^j$ with the same
level $j$, then the number of XOR levels needed to compute
the product coefficients can be reduced. Furthermore, the coefficients
$s_j$ and $t_j$ are given by the
binary representations of the subindex $i$ for $S_i$ and of the value
$m-1-i$ for $T_i$, respectively. In this contribution, the new
multiplication approach is applied to type II irreducible pentanomials
[18] $f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$, with
. These pentanomials are important because
they are abundant (there are 597 values of $m$ in the interval
1549-8328 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
[5,1000] such that this type of irreducible pentanomial of degree
$m$ exists) and because all five binary fields recommended
by NIST for ECDSA, i.e., $GF(2^{163})$, $GF(2^{233})$, $GF(2^{283})$, $GF(2^{409})$, and $GF(2^{571})$, can
be constructed using such irreducible polynomials.
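As an illustration of how such pentanomials can be located, the sketch below tests irreducibility over $GF(2)$ with Ben-Or's gcd criterion. It assumes the form $y^m + y^{n+2} + y^{n+1} + y^n + 1$ usually given for type II pentanomials; the helper names and the scanned range of $n$ are hypothetical choices for this example, and polynomials are stored as Python integers whose bit $i$ is the coefficient of $y^i$.

```python
def gf2_mod(a: int, f: int) -> int:
    """Remainder of polynomial a modulo f over GF(2)."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a


def gf2_mulmod(a: int, b: int, f: int) -> int:
    """Product a*b modulo f over GF(2) (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = gf2_mod(a << 1, f)
    return gf2_mod(r, f)


def gf2_gcd(a: int, b: int) -> int:
    """Greatest common divisor of two polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def is_irreducible(f: int) -> bool:
    """Ben-Or test: gcd(f, y^(2^k) + y) must be 1 for 1 <= k <= deg(f) // 2."""
    m = f.bit_length() - 1
    if m < 1:
        return False
    y = gf2_mod(0b10, f)
    p = y
    for _ in range(m // 2):
        p = gf2_mulmod(p, p, f)          # p = y^(2^k) mod f
        if gf2_gcd(f, p ^ y) != 1:
            return False
    return True


def type2_pentanomial(m: int, n: int) -> int:
    """y^m + y^(n+2) + y^(n+1) + y^n + 1 (assumed type II form)."""
    return (1 << m) | (1 << (n + 2)) | (1 << (n + 1)) | (1 << n) | 1


def type2_candidates(m: int, n_values) -> list:
    """Values of n (from a caller-chosen range) giving an irreducible polynomial."""
    return [n for n in n_values if is_irreducible(type2_pentanomial(m, n))]


# Scan a hypothetical range of n for m = 14 (the range is an assumption).
print(type2_candidates(14, range(2, 12)))
```

The admissible range of $n$ is left to the caller because only the bound $n + 2 < m$ is needed for the polynomial to have degree $m$.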
The paper is organized as follows. Notation and mathemat-
ical background are presented in Section II, where PB multipli-
cation for type II irreducible pentanomials given in [9] is also
reviewed. The new multiplication approach is presented in Sec-
tion III, where an example of multiplication and the complexity
analysis are also given. In Section IV comparisons with other
similar multipliers are done. Finally, concluding remarks are
made in Section V.
II. NOTATION AND PRELIMINARIES
Let be an irreducible polyno-
mial of degree over . All elements of the bi-
nary ﬁnite ﬁeld can be
represented in the polynomial basis ,
where is a root of the polynomial . For example,
an element is represented in PB as
,
with . The vector of the coefﬁcients of in PB
can be represented by . Let
and be their coefﬁcient vectors, respec-
tively. Using the method given in [9], the product
can be computed as , where
is the vector of reversed coefﬁcients of and is the
product or Mastrovito matrix that depends on and on the
coefﬁcients of . In order to compute the coefﬁcients,
a new notation was given in [9]. These coefﬁcients consist of
sum-of-products (SOP) given by the inner products of and .
An inner product can be represented by the permutation given
by the subscripts of the coefﬁcients of and , respectively,
in the SOP. From this permutation, 1-cycles $(k)$ and 2-cycles
$(i,j)$ can be found and associated with the terms
$x_k = a_k b_k$ and $z_i^j = a_i b_j + a_j b_i$, respectively. For example, the SOP
$a_0 b_4 + a_1 b_3 + a_2 b_2 + a_3 b_1 + a_4 b_0$ can be represented by
the cycles (0,4)(1,3)(2). In [9], the sums of the $x_k$ and $z_i^j$
terms represented by the 1-cycles and 2-cycles were
carried out by the functions $S_i$ and $T_i$. These functions are
implemented as binary trees of 2-input XOR gates with a lower
level of 2-input AND gates (corresponding to the products
of the coefficients of $A$ and $B$). The product $C = A \cdot B$ can
be computed as the sum of these functions. The expression for
with , is [9]:
(1)
where only appears for odd. The expression for
with is as follows:
(2)
where the term only appears for ( and even) or for ( and
odd). In this case, . Otherwise, i.e., for ( even and
odd) or for ( odd and even), the term does not appear and
the value of . For example, for
the terms and are as follows: ,
, ,
,
TABLE I
EXPRESSIONS FOR THE COEFFICIENTS OF THE PRODUCT
,
, ,
, ,
, .
Type II irreducible pentanomials were defined in [18] as
$f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$, for .
In this expression, if then a type I pentanomial [20] is
obtained. Polynomial basis multiplication for type II irreducible
pentanomials with was studied in [9]. The coefﬁcients
of the PB product were given as the sum of and terms,
as shown in Table I, where . In this table, the
coefﬁcients have been divided into seven sections (named from
to ), depending on the number of and terms in the
sums. The ﬁrst section (from to ) has 5 terms; section
with , and has 4, 7 and 6 terms, respectively;
section ( to ) has 8 terms; sections ( , )
and ( , ) have 7 and 6 terms, respectively; section
( to ) has 5 terms; and section has 4
terms. From (2), it can be observed that the term is given
by the addition of terms and the term (if it exists).
Therefore, performs the sum of the maximum number of
terms among ones and it presents the highest delay.
As this term appears in the coefﬁcient that is included in
section with the maximum number of terms, then is
the coefﬁcient with the highest delay of the multiplier. In the
following section, a new scheme for multiplication is given.
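The role of the two families of functions can be sketched in software. The example below is an assumption-laden illustration rather than the method of [9]: it takes the lower function of index $i$ (called `S[i]` here) to be the coefficient of $y^{i-1}$ in the unreduced product, which adds $i$ products, and the upper function of index $i$ (called `T[i]`) to be the coefficient of $y^{m+i}$, which adds $m-1-i$ products. The reduction uses the assumed type II form $y^m = y^{n+2} + y^{n+1} + y^n + 1$. Coefficient lists are ordered from $y^0$ upward, and all function names are hypothetical.

```python
def conv_coefficients(a, b):
    """Coefficients d_0..d_{2m-2} of the unreduced product over GF(2)."""
    m = len(a)
    d = [0] * (2 * m - 1)
    for j in range(m):
        for k in range(m):
            d[j + k] ^= a[j] & b[k]
    return d


def s_and_t(a, b):
    """Split the unreduced product into lower (S) and upper (T) functions."""
    m = len(a)
    d = conv_coefficients(a, b)
    S = {i: d[i - 1] for i in range(1, m + 1)}   # S[i] adds i products
    T = {i: d[m + i] for i in range(0, m - 1)}   # T[i] adds m-1-i products
    return S, T


def reduce_type2(S, T, m, n):
    """Fold the T functions back using y^m = y^(n+2) + y^(n+1) + y^n + 1."""
    d = [S[i] for i in range(1, m + 1)] + [T[i] for i in range(m - 1)]
    for k in range(2 * m - 2, m - 1, -1):        # highest degree first
        if d[k]:
            d[k] = 0
            for e in (n + 2, n + 1, n, 0):
                d[k - m + e] ^= 1
    return d[:m]


def multiply_type2(a, b, n):
    """Product of two field elements given as coefficient lists of length m."""
    S, T = s_and_t(a, b)
    return reduce_type2(S, T, len(a), n)


# y * y^4 modulo y^5 + y^3 + y^2 + y + 1 (m = 5, n = 1).
print(multiply_type2([0, 1, 0, 0, 0], [0, 0, 0, 0, 1], 1))  # [1, 1, 1, 1, 0]
```

A hardware multiplier evaluates all coefficients in parallel; the sequential folding loop here only checks the arithmetic.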
III. NEW MULTIPLIER FOR TYPE II IRREDUCIBLE
PENTANOMIALS
The functions $S_i$ and $T_i$ presented in (1), (2) are given by a
raw sum of terms $x_k$ and $z_i^j$. The
coefficients of the product of two field elements represented in
PB can be computed as the sum of those functions, as given in
Table I. One of the problems of the above method is related
to the monolithic construction of the $S_i$ and $T_i$ functions.
For example, for the functions
and are deﬁned.
The sum of these two functions
, where the terms in brackets point out that they
must be added (XOR) previously to the XOR with the other
terms, would result in a 3-level (with depth 3) binary tree of
XOR gates. However, the sum of involves the addition
of four product terms ( , , and ) so it could
be done with a 2-level complete binary tree of XOR gates if
the involved additions could be performed in a separate way,
i.e., if the product could be ﬁrst added with the term
and then perform the addition with in the form
.
Algorithm 1 Computation of initial terms of .
for to do
if odd then
;
else
;
end if
;
for to do
if then
; ;
else
;
end if
end for
end for
In this paper, a new bit-parallel PB multiplier is presented by
considering the functions $S_i$ and $T_i$ as sums of $S_i^j$ and $T_i^j$ terms,
respectively, in such a way that $S_i = \sum_j s_j S_i^j$
and $T_i = \sum_j t_j T_i^j$ for a given finite field
$GF(2^m)$, where $s_j, t_j \in \{0, 1\}$. The initial
terms $S_i^j$ and $T_i^j$ represent the addition of $2^j$ products
and therefore can be implemented as a $j$-level complete binary
tree of XOR gates. In this way, the addition of two terms $S_i^j$ and
$T_i^j$ with the same superscript $j$ would result in a new XOR in
level $j+1$ (i.e., a new $(j+1)$-level term) that represents a
$(j+1)$-level complete binary tree. If the sum of the functions $S_i$
and $T_i$ is performed by grouping the additions of terms $S_i^j$ and $T_i^j$ with
the same level $j$, starting with the lower levels, then
the number of XOR levels needed to compute the product coefficients
can be reduced. In this way, the 0-level initial terms
$S_i^0$ and $T_i^0$ should first be added in pairs to give rise to a new
XOR in level 1 (i.e., a new 1-level binary tree term), which
in turn should be added to another 1-level term to give rise to a
new 2-level complete binary tree, and so on. If there is only one
$j$-level term (or there is an unpaired $j$-level term), then it should
be added to a $(j+1)$-level term in order
to obtain a new $(j+2)$-level tree. If no such $(j+1)$-level term
exists, then it should be added to a $(j+2)$-level term, and so
on.
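The pairing rule just described can be simulated to obtain the number of XOR levels of a coefficient. This is a minimal sketch under the assumption that an unpaired term is simply carried to the next level (which yields the same depth as adding it to the next available higher-level term); the function name and the input format, a list with the level of every initial term, are choices made for this example.

```python
from collections import Counter


def final_xor_level(initial_levels):
    """XOR depth reached when terms are paired level by level.

    `initial_levels` holds the level j of each initial term of one
    coefficient; a j-level term is a complete XOR tree over 2**j products.
    """
    counts = Counter(initial_levels)
    if not counts:
        raise ValueError("at least one term is required")
    level = min(counts)
    while True:
        here = counts.pop(level, 0)
        if here == 1 and not counts:
            return level                   # a single tree is left
        # Each pair creates a (level+1)-level term; an unpaired term is
        # carried upward and treated as a (level+1)-level term.
        counts[level + 1] += (here + 1) // 2
        level += 1


print(final_xor_level([0, 0, 0, 0]))  # 2: four products in a complete tree
print(final_xor_level([0, 2]))        # 3: a 0-level term meets a 2-level term
```

Feeding the function the initial levels of all terms of the most complex coefficient gives the XOR part of the critical path under these assumptions.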
From (1), (2), the computations of the initial terms and
of and are given in Algorithm 1 and Algorithm 2,
respectively, where the term has been used.
In these algorithms, the condition in the
inner for loop determines if the or terms have an initial
term or at level . This condition will be further explained
in Section III-B.
Algorithm 2 Computation of initial terms of .
for to do
if (even and ) or (odd and ) then
;
else
;
end if
;
for to do
if
then
; ;
else
;
end if
end for
end for
A characteristic of the previous representation is that the coefficients
$s_j$ and $t_j$ are given by the
binary representations of the subindex $i$ for $S_i$ and of the value
$m-1-i$ for $T_i$, respectively. This can be deduced from the expressions
of $S_i$ and $T_i$ defined in (1) and (2). For example, from
(1) it can be observed that $S_i$ is given by the sum of $i$ product
terms $a_k b_l$. As any number can be written as a sum of powers of
2, $S_i$ can also be written as a sum of blocks of $2^j$ product
terms $a_k b_l$. The addition of $2^j$ products was previously denoted
as $S_i^j$. Therefore, in the notation $S_i = \sum_j s_j S_i^j$, the coefficients
$s_j$ correspond to the binary representation
of $i$. A similar reasoning applies to $T_i$, considering
that $T_i$ is given by the sum of $m-1-i$ product terms $a_k b_l$.
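The binary-representation property can be made concrete with a short sketch. It assumes that a function adding a given number of products owns one initial term per set bit of that number; the helper names are hypothetical, and the way the products are assigned to blocks (largest block first) is an arbitrary choice for illustration, not necessarily the assignment produced by Algorithms 1 and 2.

```python
def initial_levels(num_products: int) -> list:
    """Levels j whose bit is set in `num_products` (one 2**j-product block each)."""
    return [j for j in range(num_products.bit_length()) if (num_products >> j) & 1]


def split_products(products):
    """Split a list of product terms into blocks of 2**j terms, largest first."""
    blocks, start = {}, 0
    for j in reversed(initial_levels(len(products))):
        blocks[j] = products[start:start + 2 ** j]
        start += 2 ** j
    return blocks


m = 14
print(initial_levels(13))          # [0, 2, 3], i.e. binary 1101
print(initial_levels(m - 1 - 2))   # [0, 1, 3], i.e. binary 1011
```

The two printed cases correspond to the binary vectors (1101) and (1011) used in the multiplication example of Section III-A.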
Furthermore, in order to reduce the number of XORs needed
for the computation of the product, common terms appearing
in several coefﬁcients can also be shared. These common terms
correspond to the addition of consecutive and terms, i.e.,
and , that lead to the addition of terms
and , respectively, for different levels de-
termined by the binary representations of the subindex (for )
and (for ). The addition of any pair of terms in level
creates a new term in level . From Table I, it can be noted
that for the coefﬁcients of the multiplier only common additions
can be found. Using the binary representations of
, it can be observed that for even , the common sums
are for , while that for odd ,
the common terms are for . The
occurrence of these common groups in the coefﬁcients of the
product is studied in Appendix B.
The algorithm for the computation of the new proposed mul-
tiplication is given in Algorithm 3. In the ﬁrst for loop, the
common terms to be shared are created, where
refers to that for even , the subindex ranges
from 0 to , while that for odd , ranges from 1 to .
For each coefﬁcient in Table I, the outer for loop processes (for
each level ) the initial and terms, cre-
ating new -level terms and sharing common terms (if any).
The while loop processes terms from level to a level
with only two terms, in such a way that the maximum level
will be for the given coefﬁcient. The execution of the al-
gorithm will provide the coefﬁcients of the product. The above
new method of multiplication is clariﬁed with the following ex-
ample.
Algorithm 3 Computation of the product for .
compute and terms (using Algorithms 1 and 2)
for to do
ﬁnd -level terms and ;
create common terms in level ;
end for
for each coefﬁcient in Table I do
for to do
if -level terms and then
share common -level terms ;
end if
for the remaining -level terms do
sum -level terms in pairs to create -
level terms;
if a non-paired -level term then
consider the term as a -level term
end if
end for
end for
;
while the number of do
sum -level terms in pairs to create -level terms;
if a non-paired -level term then
consider the term as a -level term
end if
;
end while
end for
A. Multiplication Example over $GF(2^{14})$
Let us consider the product of two elements and
in generated by the type II irreducible pentanomial
. The and functions can be
computed using (1), (2) and are given in Table II. In this table,
the and functions are the sum of the and terms
in the -th row given in the second column. This column is di-
vided into four subcolumns labeled as , , , and that
represent the number of product terms involved in each
subcolumn. For example,
, where involves product term
involves the addition of terms
and
is the sum of product terms
. The term can then be
represented in the form
where , , and stand for the terms with , ,
and product terms, respectively. In this case, has not the
term because it is associated with the binary coefﬁcient 0.
The above representation is given in the third column in Table II,
where the binary coefﬁcients and
associated with the terms and , respectively, are given.
It can be noted that the binary coefficients ordered in the form
$(s_3, s_2, s_1, s_0)$ for $S_i$ correspond to the binary representation
of the subindex $i$, whereas the binary coefficients
$(t_3, t_2, t_1, t_0)$ for $T_i$ correspond to the binary representation of $m-1-i$.
The fourth column in Table II includes these binary representations.
For example, the term $S_{13}$ corresponds to the binary
vector (1101), whereas $T_2$ corresponds to (1011), which is the
binary representation of the value $m-1-i = 11$ (in this
example with $m = 14$). It can be observed that the $S_i^j$ and $T_i^j$
terms given in Table II can also be computed using Algorithm
1 and Algorithm 2.
In order to reduce the space complexity of the multiplier,
common terms appearing in several coefﬁcients can also be
shared. It can be observed in Table II that consecutive and
terms have and terms with the same level . For ex-
ample, and have 1-level terms and and 3-level
terms and , respectively. The addition of and
then implies the sums and that give rise to
2-level and 4-level complete binary trees of XOR gates, respec-
tively. Therefore, the group given by the addition of these two
functions and can reduce the complexity. In Table II the
groups that can be found are represented by shadowed cells with
the same color. The groups are , , ,
, and , while that the groups
are , , , ), and
. The sum of the and terms that appear in the
coefﬁcients of the product must be done using the above groups
in order to optimize the implementation.
The coefﬁcients of the product are given in Table III using
Table I for this irreducible pentanomial. The previous
TABLE II
AND FUNCTIONS FOR
and groups found in the product coefﬁcients are shad-
owed in Table III. It can be observed that only one term ap-
pears in each coefﬁcient, so only the groups are used. The
group appears in three coefﬁcients ( , and )
and the groups , , , and
appear in two coefficients. This means that each
of the above groups needs to be implemented only once, so
its other occurrences require no additional gates.
The number of XOR gates that can be saved is given by the
number of terms in each group. Using Table II it can be ob-
served that the group involves the sum of two terms
and and therefore it requires 2 XOR gates.
In the same way, the groups , , ,
and require 2, 1, 2, 1, and 1 XOR gates,
respectively. Furthermore, as appears in three coefﬁ-
cients, then the number of XOR gates that can be reduced will be
2 times the number of XOR gates required, i.e., . There-
fore, the number of XOR gates that will be reduced for the above
groups will be, respectively, .
In Section III-B, general expressions will be given in order to
compute the number of XOR gates that can be reduced due to
the groups.
Using the $S_i^j$ and $T_i^j$ terms given in Table II for $S_i$ and $T_i$, respectively,
the coefficients of the product are shown in Table IV.
In this table, the sum of terms is accomplished using the rule
previously given, i.e., the sum of the functions $S_i$ and $T_i$ is performed
by grouping the sums of terms $S_i^j$ and $T_i^j$ with the same level $j$,
starting with the lower levels. In this way, the 0-level
initial terms $S_i^0$ and $T_i^0$ should first be added in pairs to give rise
to new 1-level binary trees (1-level terms), which in turn should be
added in pairs with other 1-level terms to give rise to new 2-level
complete binary trees, and so on. If there is only one $j$-level term
(or there is an unpaired $j$-level term), then it should be added
to a $(j+1)$-level term in order to obtain a
TABLE III
COEFFICIENTS OF THE PRODUCT FOR
new $(j+2)$-level tree. If no such $(j+1)$-level term exists, then
it should be added to a $(j+2)$-level term, and so on. The order
of the additions of the terms in Table IV is represented by means
of parentheses. To reduce the number of XORs needed for the
computation of the product, common terms appearing in several
coefﬁcients can also be shared. This is represented with shad-
owed boxes that correspond with the previously stated groups.
In order to illustrate the method, the implementation of the
coefﬁcient is given in Fig. 1. This coefﬁcient requires the
addition of 8 terms, so it is the most complex coefficient and
determines the maximum delay of the multiplier (in fact, the
coefficient of a $GF(2^m)$ multiplier given by Type II pentanomials
is the most complex one, so it is used to determine
the maximum delay complexity). In this ﬁgure, the and
terms are represented by ﬁlled circles. These circles correspond
to the initial and terms given in Table II in such a way that
TABLE IV
COEFFICIENTS OF THE PRODUCT FOR WITH AND TERMS
, , ,
, , ,
and . Vertical dashed lines rep-
resent the level of XOR binary trees. For example, the term
in line 2 represent the 2-level binary tree
. Circles enclosed within ellipses
represent the terms of the corresponding and functions.
For example, the function is given by the three initial terms
, and . Furthermore, the gray color XOR trees rep-
resent the group that can be shared in several coef-
ﬁcients (in this case, the group correspond with the additions
and ). It can be observed in Fig. 1 that the ad-
dition of terms follows the rule previously given, starting with
0-level terms and ascending in the construction of binary XOR
trees. For example, the addition of the initial 0-terms and
gives rise to a new 1-level term (a new XOR in level 1) that
in turn is added with the initial 1-term to give rise to a new
2-level XOR term and so on. In this example, can be constructed
with a 6-level binary XOR tree, so the delay complexity
of the multiplier is given by $T_A + 6T_X$, where $T_A$ and $T_X$ represent
the delay of 2-input AND and XOR gates, respectively.
The delay $T_A$ corresponds to the 0-level products of the
coefficients of $A$ and $B$. It must be noted that the best delay
complexity for this multiplier given by other similar methods in
the literature is .
The space complexity (number of AND and XOR gates) can
also be computed. The number of AND gates is given by all the
different products $a_i b_j$, with $0 \le i, j \le m-1$. This number
can be computed using (1), (2) and for this $GF(2^{14})$ example
is given as 196 AND gates (see also Table II). It is proved in
Appendix B that the number of AND gates for a $GF(2^m)$ multiplier
is $m^2$. The XOR gates can be computed as the sum of
XOR gates in the initial $S_i^j$ and $T_i^j$ terms (as given in Table II)
plus the number of new XOR gates generated in the coefficients
(as given in Table IV) minus the number of XOR gates due to the
groups shared among coefficients. The $S_i^j$ and $T_i^j$ terms perform
the XOR of $2^j$ product terms, therefore the number of XORs is
$2^j - 1$. In this example, there are 7 $S_i^1$, $S_i^2$, and $S_i^3$ terms
each. Therefore the number of XOR gates in the initial $S_i^j$ terms
Fig. 1. Implementation of coefﬁcient .
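will be $7(2^1 - 1) + 7(2^2 - 1) + 7(2^3 - 1) = 77$.
There are also 7 $T_i^0$ terms and 6 $T_i^1$, $T_i^2$, and
$T_i^3$ terms each, so the number of XOR gates in the initial $T_i^j$ terms
is $6(2^1 - 1) + 6(2^2 - 1) + 6(2^3 - 1) = 66$.
The number of new XOR gates generated in the coefficients for
the sum of $S_i^j$ and $T_i^j$ terms can be found in Table IV and in this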
case is 134 XORs. Finally, the number of XORs due to the groups
shared among coefficients was previously computed and
found to be 10 XORs. Therefore the total number of XORs of this
multiplier is .
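The counts of initial terms quoted in this example can be cross-checked with a few lines of code. The sketch assumes that the lower functions are indexed $1 \le i \le m$ with $i$ products each and the upper functions $0 \le i \le m-2$ with $m-1-i$ products each, so that a function owns a $j$-level initial term exactly when bit $j$ of its product count is set; the function names are hypothetical.

```python
def initial_term_counts(m: int):
    """Per level j, how many lower (S) and upper (T) functions own an initial term."""
    levels = range(m.bit_length())
    s_counts = {j: sum((i >> j) & 1 for i in range(1, m + 1)) for j in levels}
    t_counts = {j: sum(((m - 1 - i) >> j) & 1 for i in range(0, m - 1))
                for j in levels}
    return s_counts, t_counts


def initial_xor_gates(counts) -> int:
    """A j-level term XORs 2**j products, i.e. it needs 2**j - 1 gates."""
    return sum(c * (2 ** j - 1) for j, c in counts.items())


s_counts, t_counts = initial_term_counts(14)
print(s_counts)  # {0: 7, 1: 7, 2: 7, 3: 7}
print(t_counts)  # {0: 7, 1: 6, 2: 6, 3: 6}
print(initial_xor_gates(s_counts), initial_xor_gates(t_counts))  # 77 66
```

The per-level counts agree with the 7 and 6 terms per level stated above for $m = 14$; the XORs created when summing terms in the coefficients and the savings from shared groups are not modelled here.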
B. Complexity Analysis
General expressions for time and space complexities for the
multiplier are given in this subsection.
1) Time Complexity: The coefﬁcients in Table I have been
divided into seven sections, depending on the number of and
terms in the sums. The ﬁrst section (from to ) has
5 terms; section with , and has 4, 7 and 6 terms,
respectively; section ( to ) has 8 terms; sections
and have 7 and 6 terms, re-
spectively; section ( to ) has 5 terms; and ﬁnally
section has 4 terms. It can be observed in Table II that
the term has the highest complexity among terms. The
coefficient , which is included in section with the maximum
number of terms (8), includes this complex term . Therefore,
is the most complex coefﬁcient of a multiplier
given by Type II irreducible pentanomials and it will be used to
determine the highest delay of the multiplier.
In order to do that, the complexity of the $S_i$ and $T_i$ terms
must be determined. Their complexity depends on the number
of initial $S_i^j$ and $T_i^j$ terms they have. These terms can be
represented as $S_i = \sum_j s_j S_i^j$ and $T_i = \sum_j t_j T_i^j$
for a given finite field $GF(2^m)$, where
$s_j, t_j \in \{0, 1\}$. Therefore, the coefficients $s_j$, $t_j$ determine
whether the corresponding $S_i^j$, $T_i^j$ appear in $S_i$ and $T_i$, respectively.
As previously shown, the coefficients $s_j$
and $t_j$ are given by the binary representations of
the subindex $i$ for $S_i$ and by the value $m-1-i$ for $T_i$, respectively.
Therefore the study of the number of terms in $T_i$
can be reduced to the study of terms in $S_{m-1-i}$ using the equivalence
(only in relation to the number of terms) $T_i \equiv S_{m-1-i}$.
For the most complex coefﬁcient , this equivalence results
in that , , ,
, , and ,
where , so will be equivalent to
.
The binary representation of , , , , ,
, and must then be determined. The binary
conﬁguration of a number can be given by the expression
(3)
The value $\lfloor i/2^j \rfloor$ determines if the binary representation
of $i$ has a 1 in the position with weight $2^j$, in such a way that
if $\lfloor i/2^j \rfloor$ has an even value, then $i$ has a 0, whereas if it
has an odd value, then $i$ has a 1. A 1 in the position with weight
$2^j$ for the binary representation of $i$ means that the terms
$S_i$, $T_{m-1-i}$ have a term $S_i^j$, $T_{m-1-i}^j$, which is the sum of $2^j$
product terms and which is implemented by means of a binary
XOR tree with depth $j$ (a $j$-level binary tree).
In order to compute the depth of the binary tree of XOR
gates in given by the coefﬁcient , the number
of total terms in the -level must ﬁrst be determined.
The initial levels for a given are . For
a given level , the number of new XOR terms that will result
in level due to the addition in pairs of the -level terms is
given by - . For example, in Fig. 1
there are seven 0-level terms
whose sum gives rise to the four 1-level XOR terms
besides the term , that can
also be considered as a 1-level term (in order to be added to
and result in a new 2-level XOR term).
Let be the number of initial terms and in level .
This number will be given by the terms in the previously
computed equivalent expression (in relation to the number of
terms)
. In order to do that, the binary repre-
sentation of , , , , , , , and
must then be determined. Using (3), the value
2 determines if the term has an initial term in the position
with weight , i.e., in level . Representing as
, then the number of initial terms and in level
(i.e., ) can be computed as follows:
(4)
It must be noted that in (4) the fourth addend
corresponds with the real term, while that the rest of
addends correspond with the equivalence previously given
. Using (4), the number of initial terms
and for the coefﬁcient given in the example in Sec-
tion III-A can be computed. The number of initial terms in
level 2, for example, will be
corresponding with the
initial terms , , , , ,
respectively, that are represented in Fig. 1 as ﬁlled (black and
gray) circles.
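Counting the initial terms of a coefficient in each level amounts to adding one bit per function. The sketch below assumes that each function in the coefficient is described only by its number of products, reads the bit of weight $2^j$ as the parity of $\lfloor i/2^j \rfloor$ as in the text, and uses hypothetical names; the sample product counts 13 and 11 are illustrative and do not reproduce the terms of Fig. 1.

```python
def bit_by_parity(i: int, j: int) -> int:
    """Bit of weight 2**j in i, read as the parity of floor(i / 2**j)."""
    return (i // 2 ** j) % 2


def terms_per_level(product_counts, num_levels: int) -> list:
    """Number of initial j-level terms contributed by all functions of one coefficient.

    `product_counts` lists, for each function in the coefficient, how many
    products it adds; entry j of the result counts the functions whose
    product count has a 1 in the position with weight 2**j.
    """
    return [sum(bit_by_parity(c, j) for c in product_counts)
            for j in range(num_levels)]


# Two hypothetical functions adding 13 (1101) and 11 (1011) products.
print(terms_per_level([13, 11], 4))  # [2, 1, 1, 2]
```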
If denote the number of initial and
terms in levels , respectively, then the total
number of terms in the -level (denoted by )
will be the addition of the initial terms in that level
plus the terms created due to the addition of terms in lower
levels. In Section A of Appendix A, it is proved that these terms
created in level due to the addition of terms in levels
is given by the expression:
(5)
Therefore, the total number of terms in the
-level will be the sum of plus the expression
in (5). In Appendix A it is proved that this addition is:
(6)
The sum in pairs of the terms determined in (6)
will determine the ﬁnal level reached to compute the coef-
ﬁcient. Therefore the number of XOR levels needed to compute
this coefﬁcient will be . Finally, the
highest delay of the multiplier based on type II pen-
tanomials given by the coefﬁcient is:
(7)
In order to compare this time complexity with other multi-
pliers found in the literature, in Section B of Appendix A the
following upper bound is derived for the XOR delay of the mul-
tiplier:
(8)
2) Area Complexity: In order to determine the area complexity
of the PB multiplier given in Table I, the number of
AND and XOR gates of the $S_i$ and $T_i$ terms must be known.
In this work, these terms have been considered as sums of $S_i^j$
and $T_i^j$ terms, in such a way that $S_i = \sum_j s_j S_i^j$ and
$T_i = \sum_j t_j T_i^j$ for a given finite field $GF(2^m)$, where
$s_j, t_j \in \{0, 1\}$. In Table I, the coefficients
of the product are given as sums of $S_i$ and $T_i$ terms where their
corresponding components $S_i^j$ and $T_i^j$ are considered as individual
terms when performing the sum. It can be observed that
the $S_i$ terms, $1 \le i \le m$, appear only once, whereas the
$T_i$ terms, $0 \le i \le m-2$, appear several times.
One way to determine the number of AND and XOR gates
of the and terms is to count the number of AND and
XOR gates given by the sum of terms and in (1), (2).
In this way, we compute the total number of AND gates of the
multiplier, the XORs of the and terms, the XORs needed
for the sum of all the terms of and the XORs needed for
one sum of the terms of , i.e., we count the XORs due to
the contribution of all the terms and of one occurrence of the
terms. If a term appears times in the additions given for
the coefﬁcients in Table I, then the other occurrences are
taken into account by computing the number of XORs needed
for the sum of the terms of and multiplying it by
. This must be done for each , . To
determine the area complexity of the PB multiplier, the number
of XOR gates needed for the sum of the and terms in the
product coefﬁcients of Table I and the number of shared groups
that appear in the product coefﬁcients should also be
computed. This number of groups must be subtracted from the
previous XOR gates computed. Therefore, the following ﬁgures
must be computed to obtain the XORs of the multiplier:
The number of XOR gates given by and in (1),
(2).
The number of XOR gates needed for the sum of the
and terms in the product coefﬁcients.
For each , the number of times that appears in
Table I and the number of XORs needed for the sum
of the terms for must be determined. Then the XOR
gates given by must be computed.
The number of XOR gates given by the shared groups
that appear in the product coefﬁcients.
The XOR gates of the multiplier will be . In
Appendix B the following values have been computed:
• The total number of AND gates of the multiplier is $m^2$.
• The number of XOR gates given by and in (1),
(2) is .
• The number of XOR gates needed for the sum of the
and terms in the product coefﬁcients is .
• The number of XOR gates can be computed by
, where
the number of XOR gates needed for the sum of the
terms of is given by:
(9)
where is the Hamming Weight of and where
is given as:
(10)
• The number of XOR gates given by the shared groups
that appear in the product coefﬁcients is:
(11)
where represents the limit of the summation
for even represents the limit for odd
represents the Hamming Weight of to be computed for
even and the Hamming Weight of for odd .
Therefore, the XOR gates of the multiplier given by the ad-
dition will be
(12)
A more compact expression for (12) could not be found. The
functions and could be computed for
any value of using Maple. In Table VII the values of these
functions for are given. Using Table VII, it can
be observed that for the example given in Section III-A with
, , the values and . In this
example, the values and can also be computed.
Applying the above values to (12) we have
gates, matching the result
given in Section III-A.
IV. COMPARISON WITH OTHER PB MULTIPLIERS
In Table V the theoretical complexities obtained by
the approach proposed here are compared with the best
results known to date for bit-parallel polynomial basis
multipliers over $GF(2^m)$ generated by type II irreducible
pentanomials. In (8) it was proved that
. It can also be ob-
served that , where
is the best XOR delay found in the literature
for this type of bit-parallel multipliers [9]. Simulations
performed in Maple show that the delay of
our multiplier is less than or equal to the delay in [9], i.e.,
. From the
simulation results, it was found that for the 593 different values
of in the interval for which an irreducible type
II pentanomial exists, the proposed multiplier has the smallest
delay for 465 different values of the field size m. More specifically,
among the type II irreducible pentanomials existing in
, there are 477 and 1162 different combinations
of for which the proposed multiplier has equal and
lower delay, respectively, than the multiplier in [9]. With respect
to area complexity, it was found that the proposed multiplier
requires the same number of AND gates as the
other similar multipliers existing in the literature (except for
the approach presented in [21]) and a higher number of XOR
gates in comparison with the other multipliers. This increased
number of XOR gates is due to the separation of the monolithic
functions into the corresponding terms in order
to achieve a reduced delay for multiplication.
TABLE V
COMPLEXITIES OF BIT-PARALLEL PB MULTIPLIERS FOR
(odd , even ), 3 (odd , odd ), 4 (even , even ) or 5 (even , odd ).
(odd , ), 45 (even ).
(even ). (odd ).
TABLE VI
COMPLEXITIES OF BIT-PARALLEL PB MULTIPLIERS USING TYPE II
PENTANOMIALS FOR THE FIVE RECOMMENDED NIST FIELDS
In Table VI the complexities of bit-parallel polynomial basis
multipliers using type II irreducible pentanomials for the five finite
fields GF(2^m) with m = 163, 233, 283, 409 and 571 recommended
by NIST for ECDSA are presented. From Table VI,
it can be observed that the multiplier here proposed presents the
lowest delay except for , that matches the best delay
given in [9].
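The counts reported in this section presuppose knowing, for each field size, whether an irreducible type II pentanomial exists. The following Python sketch shows one way such a check could be done; it is not the Maple code used for the simulations. It assumes the usual form x^m + x^(n+2) + x^(n+1) + x^n + 1 for type II pentanomials and the exponent range 2 <= n <= floor(m/2) - 1, neither of which is restated in this section, and all function names are hypothetical. Polynomials over GF(2) are packed into integers (bit i is the coefficient of x^i), and irreducibility is tested with Ben-Or's gcd criterion.

```python
def pmod(a, f):
    """Remainder of a modulo f; both are GF(2) polynomials packed into ints."""
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def pmulmod(a, b, f):
    """Product a*b reduced modulo f, computed by shift-and-add."""
    r = 0
    a = pmod(a, f)
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, f)
    return r


def pgcd(a, b):
    """Greatest common divisor of two GF(2) polynomials (Euclid)."""
    while b:
        a, b = b, pmod(a, b)
    return a


def is_irreducible(f):
    """Ben-Or test: f is irreducible iff gcd(f, x^(2^i) + x) = 1 for i = 1..deg(f)//2."""
    m = f.bit_length() - 1
    if m < 1:
        return False
    x_pow = 2  # the polynomial x
    for _ in range(m // 2):
        x_pow = pmulmod(x_pow, x_pow, f)  # x^(2^i) mod f
        if pgcd(f, x_pow ^ 2) != 1:
            return False
    return True


def type2_exponents(m):
    """Exponents n (assumed range) giving an irreducible x^m + x^(n+2) + x^(n+1) + x^n + 1."""
    return [n for n in range(2, m // 2)
            if is_irreducible((1 << m) | (0b111 << n) | 1)]


if __name__ == "__main__":
    for m in (163, 233, 283, 409, 571):
        print(m, type2_exponents(m))
```

A field size m admits a type II pentanomial under these assumptions exactly when `type2_exponents(m)` is non-empty; sweeping m over an interval and counting the non-empty results gives the kind of tally quoted in the text.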
TABLE VII
COMPUTED VALUES FOR and ,
V. CONCLUSIONS
High-speed algorithms and hardware architectures for computing
GF(2^m) multiplication are in high demand in several
applications, such as coding theory, computer algebra and cryptography.
In this paper, a new bit-parallel GF(2^m) polynomial
basis multiplier for type II irreducible pentanomials with reduced
time complexity has been presented. The coefficients of
the multiplier are computed as a sum of and functions
given by the addition of product terms of the coefﬁcients of the
two operands to be multiplied. In the new approach here pro-
posed, the sum of products in the and functions are sep-
arated into sums of product terms (corresponding to the ini-
tial and terms) that can be implemented as binary trees
of XOR gates with depth . The sum in pairs of binary trees
with the same depth, starting with the lower levels, leads to
a reduction of the time complexity of the multiplier. In this
paper, a complete multiplication example has been presented.
The theoretical complexity analysis has shown that the proposed
bit-parallel multiplier presents the lowest delay among the best
results known to date for similar polynomial basis multipliers
based on irreducible pentanomials. Simulations have shown
that for the 593 different values of m in the
interval for which an irreducible type II pentanomial
exists, the proposed multiplier has the smallest delay
for 465 different values of the field size m. Furthermore, for
the five binary fields recommended by NIST for ECDSA, i.e.,
GF(2^m) with m = 163, 233, 283, 409 and 571, the multiplier here proposed
presents the lowest delay except for , that matches the
best delay given in the literature.
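The delay benefit of summing XOR trees of equal depth first can be illustrated with a short Python sketch. This is a generic shallowest-first (Huffman-style) merge, not the exact schedule derived in the paper; the list of depths and both function names are hypothetical. Each entry is the depth, in XOR levels, of an already-built subtree, and adding two subtrees with one more XOR gate gives a tree of depth max(a, b) + 1.

```python
import heapq


def paired_delay(depths):
    """XOR depth when the two shallowest subtrees are always added first."""
    heap = list(depths)  # must be non-empty
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


def chained_delay(depths):
    """XOR depth when the subtrees are added one after another in list order."""
    total = depths[0]
    for d in depths[1:]:
        total = max(total, d) + 1
    return total


if __name__ == "__main__":
    depths = [3, 2, 1, 0, 0]  # hypothetical subtree depths
    print(paired_delay(depths))   # 4
    print(chained_delay(depths))  # 7
```

With these depths, pairing equal-depth trees from the lowest level upwards needs 4 XOR levels, whereas chaining the same subtrees starting from the deepest one needs 7.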
APPENDIX A
TIME COMPLEXITY
A. Total Number of Terms in -level
Let denote the number of initial and
terms in levels , respectively. As previ-
ously stated, for a given level , the number of new XOR terms
that will result in level due to the addition in pairs of
the -level terms is given by .
Starting in level 0, then the new terms created in level 1 due
to the sum in pairs of the initial terms in level 0, , will be
. The total number of terms in level 1, denoted by ,
will now be . Using the property of modulo
operation for integer, then we have that
.
Next the new terms created in level 2 due to the sum in pairs of
the terms in level 1, , will be . Using the property of
modulo operation for positive inte-
gers and arbitrary real number , then the new XOR terms
created in level 2 will be
. The total number of terms in level 2, de-
noted by , will now be
. Proceeding
in the same way we will have that the new XOR terms cre-
ated in level due to the sum in pairs of the terms in
level , , will be
,
that is (5). Finally, the total number of terms in the
-level will be the sum of plus the expression
in (5), that is:
(13)
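The level-by-level count described above can be checked numerically. The Python sketch below is one reading of that derivation, under the assumption that pairing the terms present at a level contributes floor(total / 2) new terms to the next level; the input list and function names are hypothetical. It also checks the closed form that results from collapsing the nested floors with the identity floor(floor(x / a) / b) = floor(x / (a b)).

```python
def totals_per_level(initial_terms):
    """initial_terms[i] is the number of terms that start at level i."""
    totals = []
    carried = 0  # new terms arriving from the level below
    for n_i in initial_terms:
        total = n_i + carried
        totals.append(total)
        carried = total // 2  # terms summed in pairs move up one level
    return totals


def carried_closed_form(initial_terms, level):
    """floor(sum_{i < level} 2^i * N_i / 2^level), the nested floors collapsed."""
    weighted = sum(n << i for i, n in enumerate(initial_terms[:level]))
    return weighted >> level


if __name__ == "__main__":
    initial = [5, 3, 2]  # hypothetical initial term counts for levels 0, 1, 2
    totals = totals_per_level(initial)
    print(totals)  # [5, 5, 4]
    for j, n_j in enumerate(initial):
        assert totals[j] == n_j + carried_closed_form(initial, j)
```

For the sample input, level 2 receives floor((5 + 2 * 3) / 4) = 2 new terms in addition to its own 2 initial terms, which agrees with the step-by-step recurrence.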
Now (4) can be used to simplify (13). In (4), the number of
initial and terms in level is given, where rep-
resents . The operator is deﬁned by the ex-
pression , for real . There-
fore
. Then in (4) can be rewritten as
, where
we have denoted the sum of the first eight terms as and
the sum of the eight terms inside the parentheses as , so
. It can be observed that
. In a
similar way, we can then ﬁnd that
(14)
(15)
(16)
Using (14)–(16), the numerator of (13) can be simpliﬁed as
follows:
The term in (17) can be computed using the def-
inition previously given. According to that deﬁnition, it can be
observed that is the sum of eight floor functions
of eight quotients with the same denominator and
where the numerators are integers smaller than . However,
the value of is always greater than , so the quo-
tients are less than unity and all the floor functions are always
zero. Therefore, the term and (17) will be
.
Applying this result to (13) yields (18), which
matches (6).
(18)
B. Upper Bound for Delay
The delay of the multiplier given in (7) is the following:
(19)
For type II irreducible pentanomials, , so for
even we have that whereas for odd we
have . The two cases are treated as follows:
• Even . We have
.
Substituting this expression in the quotient in
and using the fact that , then we have
and therefore
. Finally we will have that
(20)
• Odd . We have
and then we get the same results as in the previous case for
even .
Using the result given in (20), then we have
and using
the property we have ﬁnally
that the XOR delay of the multiplier can be upper bounded as
follows, matching (8):
(21)
APPENDIX B
AREA COMPLEXITY
The XOR gates of the multiplier will be .
These quantities are determined as follows:
The functions and as given in (1), (2) are im-
plemented as binary trees of 2-input XOR gates with a
lower level of 2-input AND gates (corresponding to the
products). The number of AND and XOR gates for
are and , while that for they are and
, respectively [9]. The total contribution of
and to the space complexity is AND and
XOR gates [9]. Therefore, the total number of AND gates
of the multiplier is and the number of XOR gates
given by and in (1), (2) is .
The coefﬁcients in Table I have been divided into seven
sections (from to ). In Section II, the number of and
terms in the sums for each section was given. Taking
into account these numbers, then the XOR gates in the
product coefﬁcients are as follows [9]: for sec-
tion ; in ; in ; 12 and
10 in and , respectively; in ; and ﬁ-
nally 3 in . Then the number of XOR gates needed for
the sum of the and terms in the product coefﬁcients
is . It must be noted that this number corre-
sponds with the general case in which all the sections from
to appear in Table I. There are some special cases for
which the above complexity can be reduced. However, the
number of such cases is negligible and they are not consid-
ered.
In order to compute the number of XOR gates, we
must ﬁrst determine the number of times each ap-
pears in Table I. It can be found that for the general case in
which all the sections from to exist, there are
terms that appear 4 times, terms
and ) that appear 7 times and the term
appearing 6 times. As previously stated, one occurrence of
the terms is already included in , so we must compute
the XOR gates due to appearing 3 times,
and appearing 6 times and ap-
pearing 5 times. If we deﬁne and
, where is the number of
XORs needed for the sum of the terms for , then
we can write that the number of XOR gates is given by
.
Using the equivalence (only in relation to the number of
terms) and denoting where
is the number of XORs needed for the sum of the
terms for , then we can compute the number of XOR
gates using .
The number of XOR gates for the sum of the terms
for can be computed using the number of 1's in the bi-
nary conﬁguration of . For example, in Table II is
given in the form and there-
fore 2 XOR gates are needed to perform the additions of
the terms. The binary conﬁguration of the subindex
13 in this case is (1101), i.e., with three 1's. Therefore
the number of XOR gates will be the number of 1's
in the binary conﬁguration of 13 minus 1. Using (3) and
using the deﬁnition of operator
, the Hamming Weight of , can be computed as
.
Therefore the number of XOR gates needed for the sum
of the terms for will be this Hamming Weight minus 1:
(22)
that matches (9). Using (22), then can be computed:
(23)
that matches (10).
The number of XOR gates given by the shared groups
that appear in the product coefﬁcients can be
computed in a similar way as done in Section III-A. It
can be found that for even , the group
appears in three coefﬁcients ( , and ) while
that for odd , the group appears in the
same coefﬁcients. On the other hand, for even , the
groups appear
in two coefﬁcients while that for odd , the groups
also appear in
two coefﬁcients, in both cases excluding the previous
groups for even or odd. This means that only one
of each of the above groups must be implemented and
therefore the other occurrences of the groups must not be
taken into account. It must be noted that from the above
groups, the term with highest subindex gives
the number of XOR gates to be shared. For example, using
Table II it can be observed that for
(with three terms) and (two terms), the
group involves the sum of two terms
and and therefore it requires 2 XOR gates. In
order to compute the number of XORs, we must use the
equivalence (only in relation to the number of terms)
. Then the previous group
for even corresponds with , the group
for odd corresponds with , the
groups for even corre-
spond with and the groups
for odd correspond
with . Furthermore, using
the equivalence we have that the term with lowest subindex
gives the number of XOR gates to be shared. Then the
number of XOR gates represented by the above shared
groups is given by the number of 1's (Hamming Weight) in
the binary conﬁguration of for even or
of for odd plus the Hamming Weight
of for even or of for odd . Therefore the number
of XOR gates given by the shared groups that
appear in the product coefﬁcients is computed by
(24)
that matches (11). In (24), represents the limit of
the summation for even represents the limit
for odd , represents the Hamming Weight of to be
computed for even and the Hamming Weight of
for odd .
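The Hamming-weight rule used throughout this appendix can be illustrated with the example given earlier: the subindex 13 is 1101 in binary, so the corresponding function is a sum of three terms and needs 2 XOR gates. The Python sketch below encodes only that rule; the function names are hypothetical, and it assumes, as in the example, that the number of terms equals the number of 1's in the binary representation of the subindex.

```python
def hamming_weight(k):
    """Number of 1's in the binary representation of a non-negative integer."""
    return bin(k).count("1")


def power_of_two_parts(k):
    """Powers of two that add up to k, largest first (one per 1 bit)."""
    return [1 << i for i in reversed(range(k.bit_length())) if (k >> i) & 1]


def xors_for_sum(k):
    """2-input XOR gates needed to add the terms associated with subindex k."""
    if k <= 0:
        raise ValueError("subindex must be a positive integer")
    return hamming_weight(k) - 1


if __name__ == "__main__":
    print(power_of_two_parts(13))  # [8, 4, 1]
    print(xors_for_sum(13))        # 2
```

A subindex that is a power of two yields a single term and therefore needs no additional XOR gate for this sum.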
REFERENCES
[1] H. A. Curtis, A New Approach to the Design of Switching Circuits.
Princeton, NJ, USA: Van Nostrand, 1962.
[2] J. Lin, J. Sha, Z. Wang, and L. Li, “Efﬁcient decoder design for non-
binary quasicyclic LDPC codes,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 57, no. 5, pp. 1071–1082, May 2010.
[3] T.-C. Chen, S.-W. Wei, and H.-J. Tsai, “Arithmetic unit for finite field
GF(2^m),” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 3,
pp. 828–837, Apr. 2008.
[4] J. L. Imaña, “Low latency GF(2^m) polynomial basis multiplier,”
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 5, pp. 935–946,
May 2011.
[5] M. A. Hasan and M. Ebtedaei, “Efficient architectures for computations
over variable dimensional Galois fields,” IEEE Trans. Circuits Syst. I,
Fundam. Theory Appl., vol. 45, no. 11, pp. 1205–1211, Nov. 1998.
[6] J. L. Imaña, R. Hermida, and F. Tirado, “Low complexity bit-parallel
polynomial basis multipliers over binary ﬁelds for special irreducible
pentanomials,” Integration, vol. 46, pp. 197–210, 2013.
[7] P. K. Meher, “Systolic and super-systolic multipliers for finite field
GF(2^m) based on irreducible trinomials,” IEEE Trans. Circuits Syst.
I, Reg. Papers, vol. 55, no. 4, pp. 1031–1040, May 2008.
[8] C. L. Wang and J. L. Lin, “Systolic array implementation of multipliers
for finite fields GF(2^m),” IEEE Trans. Circuits Syst., vol. 38, no. 7,
pp. 796–800, Jul. 1991.
[9] J. L. Imaña, “Efﬁcient polynomial basis multipliers for Type II irre-
ducible pentanomials,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol.
59, no. 11, pp. 795–799, Nov. 2012.
[10] R. Azarderakhsh, D. Jao, and H. Lee, “Common subexpression algorithms
for space-complexity reduction of Gaussian normal basis multiplication,”
IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2357–2369,
May 2015.
[11] S. T. J. Fenn, M. Benaissa, and D. Taylor, “GF(2^m) multiplication
and division over the dual basis,” IEEE Trans. Comput., vol. 45, no. 3,
pp. 319–327, Mar. 1996.
[12] A. Reyhani-Masoleh and M. A. Hasan, “Low complexity bit parallel architectures
for polynomial basis multiplication over GF(2^m),” IEEE
Trans. Comput., vol. 53, no. 8, pp. 945–959, Aug. 2004.
[13] T. Zhang and K. K. Parhi, “Systematic design of original and modified
Mastrovito multipliers for general irreducible polynomials,” IEEE
Trans. Comput., vol. 50, no. 7, pp. 734–749, Jul. 2001.
[14] H. Fan and Y. Dai, “Fast bit parallel GF(2^n) multiplier for all trinomials,”
IEEE Trans. Comput., vol. 54, no. 4, pp. 485–490, Apr. 2005.
[15] E. D. Mastrovito, “VLSI architectures for multiplication over finite
fields GF(2^m),” in Proc. 6th Int. Conf. Appl. Algebra, Algebraic Algorithms,
Error-Correcting Codes (AAECC-6), Rome, Italy, Jul. 1988,
pp. 297–309, New York: Springer-Verlag.
[16] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Im-
plementation. Hoboken, NJ, USA: Wiley, 1999.
[17] A. Halbutogullari and Ç. K. Koç, “Mastrovito multiplier for general
irreducible polynomials,” IEEE Trans. Comput., vol. 49, no. 5, pp.
503–518, May 2000.
[18] F. Rodríguez-Henríquez and Ç. K. Koç, “Parallel multipliers based on
special irreducible pentanomials,” IEEE Trans. Comput., vol. 52, no.
12, pp. 1535–1542, Dec. 2003.
[19] V. B. Afanasyev, “Complexity of VLSI implementation of ﬁnite ﬁeld
arithmetic,” in Proc. II Int. Workshop Algebraic Combinatorial Coding
Theory, Sep. 1990, pp. 6–7.
[20] J. L. Imaña, R. Hermida, and F. Tirado, “Low complexity bit-parallel
multipliers based on a class of irreducible pentanomials,” IEEE
Trans. Very Large Scale Integr. (VLSI) Systems, vol. 14, no. 12, pp.
1388–1393, Dec. 2006.
[21] S.-M. Park, K.-Y. Chang, D. Hong, and C. Seo, “New efﬁcient bit-par-
allel polynomial basis multiplier for special pentanomials,” Integra-
tion, vol. 47, pp. 130–139, 2014.
José L. Imaña received the M.Sc. and Ph.D. degrees
in physics from Complutense University, Madrid,
Spain, in 1989 and 2003, respectively. He was an
Electronic Design Engineer at the Madrid Institute
of Technology, Spain. He is currently with the
Department of Computer Architecture and Systems
Engineering at Complutense University, where he
was promoted to an Associate Professor with tenure
in 2006. He has been the promoter and cofounder
of the International Workshop on the Arithmetic
of Finite Fields (WAIFI). His research interests
include algorithms and VLSI architectures for computations in ﬁnite ﬁelds,
cryptography, computer arithmetic, and reconﬁgurable computing.
