Fast bit-parallel binary multipliers based on type-I pentanomials by Imaña Pascual, José Luis
Fast Bit-Parallel Binary Multipliers Based on Type-I Pentanomials
Jose L. Imaña
Abstract: In this paper, a fast implementation of bit-parallel polynomial basis (PB) multipliers over the binary extension field $GF(2^m)$ generated by type-I irreducible pentanomials is presented. Explicit expressions for the coordinates of the multipliers and a detailed example are given. Complexity analysis shows that the multipliers presented here have the lowest delay among similar bit-parallel PB multipliers in the literature based on this class of irreducible pentanomials. To validate the theoretical complexities, hardware implementations on Xilinx FPGAs have also been performed. Experimental results show that the presented approach exhibits the lowest delay with a balanced Area $\times$ Time complexity when compared with similar multipliers.

Index Terms: Multipliers, bit-parallel, $GF(2^m)$, polynomial basis, pentanomials
1 INTRODUCTION
Efficient VLSI implementations of high-speed multipliers over binary extension fields $GF(2^m)$ are highly desirable for several applications, such as cryptography, digital signal processing and coding theory [1]. Elements of $GF(2^m)$ are mainly represented in polynomial basis (PB) because it gives more freedom for hardware optimization of arithmetic operations. The efficiency of a hardware implementation is measured by the number of 2-input gates (AND, XOR) and by the gate delays ($T_A$, $T_X$) of the circuit. Many approaches and architectures have been proposed for PB multipliers [2], [3], [4], [5], [6]. The complexity of the multiplier depends mainly on the irreducible polynomial $f(y)$ selected for the finite field. For hardware implementations, trinomials [7], [8], [9] and pentanomials are normally used. PB multiplication requires a polynomial multiplication followed by a modular reduction. Efficient bit-parallel multipliers can be implemented using a product matrix that combines these two steps [10], [11], [12], [13], [14]. A new PB multiplication method based on the decomposition of a product matrix was used in [15]. This method introduced the functions $S_i$ and $T_i$, given by sums of terms $x_k = (a_k b_k)$ and $z_i^j = (a_i b_j + a_j b_i)$, where $a_i, b_i \in GF(2)$ are the coefficients of $A, B \in GF(2^m)$. The coefficients of the product can be computed as sums of those functions. The method was applied in [15] to type I irreducible pentanomials, where groups of shared subexpressions were determined in order to reduce the area complexity of the multiplier. In [16], the sums of products given in the $S_i$ and $T_i$ functions were split into sums of $2^j$ product terms that can be implemented as binary trees of XOR gates with depth $j$. Summing in pairs binary trees of the same depth reduces the number of XOR levels needed to compute the product coefficients. Furthermore, the use of binary trees of XOR gates can reduce power consumption in comparison with linear arrays of XORs [17]. The multiplication approach given in [16] was applied to type II irreducible pentanomials of the form $f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$.
In this paper, a new fast bit-parallel $GF(2^m)$ polynomial basis multiplier is presented. The splitting approach of [16] is applied to general type I irreducible pentanomials, and the expressions of the product coefficients given in [15] for these pentanomials are simplified in order to obtain high-speed multipliers. Type I irreducible pentanomials $f(y) = y^m + y^{n+1} + y^n + y + 1$, where $2 \le n \le \lfloor m/2 \rfloor - 1$, are very important because they are abundant (there are 807 different values of $m$ in the interval $[8, 1000]$ for which a type I irreducible pentanomial of degree $m$ exists) and because they are used in important applications. For example, the arithmetic of the Advanced Encryption Standard (AES) is based on the binary extension field $GF(2^8)$ generated by the type I irreducible pentanomial $f(y) = y^8 + y^4 + y^3 + y + 1$. Furthermore, three of the five binary fields recommended by the National Institute of Standards and Technology (NIST) for the Elliptic Curve Digital Signature Algorithm (ECDSA), those with $m \in \{163, 233, 283\}$, can be constructed using such pentanomials. The bit-parallel PB multiplier presented here has the lowest delay known to date among similar PB multipliers based on this type of irreducible pentanomial. To validate the theoretical complexities, hardware implementations on Xilinx FPGAs have also been performed. The $GF(2^m)$ multipliers recommended by NIST and SECG (Standards for Efficient Cryptography Group) have been described in VHDL, and post-place-and-route implementation results on Xilinx Artix-7 are reported. Experimental results show that the presented approach exhibits the lowest delay with a balanced Area $\times$ Time complexity when compared with similar multipliers.
The paper is organized as follows. Section 2 provides notation and mathematical background. Type I irreducible pentanomials are introduced in Section 3, where new reduced expressions for multiplication are given. Section 4 describes the new multiplier, gives an example of multiplication and analyses the theoretical complexity. Comparisons with other similar multipliers are given in Section 5. Hardware implementation results are presented in Section 6. Finally, conclusions are given in Section 7.
2 BACKGROUND
Let $f(y) = \sum_{i=0}^{m} f_i y^i$ be a monic irreducible polynomial of degree $m$ over $GF(2)$. The elements of the binary extension field $GF(2^m)$ can be represented in the polynomial basis $\{1, x, \ldots, x^{m-1}\}$, where $x$ is a root of the irreducible generating polynomial $f(y)$. Any element $A \in GF(2^m)$ is represented in PB as $A = \sum_{i=0}^{m-1} a_i x^i$, where the $a_i \in GF(2)$ are the coefficients of $A$. In order to compute the coefficients of the product $C = A \cdot B$, a new method was used in [15]. This method introduced the functions $S_i$ and $T_i$, given by sums of terms $x_k = (a_k b_k)$ and $z_i^j = (a_i b_j + a_j b_i)$, where $a_i, b_i \in GF(2)$ are the coefficients of $A$ and $B$, respectively. These functions are implemented as binary trees of 2-input XOR gates with a lower level of 2-input AND gates (corresponding to the $a_i b_j$ products). The product $C = A \cdot B$ can be computed as the sum of these functions.
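The $x_k$ and $z_i^j$ terms can be enumerated mechanically. The following is a minimal sketch, assuming (consistently with the $GF(2^5)$ example given later in this section) that $S_i$ collects the partial products $a_u b_v$ with $u + v = i - 1$ and $T_i$ those with $u + v = m + i$. The function names `product_terms`, `S` and `T` are illustrative and do not come from the paper.

```python
def product_terms(total, m):
    """Symbolic terms x_k = a_k*b_k and z^j_i = a_i*b_j + a_j*b_i with i + j == total.

    Assumption of this sketch: terms are grouped by the index sum of the
    partial products, with all indices in 0..m-1.
    """
    terms = []
    for i in range(m):
        j = total - i
        if i < j < m:
            terms.append(f"z^{j}_{i}")   # pair a_i*b_j + a_j*b_i
        elif i == j:
            terms.append(f"x_{i}")       # single product a_i*b_i
    return terms


def S(i, m):
    """Terms of S_i, 1 <= i <= m (index sum i - 1)."""
    return product_terms(i - 1, m)


def T(i, m):
    """Terms of T_i, 0 <= i <= m - 2 (index sum m + i)."""
    return product_terms(m + i, m)


if __name__ == "__main__":
    for i in range(1, 6):
        print(f"S{i} =", " + ".join(S(i, 5)))
    for i in range(0, 4):
        print(f"T{i} =", " + ".join(T(i, 5)))
```

Each `z` term stands for two AND gates and each `x` term for one, so the gate counts of a function can be read directly from the returned list.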
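The expressions for $S_i$ ($1 \le i \le m$) and $T_i$ ($0 \le i \le m-2$), with $\varsigma = \lfloor i/2 \rfloor$ and $\gamma = (\lceil m/2 \rceil + \lfloor i/2 \rfloor)$, are [16]

$$S_i = x_{\varsigma} + \sum_{h=0}^{\varsigma-1} z_{h}^{i-h-1}, \qquad T_i = x_{\gamma} + \sum_{j=1}^{\eta-(i+1)} z_{i+j}^{m-j}, \qquad (1)$$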
where $x_{\varsigma} = a_{\varsigma} b_{\varsigma}$ only appears for $i$ odd and $x_{\gamma}$ only appears for ($m$ and $i$ even) or for ($m$ and $i$ odd). In this case, $\eta = \gamma$. Otherwise, i.e., for ($m$ even and $i$ odd) or for ($m$ odd and $i$ even), the term $x_{\gamma}$ does not appear and $\eta = (\lceil m/2 \rceil + \lceil i/2 \rceil)$. For example, for $GF(2^5)$ the terms $S_i$ and $T_i$ are as follows: $S_1 = x_0 = a_0 b_0$, $S_2 = z_0^1 = (a_0 b_1 + a_1 b_0)$, $S_3 = x_1 + z_0^2 = a_1 b_1 + (a_0 b_2 + a_2 b_0)$, $S_4 = (a_0 b_3 + a_3 b_0) + (a_1 b_2 + a_2 b_1)$, $S_5 = a_2 b_2 + (a_0 b_4 + a_4 b_0) + (a_1 b_3 + a_3 b_1)$, $T_0 = (a_1 b_4 + a_4 b_1) + (a_2 b_3 + a_3 b_2)$, $T_1 = a_3 b_3 + (a_2 b_4 + a_4 b_2)$, $T_2 = (a_3 b_4 + a_4 b_3)$, $T_3 = a_4 b_4$.

J. L. Imaña is with the Department of Computer Architecture and Systems Engineering, Faculty of Physics, Complutense University, Madrid 28040, Spain. E-mail: jluimana@ucm.es.
Manuscript received 11 May 2017; revised 25 Nov. 2017; accepted 26 Nov. 2017. Date of publication 0 . 0000; date of current version 0 . 0000.
(Corresponding author: Jose L. Imaña.)
Recommended for acceptance by W. Liu.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2017.2778730
IEEE TRANSACTIONS ON COMPUTERS, VOL. 67, NO. X, XXXXX 2018
0018-9340 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

3 TYPE I IRREDUCIBLE PENTANOMIALS
Type I irreducible pentanomials were defined in [14] as $f(y) = y^m + y^{n+1} + y^n + y + 1$, for $2 \le n \le \lfloor m/2 \rfloor - 1$. These pentanomials are very important because they are abundant and they are used in a wide range of applications. For example, the specific type I pentanomial $f(y) = y^8 + y^4 + y^3 + y + 1$ is used in the Advanced Encryption Standard.
Polynomial basis multiplication for type I irreducible pentanomials was studied in [15], where expressions of the product coefficients were computed. In these expressions, groups $G_i^j$ of subexpressions given as sums of $j$ terms $T_k$ were also found. These $j$-term groups $G_i^j$ can be shared among different coefficients, leading to a reduction of the area complexity of the multiplier. In this work, it is observed that the common groups found in [15] can be simplified in order to reduce the delay of the multiplier. The simplification is shown in the following example with $(m, n) = (13, 3)$.
3.1 $GF(2^{13})$ Multiplier for $f(y) = y^{13} + y^4 + y^3 + y + 1$
The product $C = A \cdot B$ in $GF(2^{13})$ generated by the type I irreducible pentanomial with parameters $(m, n) = (13, 3)$ can be computed using the expressions given in [15]. The coefficients $c_i$ of the product are $c_0 = S_1 + G_0^3$, $c_1 = S_2 + G_0^6$, $c_2 = S_3 + G_0^5$, $c_3 = S_4 + G_0^3 + G_2^3$, $c_4 = S_5 + G_1^2 + G_0^6$, $c_5 = S_6 + G_2^2 + G_0^5$, $c_6 = S_7 + G_3^2 + G_2^3$, $c_7 = S_8 + G_4^2 + G_1^2$, $c_8 = S_9 + G_5^2 + G_2^2$, $c_9 = S_{10} + G_6^2 + G_3^2$, $c_{10} = S_{11} + G_7^2 + G_4^2$, $c_{11} = S_{12} + G_8^2 + G_5^2$, $c_{12} = S_{13} + T_{11} + G_6^2$. In the above coefficients, the 2-term groups are given by the expressions [15] $G_0^2 = (T_2 + T_{11})$, $G_1^2 = (T_3 + T_4)$, $G_2^2 = (T_4 + T_5)$, $G_3^2 = (T_5 + T_6)$, $G_4^2 = (T_6 + T_7)$, $G_5^2 = (T_7 + T_8)$, $G_6^2 = (T_8 + T_9)$, $G_7^2 = (T_9 + T_{10})$, $G_8^2 = (T_{10} + T_{11})$; the 3-term groups are given by $G_0^3 = (T_0 + G_7^2)$, $G_1^3 = (T_1 + G_8^2)$, $G_2^3 = (T_3 + G_0^2)$; the 5-term group is $G_0^5 = (G_0^2 + G_1^3)$; and the 6-term group is $G_0^6 = (G_0^3 + G_1^3)$. The coefficients of this multiplier are given in Table 1, where a coordinate $c_i$ is the sum of the $S_l$ and $T_p$ terms in the $i$th row. In Table 1, the above $G_i^j$ groups are not represented and only individual terms $T_k$ are shown. It can also be observed that several $T_i$ terms cancel in some rows.
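The cancellation of $T_i$ terms can be reproduced by reducing the powers $x^{m+i}$ modulo $f$ directly. The sketch below is not the derivation of [15]; it assumes that $S_{k+1}$ carries the coefficient of $x^k$ and $T_i$ the coefficient of $x^{m+i}$ of the unreduced product, and the helper names are illustrative.

```python
def reduction_rows(m, n):
    """rows[i] = exponents k such that x^(m+i) mod f(x) contains x^k,
    for f(y) = y^m + y^(n+1) + y^n + y + 1 over GF(2)."""
    tail = {n + 1, n, 1, 0}            # x^m = x^(n+1) + x^n + x + 1
    rows, cur = [], set(tail)
    for _ in range(m - 1):             # i = 0 .. m-2
        rows.append(set(cur))
        cur = {k + 1 for k in cur}     # multiply by x
        if m in cur:                   # x^m appeared: reduce it again
            cur.remove(m)
            cur ^= tail                # XOR, so repeated exponents cancel
    return rows


def coefficient_terms(m, n):
    """terms[k] = names of the S and T functions whose XOR gives c_k."""
    rows = reduction_rows(m, n)
    return [[f"S{k + 1}"] + [f"T{i}" for i, r in enumerate(rows) if k in r]
            for k in range(m)]


if __name__ == "__main__":
    for k, names in enumerate(coefficient_terms(13, 3)):
        print(f"c{k} =", " + ".join(names))
```

For $(m, n) = (13, 3)$ the row for $c_4$ contains $S_5$, $T_0$, $T_1$, $T_3$, $T_4$, $T_9$ and $T_{11}$, which is what remains of $S_5 + G_1^2 + G_0^6$ once the two occurrences of $T_{10}$ cancel.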
3.2 New General Expressions for the Multiplier
As in the previous example, the coordinates of the product $C = A \cdot B$ in PB for general type I pentanomials $f(y) = y^m + y^{n+1} + y^n + y + 1$, with $2 \le n < \lfloor m/2 \rfloor - 1$, are given in Table 2, where $z = m - n$. From the table, it can be observed that several $T_i$ terms cancel, which reduces the complexity of the multiplier.
The new general reduced expressions for the coordinates are given in Table 3. In this table, the coefficients have been divided into eight sections (named A to H), depending on the terms $T_i$ involved and on the number of $S_i$ and $T_i$ terms in the sums for the coefficients. The number of terms in sections A, B, C, D, E, F, G and H is 4, 5, 4, 7, 7, 6, 5 and 4, respectively. Coefficients in sections D and E have the maximum number of terms (seven). Furthermore, from equation (1), the term $T_0$ is given by the addition of $\eta - 1$ terms $z_i^j$ and the term $x_{\gamma}$ (if it exists), i.e., it sums the maximum number of $z_i^j$ terms and therefore has the highest delay. From (1), the complexity of the $T_i$ terms decreases as the subindex $i$ increases, so the next most complex $T_i$ term is $T_1$. It must be noted that the coefficient $c_{n+1}$ (in section E) has the maximum number of terms (seven) and includes the two most complex terms, $T_0$ and $T_1$, in its sum. Therefore, $c_{n+1}$ is the most complex coefficient and it is used in the following sections to determine the delay of the new multiplier.
4 NEW MULTIPLIER FOR TYPE I IRREDUCIBLE PENTANOMIALS
As shown in Table 3, the coefficients of the product $C = A \cdot B$ in PB can be computed as the addition of functions $S_i$ and $T_i$ that are given in (1) by sums of terms $x_k = (a_k b_k)$ and $z_i^j = (a_i b_j + a_j b_i)$. However, the monolithic construction of the $S_i$ and $T_i$ terms can be a problem if low-delay implementations are required. For example, for $GF(2^5)$, the functions $T_1 = x_3 + z_2^4 = (a_3 b_3 + (a_2 b_4 + a_4 b_2))$ and $T_3 = x_4 = a_4 b_4$ are defined. The addition $T_1 + T_3 = ((a_3 b_3 + (a_2 b_4 + a_4 b_2)) + a_4 b_4)$, where brackets indicate that the enclosed terms must be added before the XOR with the other terms, results in a 3-level binary tree of 2-input XOR gates. However, the addition $T_1 + T_3$ involves the XOR of only four product terms. This sum could be implemented with a 2-level complete binary tree of XOR gates if the additions could be done separately, i.e., if the product $a_3 b_3$ were first XORed with the term $a_4 b_4$ and the result then XORed with $(a_2 b_4 + a_4 b_2)$, in the form $T_1 + T_3 = ((a_3 b_3 + a_4 b_4) + (a_2 b_4 + a_4 b_2))$.

TABLE 1
Coordinates $c_i$ of the Product for the Pentanomial $f(y) = y^{13} + y^4 + y^3 + y + 1$

TABLE 2
Coefficients $c_i$ of the Product for Type I Pentanomial $f(y) = y^m + y^{n+1} + y^n + y + 1$ with $2 \le n < \lfloor m/2 \rfloor - 1$
In [16], a new approach was given by considering the functions $S_i$ and $T_i$ as additions of $S_i^j$ and $T_i^j$ terms, respectively, in such a way that $S_i = s_r^i S_i^r + \cdots + s_0^i S_i^0$ and $T_i = t_r^i T_i^r + \cdots + t_0^i T_i^0$ for $GF(2^m)$, with $s_j^i, t_j^i \in GF(2)$ and $r = \lfloor \log_2 m \rfloor$. The initial terms $S_i^j$ and $T_i^j$ represent the sum of $2^j$ products $a_k b_l$, so they can be implemented as $j$-level complete binary trees of 2-input XOR gates. The addition of two terms $S_i^j$ and $T_i^j$ with the same superscript $j$ results in a new 2-input XOR in level $j+1$ that represents a $(j+1)$-level binary tree. If the sum of $S_i$ and $T_i$ functions is done by grouping the additions of terms $S_i^j$ and $T_i^j$ with the same level $j$, starting with the lower levels, then the number of XOR levels needed to compute the coefficients of the product can be reduced. The 0-level initial terms $S_i^0$ and $T_i^0$ should first be added in pairs to give rise to a new XOR in level 1, which in turn should be XORed with another 1-level term to give rise to a new 2-level binary tree, and so on. If there is only one $j$-level term (or there is an unpaired $j$-level term), then it should be XORed with a $(j+1)$-level term in order to obtain a new $(j+2)$-level tree.
It can be noted that the vectors $(s_r^i, \ldots, s_0^i)_2$ and $(t_r^i, \ldots, t_0^i)_2$ are given by the binary representations of the subindex $i$ for $S_i$ and of the value $m - 1 - i$ for $T_i$, respectively [16]. Furthermore, common terms appearing in several coefficients can be shared in order to reduce the number of XORs. These common terms correspond to the sums $(S_i + S_{i+1})$ and $(T_i + T_{i+1})$ that involve the additions $(S_i^l + S_{i+1}^l)$ and $(T_i^l + T_{i+1}^l)$, respectively, for different levels $l$ determined by the binary representations of $i$ (for $S_i$) and $m - 1 - i$ (for $T_i$). The notation $T_{i,j}^{l+1} = (T_i^l + T_j^l)$ and $ST_{i,j}^{l+1} = (S_i^l + T_j^l)$ can be used to represent the addition of two terms in level $l$ to yield a new term in level $l+1$. From Table 3, it can be observed that only common additions $(T_i + T_{i+1})$ can be found (with $i = 0, \ldots, m-4$ for even $m$, and with $i = 0, \ldots, m-4$ for odd $m$) [16]. The following algorithm for multiplication using the above approach was given in [16]:

1) Compute $S_i^j$ and $T_i^j$ terms using (1).
2) For each level $l = 0 \ldots r$, create common terms $T_{i,i+1}^{l+1}$.
3) For each coefficient of the multiplier:
   a) For each level $l = 0 \ldots r$:
      • Share common terms $T_{i,i+1}^{l+1}$.
      • Sum $U^l$ terms in pairs to create $U^{l+1}$ terms.
      • If $\exists$ a non-paired $U^l$ term, consider it as $U^{l+1}$.
   b) While the number of $U^l$ terms $\ge 2$:
      • Sum $U^l$ terms in pairs to create $U^{l+1}$ terms.
      • If $\exists$ a non-paired $U^l$ term, consider it as $U^{l+1}$.

where $U^l$ denotes $T_i^l$, $S_i^l$, $T_{i,j}^l$, $S_{i,j}^l$ or $ST_{i,j}^l$ terms at level $l$. In the next section, the representation introduced in [16] is applied to the new reduced expressions given in Table 3 for the type I pentanomial multiplier with $(m, n) = (13, 3)$.
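The effect of step 3 on the depth of one coefficient can be estimated by counting terms per level. The sketch below is a simplification of the algorithm above, not the procedure of [16] itself: it ignores sharing, identifies each function by an integer `q` (the subindex $i$ for $S_i$, the value $m-1-i$ for $T_i$) whose binary digits give its initial levels, and promotes an unpaired term to the next level. The name `xor_tree` is illustrative.

```python
def xor_tree(indices):
    """Return (levels, xors) for the XOR of the functions identified by `indices`.

    Bit j of an index q means the function has one initial term at level j.
    All indices are assumed to be positive integers.
    """
    top = max(q.bit_length() for q in indices) - 1   # highest initial level
    carried = xors = level = 0
    while True:
        # terms available at this level: initial ones plus those produced below
        here = sum((q >> level) & 1 for q in indices) + carried
        if level >= top and here <= 1:
            return level, xors
        xors += here // 2              # one 2-input XOR per pair
        carried = (here + 1) // 2      # pairs, plus a promoted unpaired term
        level += 1


if __name__ == "__main__":
    # GF(2^5): T1 has levels given by 5 - 1 - 1 = 3, T3 by 5 - 1 - 3 = 1
    print(xor_tree([3, 1]))
```

For the $GF(2^5)$ sum $T_1 + T_3$ discussed at the start of this section, the sketch returns two XOR levels and two XOR gates, in agreement with the 2-level complete binary tree described there.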
4.1 Type I Pentanomial Multiplier for $(m, n) = (13, 3)$
Let us consider the product $C = A \cdot B$ in $GF(2^{13})$ generated by the type I pentanomial $f(y) = y^{13} + y^4 + y^3 + y + 1$. Using equation (1), the $S_i$ and $T_i$ functions are given in Table 4, where $S_i$ and $T_i$ are the XOR of the $x_k$ and $z_i^j$ terms given in their rows. In this table, the columns labeled $2^0$, $2^1$, $2^2$ and $2^3$ represent the number of product terms $a_h b_l$ involved in each column. For example, $S_{11} = x_5 + z_0^{10} + (z_1^9 + z_2^8 + z_3^7 + z_4^6)$, where $x_5$ involves $1 = 2^0$ product term ($a_5 b_5$), the term $z_0^{10}$ involves the XOR of $2 = 2^1$ terms $(a_0 b_{10} + a_{10} b_0)$ and $(z_1^9 + z_2^8 + z_3^7 + z_4^6)$ is the sum of $8 = 2^3$ product terms $((a_1 b_9 + a_9 b_1) + (a_2 b_8 + a_8 b_2) + (a_3 b_7 + a_7 b_3) + (a_4 b_6 + a_6 b_4))$. The term $S_{11}$ can then be represented by $S_{11} = s_3^{11} S_{11}^3 + s_2^{11} S_{11}^2 + s_1^{11} S_{11}^1 + s_0^{11} S_{11}^0 = 1 \cdot S_{11}^3 + 0 \cdot S_{11}^2 + 1 \cdot S_{11}^1 + 1 \cdot S_{11}^0$, where $S_{11}^3$, $S_{11}^2$, $S_{11}^1$ and $S_{11}^0$ stand for terms with $2^3$, $2^2$, $2^1$ and $2^0$ product terms, respectively. In this case, the non-null terms are $S_{11}^0 = x_5$, $S_{11}^1 = z_0^{10}$ and $S_{11}^3 = (z_1^9 + z_2^8 + z_3^7 + z_4^6)$. It can be observed that the binary vector $(s_3^{11}, s_2^{11}, s_1^{11}, s_0^{11})_2 = (1, 0, 1, 1)_2 = 11_{10}$. This representation is given in the column labeled binary in Table 4, where it can be observed that the binary vector $(s_3^i, s_2^i, s_1^i, s_0^i)_2$ for $S_i$ matches the binary vector $(t_3^j, t_2^j, t_1^j, t_0^j)_2$ for $T_j$, with $j = m - 1 - i$. For example, the term $T_1$ corresponds to the binary vector $(1011)_2$, which is the binary representation of the value $13 - 1 - 1 = 11$ (in this example with $m = 13$). This fact is represented in Table 4 by placing the terms $S_i$ and $T_{m-1-i}$ in a row with the same binary representation.
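The level decomposition described above is just a binary expansion, which the following short sketch makes explicit. The helper names are illustrative; the only facts used are those stated in the text, namely that $S_i$ follows the binary representation of $i$ and $T_i$ that of $m - 1 - i$.

```python
def levels(q):
    """Levels j at which bit j of q is set (least significant bit = level 0)."""
    return [j for j in range(q.bit_length()) if (q >> j) & 1]


def s_levels(i):
    """Levels of the initial terms S_i^j present in S_i."""
    return levels(i)


def t_levels(i, m):
    """Levels of the initial terms T_i^j present in T_i."""
    return levels(m - 1 - i)


if __name__ == "__main__":
    m = 13
    print("S11:", s_levels(11))        # x_5, z_0^10 and the 8-product group
    print("T1 :", t_levels(1, m))      # same vector, since 13 - 1 - 1 = 11
    # the number of products in S_i is the sum of 2^j over its levels, i.e. i
    print(sum(2 ** j for j in s_levels(11)))
```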
The space complexity of the multiplier can be reduced if common terms that appear in several coefficients are shared. In Table 4, consecutive $S_i$ and $T_i$ terms having $S_i^j$ and $T_i^j$ terms with the same level $j$ can be observed. For example, $S_6$ and $S_7$ have 1-level terms $S_6^1$ and $S_7^1$ and 2-level terms $S_6^2$ and $S_7^2$. The same applies to $T_5$ and $T_6$, with ($T_5^1$, $T_5^2$) and ($T_6^1$, $T_6^2$) terms, respectively. The sum $S_6 + S_7$ (and $T_5 + T_6$) then implies the additions $S_6^1 + S_7^1$ ($T_5^1 + T_6^1$) and $S_6^2 + S_7^2$ ($T_5^2 + T_6^2$), which give rise to 2-level and 3-level binary trees of XOR gates, respectively. Therefore, the groups $(S_6 + S_7)$ and $(T_5 + T_6)$ can reduce the complexity. The groups for this example are represented in Table 4 by shaded cells of the same color. The $S$ groups are $(S_2, S_3)$, $(S_4, S_5)$, $(S_6, S_7)$, $(S_8, S_9)$, $(S_{10}, S_{11})$ and $(S_{12}, S_{13})$, while the $T$ groups are $(T_1, T_2)$, $(T_3, T_4)$, $(T_5, T_6)$, $(T_7, T_8)$ and $(T_9, T_{10})$.

TABLE 3
New Reduced Expressions for the Coefficients of the Product

TABLE 4
$S_i$ and $T_i$ Functions for $GF(2^{13})$
Using Tables 1 and 3, the coefficients of the product for this $GF(2^{13})$ multiplier are given in Table 5, where the previous $T$ groups are shaded. From Table 5, it can be observed that the group $(T_9 + T_{10})$ appears in three coefficients ($c_0$, $c_3$ and $c_{10}$), while $(T_1 + T_2)$, $(T_3 + T_4)$, $(T_5 + T_6)$ and $(T_7 + T_8)$ are found in two coefficients. Therefore, each of these groups needs to be implemented only once. The number of $T_i^j$ terms in each group determines the number of XOR gates that can be saved. From Table 4, it can be observed that the group $(T_1 + T_2)$ involves the addition of the two terms $(T_1^1 + T_2^1)$ and $(T_1^3 + T_2^3)$, therefore requiring 2 XOR gates. Likewise, $(T_3 + T_4)$, $(T_5 + T_6)$, $(T_7 + T_8)$ and $(T_9 + T_{10})$ require 1, 2, 1 and 1 XOR gates, respectively. In addition, $(T_9 + T_{10})$ is found in three different coefficients, so the number of XOR gates saved by this group is $2 \times 1 = 2$. Therefore, the number of XOR gates that can be saved by sharing is 2 + 1 + 2 + 1 + 2 = 8 XOR. General expressions for the number of XOR gates saved by sharing groups are given in Section 4.2. Using the multiplication algorithm given previously [16] and the $S_i^j$ and $T_i^j$ terms given in Table 4, the coefficients of the product are shown in the third column of Table 5. The precedence of the sums of terms in Table 5 is indicated with parentheses.
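The count of 8 saved XOR gates can be re-derived from the binary vectors alone. In the sketch below, the number of coefficients in which each group appears is copied from the paragraph above (it comes from Table 5 and is not recomputed), and `common_levels` is an illustrative helper, not notation from the paper.

```python
def common_levels(i, m):
    """Levels j at which both T_i and T_(i+1) have an initial term T^j."""
    a, b = m - 1 - i, m - 2 - i        # binary vectors of T_i and T_(i+1)
    both = a & b
    return [j for j in range(both.bit_length()) if (both >> j) & 1]


m = 13
# group (T_i, T_(i+1)) keyed by i -> number of coefficients containing it,
# as stated in the text for the GF(2^13) example
occurrences = {1: 2, 3: 2, 5: 2, 7: 2, 9: 3}

# every occurrence after the first one saves one XOR per common level
saved = sum((occ - 1) * len(common_levels(i, m))
            for i, occ in occurrences.items())

if __name__ == "__main__":
    for i in occurrences:
        print(f"(T{i} + T{i + 1}): common levels {common_levels(i, m)}")
    print("XOR gates saved by sharing:", saved)
```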
As stated in Section 3, coefficient $c_{n+1}$ is the most complex one for type I pentanomials. For $GF(2^{13})$, this coefficient corresponds to $c_4$, whose implementation is given in Fig. 1. According to Table 5, it requires the addition of 7 terms (including the most complex ones, $T_0$ and $T_1$), so it determines the maximum delay of the multiplier. The initial $S_i^l$, $T_i^l$ terms and the $T_{i,j}^l$, $ST_{i,j}^l$ terms given in Table 5 are represented in Fig. 1 by black and gray circles, respectively. For $c_4$, the initial terms are $S_5 = S_5^0 + S_5^2$, $T_0 = T_0^2 + T_0^3$, $T_1 = T_1^0 + T_1^1 + T_1^3$, $T_3 = T_3^0 + T_3^3$, $T_4 = T_4^3$, $T_9 = T_9^0 + T_9^1$ and $T_{11} = T_{11}^0$, while the terms $ST_{5,1}^1 = S_5^0 + T_1^0$, $ST_{5,0}^3 = S_5^2 + T_0^2$, $T_{3,9}^1 = T_3^0 + T_9^0$ and $T_{3,4}^4 = T_3^3 + T_4^3$ are also represented. In Fig. 1, levels of XOR binary trees are represented by horizontal dashed lines, and the $S_i$ and $T_i$ functions are represented by ellipses enclosing their $S_i^j$ and $T_i^j$ terms, respectively. Furthermore, a gray square in level $l$ represents a non-paired term in level $l - 1$ that must be considered as an $l$-level term in order to be XORed with another term in level $l$ (for example, the non-paired term $T_{11}^0$ is considered as a 1-level term in order to sum it with $T_9^1$). The shared group $T_{3,4}^4$ is also represented in the figure with a double gray circle.
The time complexity of this $GF(2^{13})$ multiplier follows from the fact that the most complex coefficient $c_4$ requires a 6-level binary XOR tree, so the delay is $T_A + 6T_X$. The $T_A$ delay corresponds to the 0-level products $a_i b_j$ of the coefficients of $A$ and $B$. For area complexity, the number of 2-input AND and XOR gates must be computed. The number of AND gates is given by the products $a_i b_j$, with $i, j \in [0, m-1]$, and for $GF(2^m)$ multipliers it is $m^2$ [16]. Therefore, the number of AND gates is 169 for the $GF(2^{13})$ multiplier. The number of XOR gates can be computed as the sum of the XOR gates in the initial $S_i^j$ and $T_i^j$ terms (as given in Table 4), plus the number of new XOR gates generated in the coefficients (as given in Table 5), minus the number of XOR gates saved by shared groups. The $S_i^j$ and $T_i^j$ terms perform the XOR of $2^j$ product terms, so each of them needs $2^j - 1$ XORs. In this example, there are 7 $S_i^0$ terms and 6 each of the $S_i^1$, $S_i^2$ and $S_i^3$ terms, so the number of XOR gates in the initial $S_i^j$ terms is $7 \cdot (2^0 - 1) + 6 \cdot (2^1 - 1) + 6 \cdot (2^2 - 1) + 6 \cdot (2^3 - 1) = 66$ XOR. There are also 6 each of the $T_i^0$ and $T_i^1$ terms and 5 each of the $T_i^2$ and $T_i^3$ terms, so the number of XORs in the initial $T_i^j$ terms is $6 \cdot (2^0 - 1) + 6 \cdot (2^1 - 1) + 5 \cdot (2^2 - 1) + 5 \cdot (2^3 - 1) = 56$ XOR. The number of new XORs generated in the coefficients by the addition of $S_i^j$ and $T_i^j$ terms is found to be 110 (see Table 5). The number of XORs saved by the shared groups was computed previously (8 XOR). Therefore, the total number of XOR gates of this multiplier is $66 + 56 + 110 - 8 = 224$ XOR.
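The counts of 66 and 56 XOR gates can be checked from the binary vectors of the $S_i$ and $T_i$ functions. In this sketch the 110 new XORs and the 8 shared XORs are taken from the text as given constants, since they depend on Table 5; the function name is illustrative.

```python
def initial_term_xors(indices):
    """XOR gates inside all initial terms of the given functions.

    Each set bit j of an index stands for one level-j term, i.e. a complete
    binary tree over 2^j products, which needs 2^j - 1 XOR gates.
    """
    return sum((1 << j) - 1
               for q in indices
               for j in range(q.bit_length()) if (q >> j) & 1)


m = 13
and_gates = m * m
s_xors = initial_term_xors(range(1, m + 1))                    # S_1 .. S_13
t_xors = initial_term_xors([m - 1 - i for i in range(m - 1)])  # T_0 .. T_11
new_xors, shared_xors = 110, 8      # from the text (Table 5), not recomputed
total_xors = s_xors + t_xors + new_xors - shared_xors

if __name__ == "__main__":
    print("AND:", and_gates)
    print("XOR in S terms:", s_xors, "| XOR in T terms:", t_xors)
    print("total XOR:", total_xors)
```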
TABLE 5
Coefficients of the Product for $GF(2^{13})$

4.2 Complexity Analysis of the New Multiplier
4.2.1 Time Complexity
It was found in Section 3.2 that $c_{n+1}$ is the most complex coefficient, so it is used to determine the delay of the new multiplier. To do so, the complexity of the $S_i$ and $T_i$ terms must be determined. As shown in Section 4.1, the initial terms $S_i^j$ and $T_i^j$ are given by the binary representations of the subindex $i$ for $S_i$ and of the value $m - 1 - i$ for $T_i$, respectively [16]. Therefore, the equivalence (only in relation to the number of terms) $T_i \equiv S_{m-1-i}$ can be used to determine the number of $T_i^j$ terms in $T_i$. This equivalence determines that, for $c_{n+1}$, $T_0 \equiv S_{m-1}$, $T_1 \equiv S_{m-2}$, $T_n \equiv S_{z-1}$, $T_{n+1} \equiv S_{z-2}$, $T_{z-1} \equiv S_n$ and $T_{z+1} \equiv S_{n-2}$, where $z = m - n$, so $c_{n+1} \equiv S_{n+2} + S_{m-1} + S_{m-2} + S_{z-1} + S_{z-2} + S_n + S_{n-2}$. In order to determine the binary representation of these subindexes, the expression $q = \sum_{i=0}^{\lfloor \log_2 q \rfloor} (\lfloor q/2^i \rfloor \bmod 2) \cdot 2^i$, giving the binary configuration of a number $q$, can be used [16]. The value $\lfloor q/2^i \rfloor \bmod 2$ determines whether the binary representation of $q$ has a 1 in the position with weight $2^i$, meaning that $S_q$, $T_{m-1-q}$ have a term $S_q^i$, $T_{m-1-q}^i$ that is the addition of $2^i$ product terms and that is implemented with a binary XOR tree of depth $i$. The depth of the binary XOR tree implementing $c_{n+1}$ can be determined by first computing the total number of terms in level $\lfloor \log_2 m \rfloor$. The initial levels are $0, 1, \ldots, \lfloor \log_2 m \rfloor$. For a given level $i$, the number of new XOR terms created in level $i + 1$ by the sum in pairs of $i$-level terms is $\lceil M_i/2 \rceil$, where $M_i$ denotes the number of terms in level $i$. For instance, in Fig. 1 there are five 0-level terms ($S_5^0$, $T_1^0$, $T_3^0$, $T_9^0$, $T_{11}^0$) and their sum results in the three 1-level terms $ST_{5,1}^1$, $T_{3,9}^1$ and $T_{11}^0$ (considered as a 1-level term to be XORed with $T_9^1$). The number of initial terms $S_i^j$ and $T_i^j$ in level $j$, denoted as $m_j$, is given by the binary representations of the subindexes of the $S_i$ terms included in $c_{n+1} \equiv S_{n+2} + S_{m-1} + S_{m-2} + S_{z-1} + S_{z-2} + S_n + S_{n-2}$. Denoting $q_j^{*} = \lfloor q/2^j \rfloor \bmod 2$, then $m_j$ can be computed as [16] $m_j = (m-1)_j^{*} + (m-2)_j^{*} + (z-1)_j^{*} + (z-2)_j^{*} + (n+2)_j^{*} + n_j^{*} + (n-2)_j^{*}$. Using this expression, the number of initial terms for the most complex coefficient $c_4$ given in Section 4.1 can be computed. The number of initial terms in level 3, for example, is $m_3 = (12)_3^{*} + (11)_3^{*} + (9)_3^{*} + (8)_3^{*} + (5)_3^{*} + (3)_3^{*} + (1)_3^{*} = 1 + 1 + 1 + 1 + 0 + 0 + 0 = 4$, corresponding to $T_0^3 \equiv S_{12}^3$, $T_1^3 \equiv S_{11}^3$, $T_3^3 \equiv S_9^3$, $T_4^3 \equiv S_8^3$ (black circles in Fig. 1).
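The $m_j$ values for $c_{n+1}$ follow directly from the seven subindexes listed above. A minimal sketch, with illustrative function names and with `z = m - n` as in the text:

```python
def cn1_indices(m, n):
    """Subindexes of the S functions equivalent to the terms of c_(n+1)."""
    z = m - n
    return [n + 2, m - 1, m - 2, z - 1, z - 2, n, n - 2]


def initial_terms_per_level(m, n):
    """List [m_0, m_1, ..., m_r] with r = floor(log2 m)."""
    r = m.bit_length() - 1                      # floor(log2 m) for m >= 1
    idx = cn1_indices(m, n)
    return [sum((q >> j) & 1 for q in idx) for j in range(r + 1)]


if __name__ == "__main__":
    print(cn1_indices(13, 3))
    print(initial_terms_per_level(13, 3))
```

For $(m, n) = (13, 3)$ the subindexes are 5, 12, 11, 9, 8, 3 and 1, and the list of $m_j$ values has five terms at level 0 and four at level 3, as stated in the text.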
The total number of terms $M_{\lfloor \log_2 m \rfloor}$ in level $\lfloor \log_2 m \rfloor$ is the number of initial terms $m_{\lfloor \log_2 m \rfloor}$ plus the terms in that level created by the XOR of terms in lower levels. Using the property of the ceiling function $\lceil q \rceil + n = \lceil q + n \rceil$, with $n$ integer, and taking into account that the total number of terms in level $i$ is $M_i = m_i + \lceil M_{i-1}/2 \rceil$, it can be proved [16] that the number of terms created in level $\lfloor \log_2 m \rfloor$ by the addition in pairs of terms in level $\lfloor \log_2 m \rfloor - 1$ is $\lceil (\sum_{i=0}^{\lfloor \log_2 m \rfloor - 1} 2^i m_i) / 2^{\lfloor \log_2 m \rfloor} \rceil$. Therefore, the total number of terms in level $\lfloor \log_2 m \rfloor$ is the sum of $m_{\lfloor \log_2 m \rfloor}$ and the above expression, i.e., $M_{\lfloor \log_2 m \rfloor} = \lceil (\sum_{i=0}^{\lfloor \log_2 m \rfloor} 2^i m_i) / 2^{\lfloor \log_2 m \rfloor} \rceil$. In order to compute this expression, the number $m_j$ of initial terms in level $j$ must be known. This number was previously given for $c_{n+1}$. Using the fact that the mod operator is defined by $x \bmod y = x - y \lfloor x/y \rfloor$, for real $x, y$ ($y \neq 0$), then $q_j^{*} = \lfloor q/2^j \rfloor - 2 \cdot \lfloor \lfloor q/2^j \rfloor / 2 \rfloor = \lfloor q/2^j \rfloor - 2 \cdot \lfloor q/2^{j+1} \rfloor$. Therefore, it can be proved that [16]

$$M_{\lfloor \log_2 m \rfloor} = \left\lceil \frac{4m + n - 6}{2^{\lfloor \log_2 m \rfloor}} \right\rceil. \qquad (2)$$

The number of XOR levels needed to compute the coefficient $c_{n+1}$ is $\lfloor \log_2 m \rfloor + \lceil \log_2 M_{\lfloor \log_2 m \rfloor} \rceil$, so the highest delay of the multiplier based on type I pentanomials is

$$T_A + \left( \lfloor \log_2 m \rfloor + \left\lceil \log_2 \left\lceil \frac{4m + n - 6}{2^{\lfloor \log_2 m \rfloor}} \right\rceil \right\rceil \right) T_X. \qquad (3)$$
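Expressions (2) and (3) are easy to evaluate with integer arithmetic only, which avoids floating-point rounding in the logarithms. The following sketch uses illustrative function names; `xor_levels` returns the factor multiplying $T_X$ in (3).

```python
def ceil_log2(k):
    """Exact ceil(log2 k) for a positive integer k."""
    return (k - 1).bit_length()


def top_level_terms(m, n):
    """Expression (2): number of terms in level floor(log2 m) for c_(n+1)."""
    r = m.bit_length() - 1                 # floor(log2 m)
    return -(-(4 * m + n - 6) // (1 << r)) # ceiling division


def xor_levels(m, n):
    """Number of XOR levels in (3), i.e. the delay is T_A + xor_levels * T_X."""
    r = m.bit_length() - 1
    return r + ceil_log2(top_level_terms(m, n))


if __name__ == "__main__":
    m, n = 13, 3
    print("M =", top_level_terms(m, n))
    print(f"delay = T_A + {xor_levels(m, n)} T_X")
```

For the $(13, 3)$ example this gives $M_3 = \lceil 49/8 \rceil = 7$ and six XOR levels, matching the 6-level tree of $c_4$ in Section 4.1.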
4.2.2 Area Complexity
From (1), the number of AND gates in $S_i$ and $T_i$ are $i$ and $m - i - 1$, respectively. Therefore, the total number of AND gates of the multiplier is $m^2$. In order to compute the number of XOR gates of the multiplier, the number of XORs $\Theta_1$ given by $S_i$ and $T_i$ must be determined. From (1), these values are $i - 1$ and $m - i - 2$, respectively, and therefore $\Theta_1$ is $(m - 1)^2$. The number $\Theta_2$ of XOR gates used for the addition of $S_i$ and $T_i$ terms in the product coefficients of Table 3 must also be computed. Functions $S_i$ appear only once in Table 3, while $T_i$ terms appear several times. Therefore, the number of XORs in (1) accounts for the XORs of the $S_i^j$ and $T_i^j$ terms, the XORs used for the addition of all $S_i^j$ terms of $S_i$, and the XORs given in one sum of the $T_i^j$ terms of $T_i$. If a term $T_i$ appears $p_i$ times in Table 3, then the other $p_i - 1$ occurrences are taken into account by determining the number $Q_i$ of XORs needed for the addition of the $T_i^j$ terms and multiplying it by $p_i - 1$. Therefore, the XOR gates $\Theta_3$ given by $\sum_{i=0}^{m-2} (p_i - 1) \cdot Q_i$ must also be computed. Finally, the number $\Theta_4$ of XORs saved by shared groups $(T_i, T_j)$ must also be determined. The total number of XOR gates of the multiplier is then $\Theta_1 + \Theta_2 + \Theta_3 - \Theta_4$ [16]. In the Appendix, the following values have been computed:
• The number $\Theta_2$ of XORs needed for the sum of $S_i$ and $T_i$ terms in the product coefficients is $4m + 2n - 3$.
• The number $\Theta_3$ of XOR gates can be given as $\sum_{i=0}^{m-2} (p_i - 1) \cdot Q_i = 3 \cdot \Lambda_{m-1} + 2 \cdot \Lambda_n + C_n$. In this expression, the number of XOR gates $C_n$ needed for the sum of the $S_n^j$ terms of $S_n$ is $C_n = H_n - 1$ [16], where $H_n$ is the Hamming weight of $n$, and $\Lambda_h = \sum_{i=1}^{h} C_i = \sum_{i=1}^{h} H_i - h$.
• The number $\Theta_4$ of XOR gates saved by shared groups $(T_i, T_j)$ in the product coefficients is $(\sum_{i=k,k+2,\ldots}^{\iota} H_i) + H_{n-1}^{\upsilon} = \Delta_n + H_{\upsilon}$, where $k = 2\lfloor n/2 \rfloor$, $\iota = (m - 2)$ for even $m$ and $(m - 3)$ for odd $m$, and $\upsilon$ indicates that $H$ only appears for odd $n$.
Therefore, the number of XOR gates of the multiplier given by $\Theta_1 + \Theta_2 + \Theta_3 - \Theta_4$ is

$$m^2 - m + 3\Sigma_{m-1} + 2\Sigma_n + H_n - \Delta_n - H_{\upsilon}, \qquad (4)$$

where $\Sigma_h = \sum_{i=1}^{h} H_i$. For the example given in Section 4.1, with $m = 13$ and $n = 3$, the values are $\Sigma_{12} = 22$, $\Sigma_3 = 4$, $H_3 = 2$, $\Delta_3 = 7$ and $H_{\upsilon} = H_2 = 1$. Applying these values to equation (4) gives $169 - 13 + 3 \cdot 22 + 2 \cdot 4 + 2 - 7 - 1 = 224$ XOR gates, matching the result given in Section 4.1.
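Expression (4) can be evaluated for any $(m, n)$ with a few lines of code. The sketch below follows the definitions given above ($\Sigma_h$, $\Delta_n$ and the extra Hamming weight $H_{n-1}$ for odd $n$); the function names are illustrative and the formula is only checked here against the $(13, 3)$ example.

```python
def hamming_weight(q):
    """Number of ones in the binary representation of q."""
    return bin(q).count("1")


def sigma(h):
    """Sum of the Hamming weights of 1, 2, ..., h."""
    return sum(hamming_weight(i) for i in range(1, h + 1))


def delta(m, n):
    """Sum of H_i for i = k, k+2, ..., last, with k = 2*floor(n/2) and
    last = m-2 for even m, m-3 for odd m."""
    last = m - 2 if m % 2 == 0 else m - 3
    return sum(hamming_weight(i) for i in range(2 * (n // 2), last + 1, 2))


def xor_gates(m, n):
    """Total number of XOR gates according to expression (4)."""
    extra = hamming_weight(n - 1) if n % 2 == 1 else 0   # only for odd n
    return (m * m - m + 3 * sigma(m - 1) + 2 * sigma(n)
            + hamming_weight(n) - delta(m, n) - extra)


if __name__ == "__main__":
    print(sigma(12), sigma(3), hamming_weight(3), delta(13, 3))
    print("XOR gates for (13, 3):", xor_gates(13, 3))
```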
Fig. 1. Implementation of coefficient $c_4$ for $GF(2^{13})$.

5 COMPARISON WITH OTHER MULTIPLIERS

In Table 6, the theoretical complexities of the proposed multiplier are compared with the best results known to date for bit-parallel PB multipliers over $GF(2^m)$ generated by type I irreducible pentanomials. Simulations done with Maple have shown that the delay of our multiplier is less than or equal to the best delay given in [11] and [15], i.e., $\lfloor \log_2 m \rfloor + \lceil \log_2 \lceil (4m+n-6)/2^{\lfloor \log_2 m \rfloor} \rceil \rceil \le 3 + \lceil \log_2(m-1) \rceil$. From these results, it was found that for the 807 different values of the field size $m \in [8, 1000]$ for which a type I pentanomial exists, the proposed multiplier has the smallest delay for 762 values of $m$. Furthermore, among the 1,974 $(m,n)$ combinations with $m \in [8, 1000]$ for which type I pentanomials exist, there are 187 and 1,787 pairs $(m,n)$ for which the proposed multiplier has equal and lower delay, respectively, than the multipliers given in [11] and [15]. With respect to area complexity, the proposed multiplier uses the same number of AND gates (except with respect to [18]) and a higher number of XOR gates. This increase is due to the splitting of the functions $S_i$ and $T_i$ into $S_i^j$ and $T_i^j$ terms, respectively. Table 7 presents the complexities of bit-parallel PB multipliers for the NIST-recommended fields $GF(2^m)$ with $m \in \{163, 233, 283\}$, for which type I irreducible pentanomials exist. It can be observed that the proposed multiplier has the lowest delay among the analyzed methods. The reductions range from 8.3 percent for $GF(2^{283})$ to 9.1 percent for $GF(2^{163})$ and $GF(2^{233})$ with respect to the best delays found in the literature.
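The delay comparison above can be reproduced with integer arithmetic. The following Python sketch is an illustration only: the function names are hypothetical, the XOR-level delay of the proposed multiplier is taken as $\lfloor \log_2 m \rfloor + \lceil \log_2 \lceil (4m+n-6)/2^{\lfloor \log_2 m \rfloor} \rceil \rceil$, and the reference delay of [11] and [15] as $3 + \lceil \log_2(m-1) \rceil$, both counted in units of $T_X$ (the single $T_A$ is common to all methods).

```python
def ceil_log2(x: int) -> int:
    """Smallest k >= 0 such that 2**k >= x, for an integer x >= 1."""
    return (x - 1).bit_length()


def proposed_delay_tx(m: int, n: int) -> int:
    """XOR levels of the proposed multiplier (delay is T_A plus this many T_X)."""
    floor_log = m.bit_length() - 1              # floor(log2(m))
    numerator = 4 * m + n - 6
    inner = -(-numerator // (1 << floor_log))   # ceiling of the division
    return floor_log + ceil_log2(inner)


def reference_delay_tx(m: int) -> int:
    """XOR levels of the multipliers in [11] and [15]."""
    return 3 + ceil_log2(m - 1)


for m, n in [(163, 59), (233, 25), (283, 59)]:
    new, ref = proposed_delay_tx(m, n), reference_delay_tx(m)
    reduction = 100.0 * (ref - new) / ref
    print((m, n), new, ref, round(reduction, 1))
# (163, 59) 10 11 9.1
# (233, 25) 10 11 9.1
# (283, 59) 11 12 8.3
```

Working with `int.bit_length` avoids floating-point logarithms, so the floor and ceiling values are exact for any field size.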
6 HARDWARE IMPLEMENTATIONS

In order to further compare the new approach with other similar methods, bit-parallel $GF(2^m)$ PB multipliers based on type I irreducible pentanomials have been described in VHDL, synthesized and implemented on a Xilinx Artix-7 XC7A200T-FFG1156 FPGA. The experimental results are those reported by Xilinx ISE 14.7 using the XST synthesizer. The same pin assignments and the same speed-oriented, high-effort optimization settings were used for all designs. Post-place-and-route results are given in Table 8 for multipliers based on type I irreducible pentanomials for the SECG [20] recommended finite fields $GF(2^m)$ with $(m,n)$ = (113, 8), (113, 24), (113, 40), (131, 59), and for the NIST field (163, 59). Area complexity is expressed in Table 8 as the number of LUTs and Slices used, and the time results (in nanoseconds) represent the minimum time needed to perform one $GF(2^m)$ multiplication. The $A \times T$ metric is the area-delay product in Slices × ns, which combines area and delay in a single figure (lower is better). From the experimental results, it can be observed that the proposed multiplier exhibits the lowest delay among the different methods. Moreover, the new approach presents the best Area × Time values in three of the five implemented multipliers, which also shows a restrained area usage in comparison with the other methods.
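The Area × Time figures can be recomputed directly from the Slices and Time columns of Table 8. The short Python sketch below is only a bookkeeping aid: the dictionary `TABLE_8` and the helper names are hypothetical, and the numbers are copied from Table 8.

```python
# (Slices, time in ns) per multiplier and field, copied from Table 8.
TABLE_8 = {
    "(113, 8)": {"[10]": (2882, 24.74), "[19]": (2851, 23.18), "[14]": (2718, 22.89),
                 "[11]": (2571, 21.23), "[15]": (2446, 21.33), "This work": (2354, 20.56)},
    "(113, 24)": {"[10]": (2727, 21.99), "[19]": (2824, 22.24), "[14]": (2406, 21.32),
                  "[11]": (2546, 22.15), "[15]": (2363, 20.79), "This work": (2466, 20.34)},
    "(113, 40)": {"[10]": (2662, 22.10), "[19]": (2746, 22.60), "[14]": (2488, 21.13),
                  "[11]": (2508, 21.15), "[15]": (2571, 21.33), "This work": (2459, 20.37)},
    "(131, 59)": {"[10]": (2743, 23.07), "[19]": (2671, 23.76), "[14]": (2168, 20.95),
                  "[11]": (2287, 22.96), "[15]": (2318, 20.86), "This work": (2185, 20.33)},
    "(163, 59)": {"[10]": (3852, 24.29), "[19]": (4060, 24.19), "[14]": (3664, 21.79),
                  "[11]": (3730, 22.62), "[15]": (3209, 22.78), "This work": (3532, 21.33)},
}


def area_time(slices: int, time_ns: float) -> float:
    """Area-delay product in Slices x ns (lower is better)."""
    return slices * time_ns


def best(rows: dict, metric) -> str:
    """Name of the multiplier that minimizes the given metric."""
    return min(rows, key=lambda name: metric(*rows[name]))


at_winners = {field: best(rows, area_time) for field, rows in TABLE_8.items()}
delay_winners = {field: best(rows, lambda s, t: t) for field, rows in TABLE_8.items()}
print(at_winners)
print(delay_winners)
```

With these inputs, the proposed multiplier has the smallest delay for all five fields and the smallest area-delay product for (113, 8), (113, 40) and (131, 59), while [15] has the smallest product for (113, 24) and (163, 59).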
TABLE 6
Complexities of Bit-Parallel PB Multipliers for Type I Pentanomial $f(y) = y^m + y^{n+1} + y^n + y + 1$

| Multiplier | #AND | #XOR | Delay |
|---|---|---|---|
| [13] | $m^2$ | $m^2 + 2m - 3$ | $T_A + (6 + \lceil \log_2 m \rceil)T_X$ |
| [18] | $(3m^2 + 2m - 1)/4$ | $(3m^2 + 24m + 8n + d)/4$ | $T_A + (3 + \lceil \log_2(m+1) \rceil)T_X$ |
| [14] | $m^2$ | $m^2 + m + 2n$ | $T_A + (3 + \lceil \log_2 m \rceil)T_X$ |
| [11] | $m^2$ | $m^2 + m$ | $T_A + (3 + \lceil \log_2(m-1) \rceil)T_X$ |
| [15] | $m^2$ | $m^2 + m - 1$ | $T_A + (3 + \lceil \log_2(m-1) \rceil)T_X$ |
| This work | $m^2$ | $m^2 - m + 3S_{m-1} + 2S_n + H_n - D_n - H^{\dagger}$ | $T_A + (\lfloor \log_2 m \rfloor + \lceil \log_2 \lceil (4m+n-6)/2^{\lfloor \log_2 m \rfloor} \rceil \rceil)T_X$ |

$d = 21$ (odd $n$), 17 (even $n$). $\dagger$ = term included for odd $n$.

TABLE 7
Complexities of Bit-Parallel PB Multipliers Using Type I Pentanomials for Three Recommended NIST Fields

| $(m,n)$ | Multiplier | #AND | #XOR | Delay |
|---|---|---|---|---|
| (163, 59) | [13] | 26,569 | 26,892 | $T_A + 14T_X$ |
| (163, 59) | [18] | 20,008 | 21,028 | $T_A + 11T_X$ |
| (163, 59) | [14] | 26,569 | 26,850 | $T_A + 11T_X$ |
| (163, 59) | [11] | 26,569 | 26,732 | $T_A + 11T_X$ |
| (163, 59) | [15] | 26,569 | 26,731 | $T_A + 11T_X$ |
| (163, 59) | This work | 26,569 | 28,280 | $T_A + 10T_X$ |
| (233, 25) | [13] | 54,289 | 54,752 | $T_A + 14T_X$ |
| (233, 25) | [18] | 40,833 | 42,170 | $T_A + 11T_X$ |
| (233, 25) | [14] | 54,289 | 54,572 | $T_A + 11T_X$ |
| (233, 25) | [11] | 54,289 | 54,522 | $T_A + 11T_X$ |
| (233, 25) | [15] | 54,289 | 54,521 | $T_A + 11T_X$ |
| (233, 25) | This work | 54,289 | 56,471 | $T_A + 10T_X$ |
| (283, 59) | [13] | 80,089 | 80,652 | $T_A + 15T_X$ |
| (283, 59) | [18] | 60,208 | 61,888 | $T_A + 12T_X$ |
| (283, 59) | [14] | 80,089 | 80,490 | $T_A + 12T_X$ |
| (283, 59) | [11] | 80,089 | 80,372 | $T_A + 12T_X$ |
| (283, 59) | [15] | 80,089 | 80,371 | $T_A + 12T_X$ |
| (283, 59) | This work | 80,089 | 83,068 | $T_A + 11T_X$ |

TABLE 8
Comparison of Hardware Implementations for $GF(2^m)$ Multipliers

| $(m,n)$ | Multiplier | LUTs | Slices | Time (ns) | $A \times T$ |
|---|---|---|---|---|---|
| (113, 8) SECG | [10] | 5,554 | 2,882 | 24.74 | 71300.68 |
| (113, 8) SECG | [19] | 5,515 | 2,851 | 23.18 | 66086.18 |
| (113, 8) SECG | [14] | 5,434 | 2,718 | 22.89 | 62215.02 |
| (113, 8) SECG | [11] | 5,427 | 2,571 | 21.23 | 54582.33 |
| (113, 8) SECG | [15] | 5,735 | 2,446 | 21.33 | 52173.18 |
| (113, 8) SECG | This work | 5,501 | 2,354 | 20.56 | 48398.24 |
| (113, 24) SECG | [10] | 5,529 | 2,727 | 21.99 | 59966.73 |
| (113, 24) SECG | [19] | 5,528 | 2,824 | 22.24 | 62805.76 |
| (113, 24) SECG | [14] | 5,436 | 2,406 | 21.32 | 51295.92 |
| (113, 24) SECG | [11] | 5,435 | 2,546 | 22.15 | 56393.90 |
| (113, 24) SECG | [15] | 5,653 | 2,363 | 20.79 | 49126.77 |
| (113, 24) SECG | This work | 5,460 | 2,466 | 20.34 | 50158.44 |
| (113, 40) SECG | [10] | 5,533 | 2,662 | 22.10 | 58830.20 |
| (113, 40) SECG | [19] | 5,524 | 2,746 | 22.60 | 62059.60 |
| (113, 40) SECG | [14] | 5,455 | 2,488 | 21.13 | 52571.44 |
| (113, 40) SECG | [11] | 5,431 | 2,508 | 21.15 | 53044.20 |
| (113, 40) SECG | [15] | 5,548 | 2,571 | 21.33 | 54839.43 |
| (113, 40) SECG | This work | 5,481 | 2,459 | 20.37 | 50089.83 |
| (131, 59) SECG | [10] | 7,423 | 2,743 | 23.07 | 63281.01 |
| (131, 59) SECG | [19] | 7,383 | 2,671 | 23.76 | 63462.96 |
| (131, 59) SECG | [14] | 7,308 | 2,168 | 20.95 | 45419.60 |
| (131, 59) SECG | [11] | 7,286 | 2,287 | 22.96 | 52509.52 |
| (131, 59) SECG | [15] | 7,392 | 2,318 | 20.86 | 48353.48 |
| (131, 59) SECG | This work | 7,341 | 2,185 | 20.33 | 44421.05 |
| (163, 59) NIST | [10] | 11,412 | 3,852 | 24.29 | 93565.08 |
| (163, 59) NIST | [19] | 11,364 | 4,060 | 24.19 | 98211.40 |
| (163, 59) NIST | [14] | 11,290 | 3,664 | 21.79 | 79838.56 |
| (163, 59) NIST | [11] | 11,304 | 3,730 | 22.62 | 84372.60 |
| (163, 59) NIST | [15] | 11,471 | 3,209 | 22.78 | 73101.02 |
| (163, 59) NIST | This work | 11,320 | 3,532 | 21.33 | 75337.56 |

7 CONCLUSION

In this paper, a new fast bit-parallel $GF(2^m)$ polynomial basis multiplier for type I irreducible pentanomials has been presented. Efficient implementations of high-speed multipliers over binary extension fields are highly desirable for several important applications. Furthermore, type I irreducible pentanomials are abundant and they are used in applications such as the AES. In this work, explicit expressions for the coordinates of the proposed multiplier are given. These expressions are implemented as the addition in pairs of binary trees of XOR gates with the same depth, leading to a reduction of the delay. Moreover, the use of binary trees can reduce power consumption in comparison to the use of linear arrays of XOR gates. A detailed multiplication example has also been given. Theoretical complexity analysis has shown that the proposed multiplier presents the lowest delay among the best results known to date for similar multipliers based on this type of irreducible pentanomials. Simulation results have shown that for the 1,974 $(m,n)$ combinations, with $m \in [8, 1000]$ and $2 \le n \le \lfloor m/2 \rfloor - 1$, for which type I irreducible pentanomials exist, there are 187 and 1,787 pairs $(m,n)$ for which the proposed multiplier has equal and lower delay, respectively, than the best results found in the literature. Furthermore, for the NIST-recommended finite fields $GF(2^m)$ with $(m,n) = (163, 59)$, $(233, 25)$ and $(283, 59)$, the proposed multiplier presents a reduction of the delay ranging from 8.3 to 9.1 percent with respect to the best results known to date. In order to validate the theoretical complexities, hardware implementations on Xilinx FPGAs have also been performed. NIST and SECG $GF(2^m)$ multipliers have been described in VHDL, and post-place-and-route implementation results on Artix-7 have been reported. Experimental results have shown that the proposed multiplier exhibits the lowest delay with a balanced Area × Time complexity when compared with similar multipliers.
APPENDIX
AREA COMPLEXITY

In order to determine the number of XOR gates of the multiplier, the quantities ②, ③ and ④ must be computed.

The coefficients in Table 3 have been divided into eight sections. The number of $S_i$ and $T_i$ terms in each section was given in Section 3.2. The XOR gates in the product coefficients are the following: 3 in section A; $4(n-2)$ in B; 3 and 6 in C and D, respectively; $6(n-2)$ in E; 10 in F; $4(m-2n-2)$ in G; and finally 3 in H. Therefore, the number ② of XOR gates needed for the addition of $S_i$ and $T_i$ terms in the product coefficients is $4m + 2n - 3$.
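The closed form $4m + 2n - 3$ follows from adding the eight per-section counts listed above. A minimal Python sketch of that sum is given below; the function names are hypothetical and the section letters A to H are used only as dictionary keys.

```python
def section_xors(m: int, n: int) -> dict:
    """XOR gates contributed by each of the eight sections of Table 3."""
    return {
        "A": 3,
        "B": 4 * (n - 2),
        "C": 3,
        "D": 6,
        "E": 6 * (n - 2),
        "F": 10,
        "G": 4 * (m - 2 * n - 2),
        "H": 3,
    }


def addition_xors(m: int, n: int) -> int:
    """Total XORs needed to add the S_i and T_i terms in the product coefficients."""
    return sum(section_xors(m, n).values())


print(addition_xors(13, 3))  # 55
```

For $m = 13$ and $n = 3$ the sections contribute 3 + 4 + 3 + 6 + 6 + 10 + 20 + 3 = 55 XOR gates, which equals $4 \cdot 13 + 2 \cdot 3 - 3$.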
The number $p_i$ of times each $T_i$ appears in Table 3 must be determined to compute the number ③ of XOR gates. There are $z-1$ terms ($T_0, \ldots, T_{z-2}$) that appear 4 times, $n-2$ terms ($T_{z+1}, \ldots, T_{m-2}$) that appear 6 times, and $T_{z-1}$ appears 7 times. One occurrence of each $T_i$ term is already included in ①, so the number of XORs due to the above terms appearing 3, 5 and 6 additional times, respectively, must be determined. If we define $F_{a,b} = \sum_{i=a}^{b} Q_i$, where $Q_i$ is the number of XORs needed for the sum of the $T_i^j$ terms of $T_i$, then the number ③ of XORs is $\sum_{i=0}^{m-2} (p_i - 1)\,Q_i = 3F_{0,m-2} + 2F_{z-1,m-2} + Q_{z-1}$. Using the equivalence $T_i \equiv S_{m-1-i}$ and the cumulative sums $\sum_{i=1}^{h} C_i$, ③ can be computed as $\sum_{i=0}^{m-2} (p_i - 1)\,Q_i = 3\sum_{i=1}^{m-1} C_i + 2\sum_{i=1}^{n} C_i + C_n$. The XORs $C_i$ needed for the addition of the $S_i^j$ terms in $S_i$ can be determined using the number of 1's in the binary representation of $i$ [16]. For example, $S_{11}$ in Table 4 can be written as $S_{11} = S_{11}^3 + S_{11}^1 + S_{11}^0$ and therefore 2 XORs are needed to perform the additions of the $S_{11}^j$ terms. The binary representation of subindex 11 is $(1, 0, 1, 1)_2$, with three 1's, so the number of XOR gates $C_{11}$ is its Hamming weight $H_{11}$ minus 1. Therefore, $C_i = H_i - 1$ and $\sum_{i=1}^{h} C_i = \sum_{i=1}^{h} H_i - h$.
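The Hamming-weight rule $C_i = H_i - 1$ and the resulting count for the repeated terms can be illustrated with a short Python sketch. The function names are hypothetical, and reading the superscripts $j$ of the $S_i^j$ terms as the positions of the 1 bits of $i$ is an assumption inferred from the $S_{11}$ example, not a definition taken from the text.

```python
def hamming_weight(i: int) -> int:
    """Number of 1's in the binary representation of i (H_i)."""
    return bin(i).count("1")


def term_superscripts(i: int) -> list:
    """Positions of the 1 bits of i, highest first (assumed superscripts j of S_i^j)."""
    return [j for j in range(i.bit_length() - 1, -1, -1) if (i >> j) & 1]


def xors_for_term(i: int) -> int:
    """C_i = H_i - 1: XORs needed to add the S_i^j terms of S_i."""
    return hamming_weight(i) - 1


def cumulative_xors(h: int) -> int:
    """Sum of C_i for i = 1..h, equal to (sum of H_i for i = 1..h) minus h."""
    return sum(xors_for_term(i) for i in range(1, h + 1))


def repeated_term_xors(m: int, n: int) -> int:
    """XORs due to repeated T_i terms: 3*sum_{i<=m-1} C_i + 2*sum_{i<=n} C_i + C_n."""
    return 3 * cumulative_xors(m - 1) + 2 * cumulative_xors(n) + xors_for_term(n)


print(term_superscripts(11))      # [3, 1, 0]
print(xors_for_term(11))          # 2
print(repeated_term_xors(13, 3))  # 33
```

For $m = 13$ and $n = 3$, the cumulative sums are $22 - 12 = 10$ and $4 - 3 = 1$, so the repeated terms account for $3 \cdot 10 + 2 \cdot 1 + 1 = 33$ XOR gates.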
Table 3 is used to compute the number of XOR gates given by shared groups $(T_i, T_j)$. It can be found that for even $n$, there are $\lfloor z/2 \rfloor$ groups $(T_i, T_{i+1})$, with $i \in [0, z-2]$ for even $m$ and $i \in [1, z-2]$ for odd $m$, that appear in two coefficients. For odd $n$, the group $(T_{z-1}, T_z)$ appears in three coefficients, while $\lceil z/2 \rceil - 1$ groups $(T_i, T_{i+1})$, with $i \in [0, z-3]$ for even $m$ and $i \in [1, z-3]$ for odd $m$, appear in two coefficients. For these groups, the term with the highest subindex gives the XORs to be shared [16]. The XORs represented by the above groups are therefore given by the Hamming weight of the binary representation of the lowest subindex $i$ of $S_i$ for each group. The quantities to be computed are $(H_n + H_{n+2} + \cdots + H_{m-2})$ for even $n$ and $m$, $(H_n + H_{n+2} + \cdots + H_{m-3})$ for even $n$ and odd $m$, $(H_{n-1} + H_{n+1} + \cdots + H_{m-2}) + H_{n-1}$ for odd $n$ and even $m$, and $(H_{n-1} + H_{n+1} + \cdots + H_{m-3}) + H_{n-1}$ for odd $n$ and $m$. Therefore, the number ④ of XORs given by the shared groups will be $\left(\sum_{i=k,k+2,\ldots}^{\iota} H_i\right) + H^{\dagger}_{n-1} = D_n + H^{\dagger}$, where $k = 2\lfloor n/2 \rfloor$, $\iota = (m-2)$ for even $m$ and $(m-3)$ for odd $m$, and $\dagger$ indicates that $H$ only appears for odd $n$.
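The four parity cases can be checked against the compact expression $D_n + H^{\dagger}$ with the Python sketch below. It is an illustration under stated assumptions: the function names are hypothetical, $z = m - n$ is assumed (this value is inferred from the subindex ranges above and is not stated in this appendix), the groups $(T_i, T_{i+1})$ are taken at every second $i$ in the given interval, and a group that appears in three coefficients is counted as shared twice.

```python
def hamming_weight(i: int) -> int:
    """Number of 1's in the binary representation of i (H_i)."""
    return bin(i).count("1")


def shared_xors_by_groups(m: int, n: int) -> int:
    """Enumerate the shared groups (T_i, T_{i+1}) and add the shared XORs."""
    z = m - n                         # assumption: z = m - n
    first = 0 if m % 2 == 0 else 1    # i starts at 0 (even m) or 1 (odd m)
    last = z - 2 if n % 2 == 0 else z - 3
    total = 0
    for i in range(first, last + 1, 2):
        # T_{i+1} is equivalent to S_{m-2-i}, the lowest S subindex of the group.
        total += hamming_weight(m - 2 - i)
    if n % 2 == 1:
        # Group (T_{z-1}, T_z): lowest S subindex is m-1-z; it appears in
        # three coefficients, so it is shared twice.
        total += 2 * hamming_weight(m - 1 - z)
    return total


def shared_xors_compact(m: int, n: int) -> int:
    """Compact form: D_n plus the dagger term H_{n-1} for odd n."""
    k = 2 * (n // 2)
    upper = m - 2 if m % 2 == 0 else m - 3
    d_n = sum(hamming_weight(i) for i in range(k, upper + 1, 2))
    return d_n + (hamming_weight(n - 1) if n % 2 == 1 else 0)


print(shared_xors_by_groups(13, 3), shared_xors_compact(13, 3))  # 8 8
```

For $m = 13$ and $n = 3$ both routes give $H_{10} + H_8 + H_6 + H_4 + 2H_2 = 8$, that is, $D_3 + H_2 = 7 + 1$.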
ACKNOWLEDGMENTS

This work has been supported by the EU (FEDER) and the Spanish MINECO, under grants TIN 2015-65277-R and TIN2012-32180.
REFERENCES

[1] J. Lin, J. Sha, Z. Wang, and L. Li, “Efficient decoder design for nonbinary quasicyclic LDPC codes,” IEEE Trans. Circuits Syst. I Reg. Papers, vol. 57, no. 5, pp. 1071–1082, May 2010.
[2] K. Kobayashi and N. Takagi, “A combined circuit for multiplication and inversion in GF(2^m),” IEEE Trans. Circuits Syst. II Exp. Briefs, vol. 55, no. 11, pp. 1144–1148, Nov. 2008.
[3] B. Sunar and Ç. K. Koç, “Mastrovito multiplier for all trinomials,” IEEE Trans. Comput., vol. 48, no. 5, pp. 522–527, May 1999.
[4] H. Wu, “Bit-parallel finite field multiplier and squarer using polynomial basis,” IEEE Trans. Comput., vol. 51, no. 7, pp. 750–758, Jul. 2002.
[5] H. Fan, “A Chinese remainder theorem approach to bit-parallel GF(2^m) polynomial basis multipliers for irreducible trinomials,” IEEE Trans. Comput., vol. 65, no. 2, pp. 343–352, Feb. 2016.
[6] H. Wu, “Bit-parallel polynomial basis multiplier for new classes of finite fields,” IEEE Trans. Comput., vol. 57, no. 8, pp. 1023–1031, Aug. 2008.
[7] A. Halbutogullari and Ç. K. Koç, “Mastrovito multiplier for general irreducible polynomials,” IEEE Trans. Comput., vol. 49, no. 5, pp. 503–518, May 2000.
[8] H. Fan and Y. Dai, “Fast bit parallel GF(2^m) multiplier for all trinomials,” IEEE Trans. Comput., vol. 54, no. 4, pp. 485–490, Apr. 2005.
[9] P. K. Meher, “Systolic and super-systolic multipliers for finite field GF(2^m) based on irreducible trinomials,” IEEE Trans. Circuits Syst. I Reg. Papers, vol. 55, no. 4, pp. 1031–1040, May 2008.
[10] E. D. Mastrovito, “VLSI architectures for multiplication over finite fields GF(2^m),” in Proc. 6th Int’l Conf. Appl. Algebra Algebraic Algorithms Error-Correcting Codes, Jul. 1988, pp. 297–309.
[11] A. Reyhani-Masoleh and M. A. Hasan, “Low complexity bit parallel architectures for polynomial basis multiplication over GF(2^m),” IEEE Trans. Comput., vol. 53, no. 8, pp. 945–959, Aug. 2004.
[12] J. L. Imaña, R. Hermida, and F. Tirado, “Low complexity bit-parallel polynomial basis multipliers over binary fields for special irreducible pentanomials,” Integration, vol. 46, pp. 197–210, 2013.
[13] T. Zhang and K. K. Parhi, “Systematic design of original and modified Mastrovito multipliers for general irreducible polynomials,” IEEE Trans. Comput., vol. 50, no. 7, pp. 734–749, Jul. 2001.
[14] F. Rodríguez-Henríquez and Ç. K. Koç, “Parallel multipliers based on special irreducible pentanomials,” IEEE Trans. Comput., vol. 52, no. 12, pp. 1535–1542, Dec. 2003.
[15] J. L. Imaña, R. Hermida, and F. Tirado, “Low complexity bit-parallel multipliers based on a class of irreducible pentanomials,” IEEE Trans. VLSI Syst., vol. 14, no. 12, pp. 1388–1393, Dec. 2006.
[16] J. L. Imaña, “High-speed polynomial basis multipliers over GF(2^m) for special pentanomials,” IEEE Trans. Circuits Syst. I Reg. Papers, vol. 63, no. 1, pp. 58–69, Jan. 2016.
[17] L. Song and K. K. Parhi, “Low-energy digit-serial/parallel finite field multipliers,” J. VLSI Signal Process., vol. 19, pp. 149–166, 1998.
[18] S.-M. Park, K.-Y. Chang, D. Hong, and C. Seo, “New efficient bit-parallel polynomial basis multiplier for special pentanomials,” Integration, vol. 47, pp. 130–139, 2014.
[19] B. Rashidi, R. R. Farashahi, and S. M. Sayedi, “Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields,” Int. J. Inform. Security, vol. 7, no. 2, pp. 101–114, Jul. 2015.
[20] Certicom Research, “SEC 2: Recommended Elliptic Curve Domain Parameters,” Standards for Efficient Cryptography, Version 1.0, Sep. 2000.
IEEE TRANSACTIONS ON COMPUTERS, VOL. 67, NO. X, XXXXX 2018 7
