Reconfigurable implementation of GF(2^m) bit-parallel multipliers by Imaña Pascual, José Luis
Reconfigurable implementation of $GF(2^m)$
bit-parallel multipliers
José L. Imaña
Department of Computer Architecture and Automation
Faculty of Physics, Complutense University
28040 Madrid, Spain
Email: jluimana@ucm.es
Abstract: Hardware implementations of arithmetic operations
over binary finite fields $GF(2^m)$ are widely used in several
important applications, such as cryptography, digital signal
processing and error-control codes. In this paper, efficient
reconfigurable implementations of bit-parallel canonical basis
multipliers over binary fields generated by type II irreducible
pentanomials $f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$ are presented.
These pentanomials are important because all five binary fields
recommended by NIST for ECDSA can be constructed using
such polynomials. In this work, a new approach for $GF(2^m)$
multiplication based on type II pentanomials is given and several
post-place and route implementation results on a Xilinx Artix-7
FPGA are reported. Experimental results show that the proposed
multiplier implementations improve the area×time parameter
when compared with similar multipliers found in the literature.
I. INTRODUCTION
Galois or finite fields $GF(2^m)$ have been widely studied due
to their use in important applications, such as cryptography,
digital signal processing and error-control codes. These
applications frequently require efficient hardware implementations
of $GF(2^m)$ arithmetic operations, of which multiplication is
the most complex and important one. Efficient multiplication
methods and architectures have been proposed for different
representation bases, of which the canonical or polynomial basis
is the most widely used. Apart from the basis selection,
the complexity of the multiplication also depends on the
defining irreducible polynomial $f(y)$ selected for the field.
For $GF(2^m)$ hardware implementation, irreducible trinomials
and pentanomials are normally used.
Two-step classic polynomial basis multiplication in
$GF(2^m)$ involves a polynomial multiplication followed by a
reduction modulo an irreducible polynomial. An efficient
bit-parallel canonical multiplier was proposed by Mastrovito [1], in
which a product matrix combines the above two steps
[2],[3],[4],[5]. A new polynomial basis multiplication scheme
was proposed in [6], where the decomposition of a product
matrix led to the introduction of $S_i$ and $T_i$ functions given
by sums of product terms. The addition of these functions
is used to compute the product of two $GF(2^m)$
elements. The sums of products in $S_i$ and $T_i$ were split in [7]
into sums of $2^j$ product terms implemented as binary trees of
XOR gates with depth $j$. The addition in pairs of binary trees
with the same depth leads to a reduction of the multiplication
delay.
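As a point of reference for the two-step scheme described above, the following Python sketch multiplies two field elements by a carry-less polynomial product followed by a reduction modulo $f(y)$. It is a bit-serial software model, not one of the bit-parallel architectures discussed in this paper; the function names and the encoding of polynomials as integers (bit $i$ holds the coefficient of $x^i$) are assumptions made for illustration.

```python
def clmul(a: int, b: int) -> int:
    """Product of two GF(2)[x] polynomials stored as bit masks (no reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def reduce_mod(d: int, f: int) -> int:
    """Remainder of d modulo f in GF(2)[x]; m is taken as the degree of f."""
    m = f.bit_length() - 1
    for k in range(d.bit_length() - 1, m - 1, -1):
        if (d >> k) & 1:
            d ^= f << (k - m)   # cancel x^k using x^(k-m) * f(x)
    return d


def gf_mul(a: int, b: int, f: int) -> int:
    """Two-step multiplication: polynomial product, then reduction modulo f."""
    return reduce_mod(clmul(a, b), f)


# Type II pentanomial with (m, n) = (8, 2): y^8 + y^4 + y^3 + y^2 + 1
F8 = 0b1_0001_1101

# x * x^7 = x^8 = x^4 + x^3 + x^2 + 1 in this field
print(hex(gf_mul(0x02, 0x80, F8)))  # 0x1d
```

A model of this kind is mainly useful as a golden reference when checking the bit-parallel expressions given later.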
In this paper, efficient Xilinx FPGA implementations of
$GF(2^m)$ bit-parallel canonical basis multipliers based on type
II pentanomials $f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$ are
presented. In order to optimize the synthesis, a new approach
for the product is considered in which the splitting of $S_i$ and
$T_i$ terms is performed, but where the restriction imposed by
the addition in pairs of binary trees with the same depth
has been removed. In this way, the Xilinx XST tool is
free to optimize the synthesis of the multiplier. Several
$GF(2^m)$ multipliers, including the specific field $GF(2^8)$,
have been described in VHDL, and post-place and route
implementation results on Xilinx Artix-7 are reported.
The field $GF(2^8)$ is especially important because it has been
standardized for space communication by NASA and ESA
and is used in CD players and in the Advanced Encryption Standard
(AES). Experimental results show that the proposed approach
for multiplication improves the area×time complexity when
compared with similar multiplication methods found in the
literature. Furthermore, the new approach also presents the
lowest delay for most of the multipliers implemented here.
II. BACKGROUND
Any element $A$ of the binary field $GF(2^m)$ can be represented
in the canonical or polynomial basis $\{1, x, \ldots, x^{m-1}\}$
as $A = \sum_{i=0}^{m-1} a_i x^i$, with $a_i \in GF(2)$, where $x$ is a root
of the irreducible polynomial $f(y) = \sum_{i=0}^{m} f_i y^i$. Canonical
basis multiplication in $GF(2^m)$ involves a polynomial
multiplication followed by a reduction modulo the irreducible
polynomial. An efficient bit-parallel canonical multiplication
method was proposed by Mastrovito [1] in which a product
matrix combines the above two steps. In [6], a
new $GF(2^m)$ polynomial basis multiplication approach was
used. In order to compute the product $C = A \cdot B$, this
method introduced the functions $S_i$ and $T_i$ given by the
addition of terms $x_k = (a_k b_k)$ and $z_i^j = (a_i b_j + a_j b_i)$, with
$a_i, b_i \in GF(2)$ being the coordinates of $A, B \in GF(2^m)$,
respectively. These functions are implemented as binary trees
of 2-input XOR gates with a lower level of 2-input AND gates
(corresponding to the $a_i b_j$ products). The expressions for $S_i$
($1 \le i \le m$) and $T_i$ ($0 \le i \le m-2$) are [6]:
$$S_i = x_p + \sum_{h=0}^{p-1} z_h^{i-h-1}, \qquad T_i = x_q + \sum_{j=1}^{r-(i+1)} z_{i+j}^{m-j} \qquad (1)$$
where $p = \lfloor i/2 \rfloor$ and $q = \lceil m/2 \rceil + \lfloor i/2 \rfloor$. In (1), the term
$x_p = a_p b_p$ only appears for $i$ odd, and $x_q$ only appears for
($m$ and $i$ even) or for ($m$ and $i$ odd). In this case, $r = q$.
Otherwise, i.e., for ($m$ even and $i$ odd) or for ($m$ odd and $i$
even), the term $x_q$ does not appear and $r = \lceil m/2 \rceil + \lceil i/2 \rceil$.
For example, using (1), the terms $S_i$ and $T_i$ for $GF(2^8)$ are:
$S_1 = x_0 = a_0b_0$
$S_2 = z_0^1 = (a_0b_1 + a_1b_0)$
$S_3 = x_1 + z_0^2 = a_1b_1 + (a_0b_2 + a_2b_0)$
$S_4 = z_0^3 + z_1^2 = (a_0b_3 + a_3b_0) + (a_1b_2 + a_2b_1)$
$S_5 = x_2 + z_0^4 + z_1^3 = a_2b_2 + (a_0b_4 + a_4b_0) + (a_1b_3 + a_3b_1)$
$S_6 = z_0^5 + z_1^4 + z_2^3 = (a_0b_5 + a_5b_0) + (a_1b_4 + a_4b_1) + (a_2b_3 + a_3b_2)$
$S_7 = x_3 + z_0^6 + z_1^5 + z_2^4 = a_3b_3 + (a_0b_6 + a_6b_0) + (a_1b_5 + a_5b_1) + (a_2b_4 + a_4b_2)$
$S_8 = z_0^7 + z_1^6 + z_2^5 + z_3^4 = (a_0b_7 + a_7b_0) + (a_1b_6 + a_6b_1) + (a_2b_5 + a_5b_2) + (a_3b_4 + a_4b_3)$
and
$T_0 = x_4 + z_1^7 + z_2^6 + z_3^5 = a_4b_4 + (a_1b_7 + a_7b_1) + (a_2b_6 + a_6b_2) + (a_3b_5 + a_5b_3)$
$T_1 = z_2^7 + z_3^6 + z_4^5 = (a_2b_7 + a_7b_2) + (a_3b_6 + a_6b_3) + (a_4b_5 + a_5b_4)$
$T_2 = x_5 + z_3^7 + z_4^6 = a_5b_5 + (a_3b_7 + a_7b_3) + (a_4b_6 + a_6b_4)$
$T_3 = z_4^7 + z_5^6 = (a_4b_7 + a_7b_4) + (a_5b_6 + a_6b_5)$
$T_4 = x_6 + z_5^7 = a_6b_6 + (a_5b_7 + a_7b_5)$
$T_5 = z_6^7 = (a_6b_7 + a_7b_6)$
$T_6 = x_7 = a_7b_7$
The product $C = A \cdot B$ can then be computed as the
addition of these terms.
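The index ranges in (1) can be checked mechanically. The sketch below is an illustrative Python transcription of (1), not code from [6]: it lists the terms of $S_i$ and $T_i$ as index pairs, where the pair `(i, j)` with `i < j` stands for $z_i^j$ and the pair `(k, k)` stands for $x_k$. The function names and this encoding are assumptions.

```python
def s_terms(i):
    """Terms of S_i (1 <= i <= m) following (1)."""
    p = i // 2
    terms = [(h, i - h - 1) for h in range(p)]   # z_h^{i-h-1}, h = 0..p-1
    if i % 2 == 1:                               # x_p only appears for i odd
        terms.append((p, p))
    return terms


def t_terms(i, m):
    """Terms of T_i (0 <= i <= m-2) following (1)."""
    half_up = (m + 1) // 2                       # ceil(m/2)
    same_parity = (m % 2) == (i % 2)             # x_q appears only in this case
    q = half_up + i // 2
    r = q if same_parity else half_up + (i + 1) // 2
    terms = [(i + j, m - j) for j in range(1, r - i)]   # j = 1..r-(i+1)
    if same_parity:
        terms.append((q, q))
    return terms


print(s_terms(7))      # [(0, 6), (1, 5), (2, 4), (3, 3)]
print(t_terms(0, 8))   # [(1, 7), (2, 6), (3, 5), (4, 4)]
```

In this encoding every pair in $S_i$ has index sum $i-1$ and every pair in $T_i$ has index sum $m+i$, so $S_i$ and $T_i$ collect the products that contribute to $x^{i-1}$ and $x^{m+i}$ of the unreduced polynomial product, respectively.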
For hardware implementation of $GF(2^m)$ multiplication,
low Hamming weight irreducible polynomials, such as
trinomials and pentanomials, are normally used. Type II irreducible
pentanomials [5] $f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$, with
$2 \le n \le \lfloor m/2 \rfloor - 1$, are important because they are abundant
and all five binary fields recommended by NIST for ECDSA
can be constructed using such irreducible polynomials.
Canonical basis multiplication for these pentanomials was studied in
[6], where expressions for the coefficients of the product were
given using $S_i$ and $T_i$ terms. For the specific field $GF(2^8)$
generated by the polynomial $f(y) = y^8 + y^4 + y^3 + y^2 + 1$,
with $(m,n) = (8,2)$, the coefficients computed as in [6] are
given in Table I.
TABLE I
COEFFICIENTS OF THE PRODUCT FOR $GF(2^8)$ WITH $(m,n) = (8,2)$.
$c_0 = S_1 + T_0 + T_4 + T_5 + T_6$
$c_1 = S_2 + T_1 + T_5 + T_6$
$c_2 = S_3 + T_0 + T_2 + T_4 + T_5$
$c_3 = S_4 + T_0 + T_1 + T_3 + T_4$
$c_4 = S_5 + T_0 + T_1 + T_2 + T_6$
$c_5 = S_6 + T_1 + T_2 + T_3$
$c_6 = S_7 + T_2 + T_3 + T_4$
$c_7 = S_8 + T_3 + T_4 + T_5$
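The $T_i$ columns of Table I can be cross-checked by reducing the monomials $x^{8+i}$ modulo $f(y)$, under the reading that $T_i$ collects the products of degree $m+i$ in the unreduced product: $T_i$ then has to appear in $c_k$ exactly when $x^k$ appears in $x^{m+i} \bmod f$. The Python sketch below performs that check; it is an illustration under this assumption, not the derivation used in [6], and the function names are hypothetical.

```python
def reduce_mod(d: int, f: int) -> int:
    """Remainder of d modulo f in GF(2)[x]; polynomials are bit masks."""
    m = f.bit_length() - 1
    for k in range(d.bit_length() - 1, m - 1, -1):
        if (d >> k) & 1:
            d ^= f << (k - m)
    return d


def t_indices_per_coefficient(m: int, f: int):
    """rows[k] lists every i such that T_i contributes to c_k."""
    rows = [[] for _ in range(m)]
    for i in range(m - 1):                      # T_0 .. T_{m-2}
        remainder = reduce_mod(1 << (m + i), f)  # x^(m+i) mod f
        for k in range(m):
            if (remainder >> k) & 1:
                rows[k].append(i)
    return rows


F8 = 0b1_0001_1101   # y^8 + y^4 + y^3 + y^2 + 1, (m, n) = (8, 2)

# T indices of c_0 .. c_7 transcribed from Table I
TABLE_I = [[0, 4, 5, 6], [1, 5, 6], [0, 2, 4, 5], [0, 1, 3, 4],
           [0, 1, 2, 6], [1, 2, 3], [2, 3, 4], [3, 4, 5]]

print(t_indices_per_coefficient(8, F8) == TABLE_I)  # True
```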
One obstacle to reducing the delay of the product is
the monolithic construction of the $S_i$ and $T_i$ functions,
each given by a sum of terms $x_k$ and $z_i^j$. For example, for $GF(2^8)$
the addition $S_1 + T_4 = a_0b_0 + (a_6b_6 + (a_5b_7 + a_7b_5))$
would result in a 3-level binary tree of XOR gates.
However, the sum $S_1 + T_4$ involves the addition of four
product terms $a_0b_0$, $a_6b_6$, $a_5b_7$ and $a_7b_5$, so it could be
done with a 2-level complete binary tree of XOR gates if
the product $a_0b_0$ were first added to $a_6b_6$ and the result
then added to $(a_5b_7 + a_7b_5)$, in such a way that
$S_1 + T_4 = (a_0b_0 + a_6b_6) + (a_5b_7 + a_7b_5)$. This idea was
used in [7] for $GF(2^m)$ polynomial multiplication based on
type II irreducible pentanomials. Functions $S_i$ and $T_i$ were
split in the form $S_i = s^i_\rho S^\rho_i + \ldots + s^i_1 S^1_i + s^i_0 S^0_i$ and
$T_i = t^i_\rho T^\rho_i + \ldots + t^i_1 T^1_i + t^i_0 T^0_i$, with $s^i_j, t^i_j \in GF(2)$ and
$\rho = \lfloor \log_2 m \rfloor$. The terms $S^j_i$ and $T^j_i$ represent the addition of
$2^j$ products $a_k b_l$ and therefore can be implemented as a
$j$-level complete binary tree of XOR gates. The sum of any two of
these $j$-level terms results in a new XOR at level $j+1$,
representing a $(j+1)$-level complete binary tree of XOR gates.
If the sum of $S_i$ and $T_i$ functions in the coordinates of the
product is done by grouping the additions of $S^j_i$ and $T^j_i$ terms
with the same level, starting with the lower ones, then the
number of XOR levels needed to compute the product of two
field elements can be reduced.
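The effect of regrouping on the number of XOR levels can be illustrated with a small Python model. The nested-pair representation of XOR trees and both function names are assumptions made for this sketch; `merged_depth` uses a simple greedy pairing of the two shallowest subtrees, which mirrors the idea of adding terms of the same level starting with the lower ones but is not claimed to be the exact procedure of [7].

```python
import heapq


def xor_depth(expr):
    """XOR levels of a nested-pair expression; strings are AND-gate outputs."""
    if isinstance(expr, str):
        return 0
    left, right = expr
    return 1 + max(xor_depth(left), xor_depth(right))


def merged_depth(depths):
    """XOR levels after summing subtrees of the given (non-empty) depths,
    always pairing the two shallowest subtrees first."""
    heap = list(depths)
    heapq.heapify(heap)
    while len(heap) > 1:
        d1 = heapq.heappop(heap)
        d2 = heapq.heappop(heap)
        heapq.heappush(heap, 1 + max(d1, d2))
    return heap[0]


# S_1 + T_4 with T_4 kept as a monolithic block
monolithic = ("a0b0", ("a6b6", ("a5b7", "a7b5")))
# Same four products regrouped as (a0b0 + a6b6) + (a5b7 + a7b5)
balanced = (("a0b0", "a6b6"), ("a5b7", "a7b5"))

print(xor_depth(monolithic))    # 3
print(xor_depth(balanced))      # 2
print(merged_depth([0, 0, 1]))  # 2: terms of depth 0, 0 and 1
```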
For $GF(2^8)$, the expressions for the $S_i$ and $T_i$ functions
given as the addition of $S^j_i$ and $T^j_i$ terms are as follows [7]:
$S_1 = 0 \cdot S^3_1 + 0 \cdot S^2_1 + 0 \cdot S^1_1 + 1 \cdot S^0_1 = S^0_1$
$S_2 = 0 \cdot S^3_2 + 0 \cdot S^2_2 + 1 \cdot S^1_2 + 0 \cdot S^0_2 = S^1_2$
$S_3 = S^1_3 + S^0_3$
$S_4 = S^2_4$
$S_5 = S^2_5 + S^0_5$
$S_6 = S^2_6 + S^1_6$
$S_7 = S^2_7 + S^1_7 + S^0_7$
$S_8 = S^3_8$
and
$T_0 = T^2_0 + T^1_0 + T^0_0$
$T_1 = T^2_1 + T^1_1$
$T_2 = T^2_2 + T^0_2$
$T_3 = T^2_3$
$T_4 = T^1_4 + T^0_4$
$T_5 = T^1_5$
$T_6 = T^0_6$
The expressions of the corresponding $S^j_i$ and $T^j_i$ terms are given in Table II.
TABLE II
TERMS $S^j_i$ AND $T^j_i$ FOR $GF(2^8)$.
$S^0_1 = x_0$                                  $T^0_0 = x_4$
$S^1_2 = z_0^1$                                $T^1_0 = z_1^7$
$S^0_3 = x_1$                                  $T^2_0 = (z_2^6 + z_3^5)$
$S^1_3 = z_0^2$                                $T^1_1 = z_2^7$
$S^2_4 = (z_0^3 + z_1^2)$                      $T^2_1 = (z_3^6 + z_4^5)$
$S^0_5 = x_2$                                  $T^0_2 = x_5$
$S^2_5 = (z_0^4 + z_1^3)$                      $T^2_2 = (z_3^7 + z_4^6)$
$S^1_6 = z_0^5$                                $T^2_3 = (z_4^7 + z_5^6)$
$S^2_6 = (z_1^4 + z_2^3)$                      $T^0_4 = x_6$
$S^0_7 = x_3$                                  $T^1_4 = z_5^7$
$S^1_7 = z_0^6$                                $T^1_5 = z_6^7$
$S^2_7 = (z_1^5 + z_2^4)$                      $T^0_6 = x_7$
$S^3_8 = (z_0^7 + z_1^6 + z_2^5 + z_3^4)$
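The entries of Table II are consistent with a simple rule: the number of products $a_k b_l$ in a function is written in binary, and each set bit $j$ yields one term with $2^j$ products. The Python sketch below applies that rule. The assignment order (the square term $x_k$ forms the level-0 term, then the $z$ terms are consumed in the order they appear in (1), smallest groups first) is inferred from Table II and is an assumption, as are the function name and the pair encoding (`(i, j)` for $z_i^j$, `(k, k)` for $x_k$).

```python
def split_terms(square, pairs):
    """Split one S_i or T_i function into complete-tree terms.

    square: (k, k) for x_k, or None when the function has no x term.
    pairs:  list of (i, j) for the z_i^j terms, in the order of (1).
    Returns {j: terms}, where the terms under key j hold 2^j products.
    """
    n = 2 * len(pairs) + (1 if square is not None else 0)  # products a_k b_l
    groups = {}
    if square is not None:         # n is odd exactly when x_k is present
        groups[0] = [square]
    pos = 0
    j = 1
    while (1 << j) <= n:
        if (n >> j) & 1:           # bit j of n set: one 2^j-product term
            size = 1 << (j - 1)    # each z term carries two products
            groups[j] = pairs[pos:pos + size]
            pos += size
        j += 1
    return groups


# S_7 = x_3 + z_0^6 + z_1^5 + z_2^4 has 7 = 4 + 2 + 1 products
print(split_terms((3, 3), [(0, 6), (1, 5), (2, 4)]))
# {0: [(3, 3)], 1: [(0, 6)], 2: [(1, 5), (2, 4)]}
```

For $S_7$ this reproduces $S^0_7 = x_3$, $S^1_7 = z_0^6$ and $S^2_7 = (z_1^5 + z_2^4)$ from Table II.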
From Table II, it can be observed that the terms $S^j_i$ and
$T^j_i$ perform the addition of $2^j$ products, so they can be
implemented as $j$-level complete binary trees of XOR gates.
Using these terms, expressions for the $GF(2^8)$ multiplier
based on the splitting method introduced in [7] for type II
irreducible pentanomials are given in Table III, where the
notations $T^{k+1}_{i,j} = T^k_i + T^k_j$ and $ST^{k+1}_{i,j} = S^k_i + T^k_j$ have
been used. Terms in parentheses must be
XORed before the XOR with the other terms in order
to reduce the delay. Furthermore, terms that appear in more
than one coefficient can be shared, therefore reducing the
space requirements (for example, the term $T^1_{0,4} = T^0_0 + T^0_4$
in Table III, which appears in the coefficients $c_0$ and $c_2$).
TABLE III
COEFFICIENTS OF THE PRODUCT FOR $GF(2^8)$ WITH SPLITTING.
$c_0 = ((S^0_1 + T^1_{0,4}) + T^2_0) + (T^2_{0,4} + T^2_{5,6})$
$c_1 = (ST^2_{2,1} + T^2_1) + T^2_{5,6}$
$c_2 = ((ST^1_{3,2} + S^1_3) + T^2_0) + ((T^1_{0,4} + T^1_5) + (T^2_{0,4} + T^2_2))$
$c_3 = ((T^2_{0,1} + S^2_4) + T^3_{0,1}) + ((T^1_{0,4} + T^1_4) + T^2_3)$
$c_4 = (((ST^1_{5,0} + T^1_{2,6}) + S^2_5) + T^3_{0,1}) + (T^2_{0,1} + T^2_2)$
$c_5 = ST^3_{6,1} + ((ST^2_{6,1} + T^0_2) + T^3_{2,3})$
$c_6 = ((ST^1_{7,2} + S^1_7) + S^2_7) + (T^3_{2,3} + (T^0_4 + T^1_4))$
$c_7 = S^3_8 + (T^2_3 + (T^2_{4,5} + T^0_4))$
Theoretical complexities of bit-parallel multipliers based
on type II pentanomials were given in [7]. For the $GF(2^8)$
multiplier shown in Table III, the delay
complexity is $T_A + 5T_X$, with $T_A$ and $T_X$ representing the
delay of 2-input AND and XOR gates, respectively. This
theoretical delay is the lowest among similar $GF(2^8)$
multipliers, such as those given in [6] and [3], with delays
$T_A + 6T_X$ and $T_A + 7T_X$, respectively. The space complexity
(number of 2-input AND and XOR gates) of the multiplier
given in Table III was found to be 64 AND and 87 XOR gates.
In this case, the theoretical number of XOR gates is greater
than those found in [6] and [3], with 80 and 77 XOR gates,
respectively, while the number of 2-input AND gates is
the same in all approaches.
III. FPGA EFFICIENT $GF(2^8)$ POLYNOMIAL BASIS
MULTIPLIER
The expressions given in Table III for the coefficients of the
$GF(2^8)$ polynomial basis multiplier impose hard restrictions
(given by the parentheses) on the order in which the different
terms are added, in order to reduce the number of XOR levels and
therefore the delay of the multiplier. However, these
restrictions may prevent a synthesis tool from mapping
those expressions efficiently onto the FPGA's logic blocks. In such a
case, more freedom should be given to the synthesizer to find
an optimized implementation of the multiplier.
TABLE IV
NEW COEFFICIENTS OF THE PRODUCT FOR TYPE II $GF(2^8)$.
$c_0 = S^0_1 + T^2_0 + T^1_0 + T^0_0 + T^1_4 + T^0_4 + T^1_5 + T^0_6$
$c_1 = S^1_2 + T^2_1 + T^1_1 + T^1_5 + T^0_6$
$c_2 = S^1_3 + S^0_3 + T^2_0 + T^1_0 + T^0_0 + T^2_2 + T^0_2 + T^1_4 + T^0_4 + T^1_5$
$c_3 = S^2_4 + T^2_0 + T^1_0 + T^0_0 + T^2_1 + T^1_1 + T^2_3 + T^1_4 + T^0_4$
$c_4 = S^2_5 + S^0_5 + T^2_0 + T^1_0 + T^0_0 + T^2_1 + T^1_1 + T^2_2 + T^0_2 + T^0_6$
$c_5 = S^2_6 + S^1_6 + T^2_1 + T^1_1 + T^2_2 + T^0_2 + T^2_3$
$c_6 = S^2_7 + S^1_7 + S^0_7 + T^2_2 + T^0_2 + T^2_3 + T^1_4 + T^0_4$
$c_7 = S^3_8 + T^2_3 + T^1_4 + T^0_4 + T^1_5$
In Table IV, new expressions for the coefficients of the
$GF(2^8)$ polynomial basis multiplier are given. The splitting
of the $S_i$ and $T_i$ functions as the addition of $S^j_i$ and $T^j_i$ terms
(given in Table II) has been used, but the restriction imposed
on the product by the parenthesized addition of terms with the
same $j$-level has been removed. The coefficients of the product
are then given as sums of individual $S^j_i$ and $T^j_i$ terms, and the
synthesis tool is free to perform an optimized implementation
of the multiplier.
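The flat sums of Table IV can be checked against a plain shift-and-reduce multiplication. The Python sketch below transcribes Tables II and IV and evaluates each coefficient as an XOR of AND terms; it is a software check written for illustration, not the VHDL description used in this work. The token format (`"S27"` meaning $S^2_7$, i.e. letter, superscript, subscript) and all function names are assumptions.

```python
def x(k):            # x_k = a_k b_k
    return [(k, k)]

def z(i, j):         # z_i^j = a_i b_j + a_j b_i
    return [(i, j), (j, i)]

# Table II, keyed by token: letter, superscript j, subscript i
TERMS = {
    "S01": x(0), "S12": z(0, 1), "S03": x(1), "S13": z(0, 2),
    "S24": z(0, 3) + z(1, 2), "S05": x(2), "S25": z(0, 4) + z(1, 3),
    "S16": z(0, 5), "S26": z(1, 4) + z(2, 3), "S07": x(3), "S17": z(0, 6),
    "S27": z(1, 5) + z(2, 4), "S38": z(0, 7) + z(1, 6) + z(2, 5) + z(3, 4),
    "T00": x(4), "T10": z(1, 7), "T20": z(2, 6) + z(3, 5), "T11": z(2, 7),
    "T21": z(3, 6) + z(4, 5), "T02": x(5), "T22": z(3, 7) + z(4, 6),
    "T23": z(4, 7) + z(5, 6), "T04": x(6), "T14": z(5, 7), "T15": z(6, 7),
    "T06": x(7),
}

# Table IV, rows c_0 .. c_7
TABLE_IV = [
    "S01 T20 T10 T00 T14 T04 T15 T06",
    "S12 T21 T11 T15 T06",
    "S13 S03 T20 T10 T00 T22 T02 T14 T04 T15",
    "S24 T20 T10 T00 T21 T11 T23 T14 T04",
    "S25 S05 T20 T10 T00 T21 T11 T22 T02 T06",
    "S26 S16 T21 T11 T22 T02 T23",
    "S27 S17 S07 T22 T02 T23 T14 T04",
    "S38 T23 T14 T04 T15",
]

def table_iv_mul(a, b):
    """Evaluate the Table IV coefficients for 8-bit operands a and b."""
    c = 0
    for k, row in enumerate(TABLE_IV):
        bit = 0
        for token in row.split():
            for i, j in TERMS[token]:
                bit ^= (a >> i) & (b >> j) & 1   # AND gate feeding the XOR sum
        c |= bit << k
    return c

def reference_mul(a, b, f=0b1_0001_1101):
    """Shift-and-reduce multiplication modulo y^8 + y^4 + y^3 + y^2 + 1."""
    c = 0
    for _ in range(8):
        if b & 1:
            c ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= f
    return c

# Both maps are bilinear over GF(2), so agreement on basis pairs suffices.
print(all(table_iv_mul(1 << i, 1 << j) == reference_mul(1 << i, 1 << j)
          for i in range(8) for j in range(8)))   # True
```

Because the order of the XORs does not affect the result, the same check also applies to the parenthesized expressions of Table III; only the mapping freedom left to the synthesizer differs.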
IV. FPGA IMPLEMENTATION RESULTS
The $GF(2^8)$ polynomial basis multiplier given in Table IV
has been implemented on a Xilinx Artix-7 XC7A200T-FFG1156.
The design entry has been behavioral VHDL and the
experimental results are those reported by Xilinx ISE 14.7 using the XST
synthesizer. Furthermore, the same pin assignments and the same
speed-oriented high optimization settings have been part of the
design methodology for all designs.
In order to compare the proposed $GF(2^8)$ multiplier with other
similar approaches, VHDL descriptions of different multipliers
have also been implemented. The methods used for description
and comparison have been the Mastrovito approaches given in
[2] and [3], the bit-parallel version of the multiplier presented
in [8], the multiplier given in [6] that introduced the $S_i$ and
$T_i$ functions, and the method with split $S_i$/$T_i$ functions
and hard parenthesized restrictions presented in [7].
Experimental post-place and route results obtained for
$GF(2^8)$ multipliers are given in Table V, where the area
complexity is expressed in terms of the number of LUTs
and slices used. Time results (in nanoseconds) represent the
critical path of the $GF(2^m)$ multipliers. The A×T metric
is the product of area and delay, expressed in LUTs×ns, and is
used to compare area and delay jointly (lower is better). From Table V, it can
be observed that the proposed multiplier exhibits the lowest
number of LUTs and the lowest A×T value among
the different approaches, while the lowest number of slices
and the lowest delay correspond to the works given in [2] and [8],
respectively. In comparison with the splitting method with
parenthesized restrictions given in Table III, it can be observed
that the new approach is more area and time efficient.
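The A×T figures for $GF(2^8)$ can be reproduced directly from the LUT counts and critical-path delays. The short Python sketch below uses the (8,2) values copied from Table V; the dictionary layout and function name are assumptions made for illustration.

```python
# (LUTs, critical path in ns) for the GF(2^8) multipliers, copied from Table V
RESULTS_8_2 = {
    "[2]": (34, 9.86),
    "[8]": (35, 9.62),
    "[3]": (35, 10.10),
    "[6]": (37, 9.68),
    "[7]": (40, 9.90),
    "This work": (33, 9.77),
}


def area_time(results):
    """A×T in LUTs×ns for each method, rounded to two decimals."""
    return {name: round(luts * ns, 2) for name, (luts, ns) in results.items()}


axt = area_time(RESULTS_8_2)
best = min(axt, key=axt.get)   # method with the lowest A×T product
print(best, axt[best])
```

The same computation applies to the other fields; only the input rows change.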
The new approach used for the $GF(2^8)$ multiplier has been
applied to the implementation of several polynomial basis
multipliers based on type II irreducible pentanomials. The same
design methodology and the same methods for comparison as in
$GF(2^8)$ have been used. The post-place and route results are also given in
Table V for type II multipliers with values $(m,n)$ = (64,23),
(113,4), (113,34), (122,49), (139,59), (148,72), (163,66) and
(163,68), where the binary field $GF(2^{113})$ is recommended by
SECG (Standards for Efficient Cryptography Group) [9] and
$GF(2^{163})$ is recommended by NIST for ECDSA. From the
experimental results, it can be observed that the new approach
proposed here exhibits the best area×time values among the
different methods implemented for most of the binary fields
considered (except for NIST (163,68) and SECG (113,34),
where the multiplier given in [3] obtains the best values).
Furthermore, the new approach also presents the lowest delays
for most of the fields implemented (except for NIST (163,66)
and SECG (113,34), where the method introduced in [6] achieves
the lowest delays). With respect to area complexity, the multiplier
given in [3] presents the lowest number of LUTs in most cases.
No single method consistently achieves the lowest number
of slices. In comparison with the splitting method with hard
parenthesized restrictions given in [7], it can be observed
that the new approach is more area (in LUTs) and time efficient in all
implemented fields. Therefore, the hard restrictions imposed
by the parentheses on the addition of the different terms in
[7] in order to reduce the number of XOR levels prevented the
synthesizer from performing an optimized mapping onto the
FPGA's logic blocks. This optimization is possible in
the non-parenthesized implementations reported in Table V, which
offer the synthesis tool more freedom to find an optimized
implementation of the $GF(2^m)$ multiplier.
TABLE V
COMPARISON OF $GF(2^m)$ MULTIPLIERS.
(m,n)           Method      LUTs   Slices  Time (ns)  A×T (LUTs×ns)
(8,2)           [2]           34       11       9.86         335.24
                [8]           35       14       9.62         336.70
                [3]           35       13      10.10         353.50
                [6]           37       14       9.68         358.16
                [7]           40       13       9.90         396.00
                This work     33       12       9.77         322.41

(64,23)         [2]         1836      586      22.63       41548.68
                [8]         1794      585      20.37       36543.78
                [3]         1749      566      20.91       36571.59
                [6]         1825      580      20.21       36883.25
                [7]         1854      642      21.28       39453.12
                This work   1769      541      20.18       35698.42

(113,4) SECG    [2]         5747     2672      21.39      122928.33
                [8]         5501     2864      23.29      128118.29
                [3]         5424     2637      21.77      118080.48
                [6]         5778     2469      21.28      122955.84
                [7]         5944     2115      21.30      126607.20
                This work   5420     2571      20.94      113494.80

(113,34) SECG   [2]         5560     2849      23.58      131104.80
                [8]         5505     2682      23.38      128706.90
                [3]         5445     2563      20.84      113473.80
                [6]         5813     2361      20.36      118352.68
                [7]         5909     2073      21.73      128402.57
                This work   5474     2507      21.59      118183.66

(122,49)        [2]         6487     3122      23.47      152249.89
                [8]         6420     3045      23.75      152475.00
                [3]         6305     2024      21.15      133350.75
                [6]         6834     2287      21.83      149186.22
                [7]         6858     1992      21.86      149915.88
                This work   6361     1951      20.95      133262.95

(139,59)        [2]         8370     3511      23.54      197029.80
                [8]         8301     3915      23.77      197314.77
                [3]         8139     2657      21.63      176046.57
                [6]         8900     2960      22.29      198381.00
                [7]         8998     3031      21.55      193906.90
                This work   8222     2543      21.35      175539.70

(148,72)        [2]         9466     3888      25.27      239205.82
                [8]         9406     3804      23.91      224897.46
                [3]         9252     3156      21.98      203358.96
                [6]         9996     3329      22.40      223910.40
                [7]         9943     3112      22.31      221828.33
                This work   9314     3104      21.76      202672.64

(163,66) NIST   [2]        11425     4053      25.20      287910.00
                [8]        11379     4433      23.52      267634.08
                [3]        11179     3361      23.66      264495.14
                [6]        12155     4056      22.48      273244.40
                [7]        12293     4015      22.95      282124.35
                This work  11295     3621      22.77      257187.15

(163,68) NIST   [2]        11422     4205      24.20      276412.40
                [8]        11379     4349      24.01      273209.79
                [3]        11172     3105      22.40      250252.80
                [6]        12187     3876      22.83      278229.91
                [7]        12334     4430      23.82      293795.88
                This work  11330     3697      22.39      253678.70
V. CONCLUSION
In this work, efficient Xilinx Artix-7 FPGA implementations
of $GF(2^m)$ bit-parallel canonical basis multipliers based on
type II irreducible pentanomials have been presented. These
pentanomials are important because they are abundant and all
five binary fields recommended by NIST for ECDSA can be
constructed using such irreducible polynomials.
A new approach for the computation of the product
coefficients has been considered. It is based on the use of $S_i$
and $T_i$ functions that can be split into sums of product terms
implemented as complete binary trees of XOR gates with
different depths. The addition of binary trees with the same
depth can reduce the delay of the multiplier. However, this
restriction may prevent a synthesis tool from mapping those
expressions efficiently onto the FPGA's logic blocks. In order to optimize the
synthesis of the multiplier, in this work the splitting of $S_i$ and
$T_i$ functions has been used, but the restriction imposed by the
addition of binary trees with the same depth has been removed.
In this case, the Xilinx XST tool was free to optimize the
synthesis of binary field polynomial basis multipliers.
In order to illustrate the new approach, a specific example
for $GF(2^8)$ has been given. Furthermore, several $GF(2^m)$
multipliers have been described in VHDL and post-place and
route implementation results on Artix-7 have been reported.
Experimental results have shown that the proposed $GF(2^m)$
multiplier implementations improve the area×time parameter
when compared with similar multipliers found in the literature.
Furthermore, the new approach also presents the lowest delay
for most of the binary fields used for implementation.
ACKNOWLEDGMENT
This work has been supported by the EU (FEDER) and
the Spanish MINECO, under grants TIN2015-65277-R and
TIN2012-32180.
REFERENCES
[1] E.D. Mastrovito, "VLSI Architectures for Multiplication Over Finite
Fields $GF(2^m)$", Applied Algebra, Algebraic Algorithms, and
Error-Correcting Codes, Proc. Sixth Int'l Conf., AAECC-6, New York:
Springer-Verlag, Rome, pp. 297-309, July 1988.
[2] C. Paar, "Efficient VLSI Architectures for Bit Parallel Computation in
Galois Fields", PhD Thesis, Universität GH Essen, 1994.
[3] A. Reyhani-Masoleh and M.A. Hasan, "Low Complexity Bit Parallel
Architectures for Polynomial Basis Multiplication over $GF(2^m)$", IEEE
Trans. Computers, vol. 53, no. 8, pp. 945-959, August 2004.
[4] T. Zhang and K.K. Parhi, "Systematic Design of Original and Modified
Mastrovito Multipliers for General Irreducible Polynomials", IEEE Trans.
Computers, vol. 50, no. 7, pp. 734-749, July 2001.
[5] F. Rodríguez-Henríquez and Ç.K. Koç, "Parallel Multipliers Based on
Special Irreducible Pentanomials", IEEE Trans. Computers, vol. 52, no.
12, pp. 1535-1542, December 2003.
[6] J.L. Imaña, "Efficient polynomial basis multipliers for Type II irreducible
pentanomials", IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 11,
pp. 795-799, November 2012.
[7] J.L. Imaña, "High-Speed Polynomial Basis Multipliers over $GF(2^m)$
for Special Pentanomials", IEEE Trans. Circuits and Systems I-Regular
Papers, vol. 63, no. 1, pp. 58-69, January 2016.
[8] B. Rashidi, R.R. Farashahi and S.M. Sayedi, "Efficient Implementation
of Low Time Complexity and Pipelined Bit-Parallel Polynomial Basis
Multiplier over Binary Finite Fields", Int. Journal of Information Security,
vol. 7, no. 2, pp. 101-114, July 2015.
[9] SEC 2. Standards for Efficient Cryptography Group, "Recommended
Elliptic Curve Domain Parameters". Version 1.0, 2000.
