Serial multiplier architectures over GF(2^n) for elliptic curve cryptosystems by Batina, L. et al.
PDF hosted at the Radboud Repository of the Radboud University
Nijmegen
 
 
 
 
The following full text is a preprint version which may differ from the publisher's version.
 
 
For additional information about this publication, see
http://repository.ubn.ru.nl/handle/2066/127477
 
 
 
Please be advised that this information was generated on 2017-03-09 and may be subject to
change.
Serial Multiplier Architectures over GF(2^n)
for Elliptic Curve Cryptosystems
Lejla Batina, Nele Mentens, Sıddıka Berna Örs, Bart Preneel
Katholieke Universiteit Leuven, ESAT/SCD-COSIC
Kasteelpark Arenberg 10
B-3001 Leuven-Heverlee, Belgium
{Lejla.Batina,Nele.Mentens,Siddika.BernaOrs,Bart.Preneel}@esat.kuleuven.ac.be
Abstract—We present an FPGA implementation of a
new multiplier for binary finite fields that combines two
previously known methods. The multiplier is designed for
polynomial bases which allow more flexibility in hardware
and is dedicated to efficient implementations of elliptic curve
cryptography. An extension to a digit-serial architecture
is also sketched. For the introduced architecture we also
discuss resistance to side-channel attacks.
I. INTRODUCTION
Most public key cryptosystems, including Elliptic Curve
Cryptosystems (ECC), rely heavily on arithmetic operations in
finite fields, and in particular on a finite field multiplier.
This article proposes an efficient FPGA implementation of a new
serial multiplier that combines two previously known methods:
the classical and Montgomery’s modular multiplication
algorithms. We require no special properties of the irreducible
polynomial defining the finite field. The design can handle
arbitrary bit-lengths from 160 to 300 bits, which are suitable
for applications of ECC. Several algorithms and architectures
for multiplication in GF(2^n) have been proposed [4], [23]. In
the context of an EC processor for binary fields, multipliers
have been discussed in [1], [22], and recently in [8], [9], [17].
The first bit-serial multiplier is discussed by Beth and
Gollmann [4]. This multiplier uses convolution and reduction
modulo an irreducible polynomial, taking n clock cycles to
compute a multiplication. One of the earliest implementations
is [1], in which the authors describe an efficient
implementation of ECC over GF(2^155) in an optimal normal
basis. Although normal basis representations usually result in
efficient implementations, their major disadvantage is that
they do not offer a scalable and flexible platform. The
implementations in [7], [22] also use a normal basis
representation. In [17], a complete processor architecture for
elliptic curve cryptosystems over GF(2^n) in polynomial bases
is proposed. That architecture is also scalable, with a
separate squarer (bit-parallel) and multiplier (digit-serial).
Goodman and Chandrakasan proposed a domain-specific
reconfigurable cryptographic processor (DSRCP) in [8]. The
DSRCP performs a variety of algorithms ranging from modular
integer arithmetic to elliptic curve arithmetic over finite
fields. All operations are universal and can be performed using
any n-bit modulus (8 ≤ n ≤ 1024), irreducible polynomial and
non-supersingular elliptic curve over GF(2^n). The more complex
modular arithmetic operations (multiplication, reduction,
inversion and exponentiation) are implemented using microcode,
while simple operations (addition and subtraction) are
implemented directly in hardware. Multiplication is performed
using Montgomery multiplication [16]. The multiplier processes
its operand bit-serially, so this design would be suitable for
low-power devices. Kitsos et al. [20] presented an MSB-first,
bit-serial multiplier for binary fields that also features
flexibility and low hardware complexity. The multiplier
presented here also targets wireless applications and uses the
method of Montgomery in combination with the classical method.
For a detailed survey on finite-field multipliers for
Public Key Cryptography see [3].
However, very few efficient hardware implementations present a
completely generic solution that allows an arbitrary choice of
all parameters: field, basis representation, irreducible
polynomial, bit-length, etc. The flexibility of our
architecture is an important advantage for cryptographic
applications. More precisely, the architecture introduced here
is scalable to any desired bit-length and allows any choice of
binary field, coordinates or irreducible polynomial. These
properties make the proposed multiplier a suitable choice for
all applications of ECC.
II. BACKGROUND INFORMATION
Public key cryptosystems allow secure communication over
insecure channels without prior agreement on a shared secret;
they also enable digital signatures. ECC were proposed in the
mid 1980s [10], [15]; they have become increasingly popular in
recent years, as illustrated by their presence in cryptographic
standards. Common fields for implementing ECC are GF(p) (for a
prime p of at least 160 bits) and GF(2^n) (n ≥ 160). ECC rely
on a group structure induced on the elliptic curve. The set of
points on an elliptic curve (with one special point added, the
so-called point at infinity), together with point addition as a
binary operation, has the structure of an Abelian group. It is
then natural to define a point or scalar multiplication as the
multiplication of an arbitrary point on an elliptic curve by
some integer; this operation is the basic operation for
cryptographic protocols. Efficient implementations of a point
multiplication rely on a modular multiplication in the
underlying finite field. More details on ECC can be found in
[5], [11], [13].
This paper deals with polynomial bases, where the elements of
GF(2^n) are polynomials of degree at most n − 1 over GF(2), and
arithmetic is carried out modulo an irreducible polynomial f(x)
of degree n over GF(2). In this case the basis elements have
the form $1, \omega, \omega^2, \ldots, \omega^{n-1}$, where
$\omega$ is a root in GF(2^n) of the irreducible polynomial
f(x) of degree n over GF(2). According to this representation,
an element of GF(2^n) is a polynomial of length n and can be
written as
$$a(x) = \sum_{i=0}^{n-1} a_i x^i = a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1 x + a_0,$$
where $a_i \in GF(2)$.
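The representation above maps naturally onto machine integers. The following is a minimal sketch, assuming bit i of a Python integer stores the coefficient $a_i$; the helper names (`poly_from_coeffs`, `poly_add`, `poly_to_str`) are illustrative and do not come from the paper. Under this encoding, coefficient-wise addition modulo 2 is a bitwise XOR.

```python
# Elements of GF(2^n) in a polynomial basis, stored as Python integers:
# bit i of the integer holds the coefficient a_i of x^i.
# All function names here are hypothetical helpers for illustration.

def poly_from_coeffs(coeffs):
    """Pack coefficients [a_0, a_1, ..., a_{n-1}] (each 0 or 1) into an int."""
    value = 0
    for i, a_i in enumerate(coeffs):
        value |= (a_i & 1) << i
    return value

def poly_add(a, b):
    """Coefficient-wise addition modulo 2, i.e. a bitwise XOR."""
    return a ^ b

def poly_to_str(a):
    """Render an element as a sum of powers of x."""
    terms = []
    for i in range(a.bit_length() - 1, -1, -1):
        if (a >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) if terms else "0"

a = poly_from_coeffs([1, 1, 0, 1])   # 1 + x + x^3
b = poly_from_coeffs([0, 1, 1, 0])   # x + x^2
print(poly_to_str(poly_add(a, b)))   # x^3 + x^2 + 1
```

Addition needs no reduction modulo f(x), because the sum of two polynomials of degree at most n − 1 again has degree at most n − 1.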
A point addition (or doubling) requires one field inver-
sion (I), two field multiplications (M) and one squaring
(S), or 1I+2M+1S. Here, we consider squaring as a spe-
cial case of multiplication; by using projective coordinates
the field inversion can be replaced by multiplications [5].
The addition of two field elements requires the modulo 2
addition of the coefficients of the elements. Classical modular
multiplication is typically based on the following equation:
$$\begin{aligned}
a(x) \cdot b(x) &= (a_{n-1}x^{n-1} + \cdots + a_1 x + a_0) \cdot b(x) \bmod f(x) \\
&= (\cdots(a_{n-1}b(x)x + a_{n-2}b(x))x + \cdots + a_1 b(x))x + a_0 b(x) \bmod f(x)
\end{aligned} \qquad (1)$$
which illustrates the Horner scheme for multiplication. It is
the basis of the Most Significant Bit-First (MSB) multiplier
[4].
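The Horner scheme of equation (1) can be sketched in software as follows. This is an illustrative model, not the authors' hardware description: the function name `msb_first_mul` and the integer encoding (bit i holds the coefficient of x^i, with f carrying its leading bit n) are assumptions made for the example. The test values use the field GF(2^8) with f(x) = x^8 + x^4 + x^3 + x + 1, which is only a convenient small example and not a field size considered in the paper.

```python
def msb_first_mul(a, b, f, n):
    """Return a(x)*b(x) mod f(x) using the Horner scheme of equation (1).

    a, b: integers of at most n bits (bit i = coefficient of x^i).
    f:    integer encoding the degree-n irreducible polynomial (bit n set).
    """
    acc = 0
    for i in range(n - 1, -1, -1):   # scan a from a_{n-1} down to a_0
        acc <<= 1                    # multiply the accumulator by x
        if (acc >> n) & 1:           # degree reached n: subtract (XOR) f(x)
            acc ^= f
        if (a >> i) & 1:             # add a_i * b(x)
            acc ^= b
    return acc

# Small example field: GF(2^8) with f(x) = x^8 + x^4 + x^3 + x + 1.
print(hex(msb_first_mul(0x57, 0x83, 0x11B, 8)))   # 0xc1
```

Each loop iteration corresponds to one clock cycle of a bit-serial MSB-first multiplier, so a full multiplication takes n iterations.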
Another possibility to calculate the product of two polynomials
in GF(2^n) is Montgomery’s multiplication algorithm as proposed
in [6]. Here we define the Montgomery modular multiplication
(MMM) as
$MMM[a(x), b(x)] := a(x) \cdot b(x) \cdot [r(x)]^{-1} \bmod f(x)$.
Before a sequence of operations can be started, all operands
have to be converted to the form $a(x)r(x) \bmod f(x)$, the
so-called M-residue of the operand. This is done by performing
a Montgomery multiplication of the element and
$[r(x)]^2 \bmod f(x)$. The result of a Montgomery
multiplication on M-residues will once again be an M-residue,
which is converted back to the normal domain with MMM(Res, 1).
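The conversion to and from M-residues can be traced with a short bit-serial model. This is a sketch under stated assumptions: r(x) = x^k with k passed as a parameter (the example uses k = n), f(x) has constant term 1 (which holds for any irreducible polynomial of degree at least 2 over GF(2)), and the names `mont_mul` and `x_power_mod` are hypothetical helpers rather than anything defined in [6].

```python
def mont_mul(a, b, f, k):
    """Return a(x)*b(x)*x^(-k) mod f(x), i.e. MMM with r(x) = x^k.

    Assumes a has at most k bits and f has constant term 1.
    """
    acc = 0
    for i in range(k):       # scan a from a_0 upward (LSB first)
        if (a >> i) & 1:
            acc ^= b         # add a_i * b(x)
        if acc & 1:
            acc ^= f         # adding f(x) clears the constant term
        acc >>= 1            # exact division by x
    return acc

def x_power_mod(e, f, n):
    """Return x^e mod f(x) by repeated multiplication by x."""
    acc = 1
    for _ in range(e):
        acc <<= 1
        if (acc >> n) & 1:
            acc ^= f
    return acc

# Example field (an assumption for illustration): GF(2^8), f = 0x11B.
n, f = 8, 0x11B
r2 = x_power_mod(2 * n, f, n)       # [r(x)]^2 mod f(x) with r(x) = x^n
a_m = mont_mul(0x57, r2, f, n)      # M-residue a(x)r(x)
b_m = mont_mul(0x83, r2, f, n)      # M-residue b(x)r(x)
c_m = mont_mul(a_m, b_m, f, n)      # M-residue of the product
c = mont_mul(c_m, 1, f, n)          # MMM(Res, 1): back to the normal domain
print(hex(c))                       # 0xc1
```

The two conversions are only worthwhile when many multiplications are chained, since intermediate results stay in M-residue form.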
The classical and the Montgomery modular multipli-
cation algorithms can be combined into one algorithm.
The main idea is to perform both algorithms in parallel,
as proposed in [21]. The author prefers this solution to
either of the methods alone, but the prototype architecture
in [21] is not very efficient as it is not a systolic array.
The detailed algorithms are described in [2].
III. ON SERIAL FPGA IMPLEMENTATIONS
A. Bit-serial version of the multiplier
Our circuit implements Algorithm 1; it consists of two parts,
classical and Montgomery, each of which is a systolic array.
The parts look quite similar, as their cells perform similar
operations, i.e. multiplication and XOR. The difference is that
they shift in opposite directions and start from opposite ends
of the operand a(x). While the classical multiplier starts the
shift-and-add process from the MSB of one of the operands and
shifts the cumulative result left, the Montgomery-based
multiplier starts at the LSB and shifts the result right. Both
parts stop after exactly n/2 cycles. The classical part still
has to perform a shift over n/2 bits, but this is taken care of
by the conversion of the M-residue of the result. More
precisely, the M-residue is of the form
$a(x)b(x)r^{-1}(x) \bmod f(x)$, where $r(x) = x^{n/2}$ (for n
even). The multiplication is completed by calculating
$MMM[Res(x), 1] = a(x)b(x)x^{-n/2} \cdot 1 \cdot x^{n/2} \bmod f(x) = a(x)b(x) \bmod f(x)$.
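The reason the two half-length passes add up to $a(x)b(x)x^{-n/2}$ can be written out explicitly. The derivation below is a sketch; the symbols $a_H(x)$, $a_L(x)$, $C(x)$ and $M(x)$ are notation introduced here for the upper and lower halves of a(x) and for the outputs of the classical and Montgomery parts, and do not appear in the paper.

```latex
\begin{align*}
a(x) &= a_H(x)\,x^{n/2} + a_L(x),
  \qquad a_H(x) = \sum_{i=0}^{n/2-1} a_{n/2+i}\,x^i,
  \quad  a_L(x) = \sum_{i=0}^{n/2-1} a_i\,x^i \\
C(x) &= a_H(x)\,b(x) \bmod f(x)
  && \text{(classical part, $n/2$ MSB-first steps)} \\
M(x) &= a_L(x)\,b(x)\,x^{-n/2} \bmod f(x)
  && \text{(Montgomery part, $n/2$ LSB-first steps)} \\
C(x) + M(x) &= \bigl(a_H(x)\,x^{n/2} + a_L(x)\bigr)\,b(x)\,x^{-n/2} \bmod f(x)
  = a(x)\,b(x)\,x^{-n/2} \bmod f(x)
\end{align*}
```

The factor $x^{n/2}$ that the classical part never applied is exactly the "shift over n/2 bits" mentioned above; it is absorbed into the Montgomery factor $r(x) = x^{n/2}$.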
A schematic of the multiplier is presented in Figure 1. Here
$a_i$, $b_i$ and $f_i$ are the coefficients of a(x), b(x) and
f(x), respectively. The outputs c(x) become inputs to the
systolic arrays in the next clock cycle. Finally, the result of
the multiplication is obtained by XOR-ing the outputs of both
systolic arrays. We show details of the bit-serial version; the
extension to a digit-serial version is straightforward. The
Montgomery part is presented in [18] as a complete multiplier
performing only MMM in GF(2^n).
[Figure 1: schematic not reproduced. It shows two rows of cells
(leftmost cell, regular cells, rightmost cell), labelled
CLASSICAL and MONTGOMERY, whose outputs are combined into
RESULT.]
Fig. 1. Schematic of combined multiplier. Classical and Montgomery’s
part are calculating in parallel and the result is XOR-ed afterwards.
Algorithm 1 Combined Modular Multiplication in GF(2^n)
Input: polynomials a(x), b(x) and f(x)
Output: Res(x) = a(x) · b(x) · x^{−n/2} mod f(x)
1: Res(x) = 0, ResC(x) = 0, ResM(x) = 0
2: for i from 0 to n/2 − 1 do
3:   ResC(x) ← ResC_{n−1} · f(x) + ResC(x) · x + a_{n−1−i} · b(x)
4:   ResM(x) ← ResM(x) + a_i · b(x)
5:   ResM(x) ← ResM(x) + ResM_0 · f(x)
6:   ResM(x) ← ResM(x) div x
7:   Res(x) = ResC(x) + ResM(x)
8: end for
9: Return Res(x)
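Algorithm 1 can be modelled in a few lines of software to check its output. This is a behavioural sketch rather than a description of the systolic arrays: it assumes n is even, encodes polynomials as integers (bit i holds the coefficient of x^i), and forms the final XOR of step 7 once after the loop, which yields the same returned value. The names `combined_mul` and `times_x_pow`, and the small example field GF(2^8) with f = 0x11B, are assumptions made for illustration.

```python
def combined_mul(a, b, f, n):
    """Model of Algorithm 1: return a(x)*b(x)*x^(-n/2) mod f(x), n even."""
    assert n % 2 == 0
    res_c = 0   # classical part, scans a from the MSB side
    res_m = 0   # Montgomery part, scans a from the LSB side
    for i in range(n // 2):
        # Step 3: ResC <- ResC_{n-1}*f + ResC*x + a_{n-1-i}*b
        top = (res_c >> (n - 1)) & 1
        res_c <<= 1
        if top:
            res_c ^= f
        if (a >> (n - 1 - i)) & 1:
            res_c ^= b
        # Steps 4-6: add a_i*b, clear the constant term, divide by x
        if (a >> i) & 1:
            res_m ^= b
        if res_m & 1:
            res_m ^= f
        res_m >>= 1
    return res_c ^ res_m   # step 7, evaluated once after the loop

def times_x_pow(v, e, f, n):
    """Return v(x)*x^e mod f(x); used here only to undo the x^(-n/2) factor."""
    for _ in range(e):
        v <<= 1
        if (v >> n) & 1:
            v ^= f
    return v

n, f = 8, 0x11B
res = combined_mul(0x57, 0x83, f, n)          # a*b*x^(-4) mod f
print(hex(times_x_pow(res, n // 2, f, n)))    # 0xc1
```

The loop runs n/2 times instead of n, which mirrors the halving of the cycle count that the combined architecture aims for.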
B. Results
The implementation results of the combined multiplier on a
Xilinx Virtex XCV800 FPGA are given in Table I. These results
were obtained with the Xilinx Foundation software. As mentioned
above, the extension to a digit-serial multiplier is
straightforward. The digit-serial version of Montgomery’s
algorithm is also presented in [6].
TABLE I
COMPARISON OF FPGA IMPLEMENTATION RESULTS OF COMBINED
MODULAR MULTIPLIER AND MONTGOMERY MULTIPLIER OVER GF(2^160)

                          MMM mult. [18]   comb. mult.
# clock cycles            160              80
Min. clock period (ns)    10.375           13.860
Total MMM latency (µs)    1.66             1.109
# of slices               1 427            1 049
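The latency row of Table I follows from the other two timing rows as the number of clock cycles times the minimum clock period. The short check below reproduces that arithmetic; the function name `latency_us` is a hypothetical helper.

```python
def latency_us(cycles, period_ns):
    """Total latency in microseconds for a given cycle count and clock period."""
    return cycles * period_ns / 1000.0

mmm = latency_us(160, 10.375)    # Montgomery multiplier [18]
comb = latency_us(80, 13.860)    # combined multiplier
print(round(mmm, 3), round(comb, 3))   # 1.66 1.109
```

Although the cycle count is halved, the longer clock period of the combined design means the latency drops by a factor of roughly 1.5 rather than 2.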
IV. SIDE-CHANNEL SECURITY
When proposing an efficient implementation of a cryptographic
algorithm, one should also consider side-channel security: an
implementation of a cryptographic algorithm can be subjected to
side-channel attacks such as power analysis attacks [12], [14].
These attacks present a realistic threat for wireless
applications and have been demonstrated to be very effective
against smart cards without specific countermeasures. A power
analysis attack exploits the fact that the power consumption
during a cryptographic operation is related to the function
being performed and to the (possibly sensitive) data being
processed. The advantage of our multiplier is that it processes
2 bits (or 2 words) in parallel. This feature not only improves
performance but also helps in protecting against side-channel
attacks. To support this claim we present two power consumption
graphs for 480-bit multiplications, which show the power
consumption for Montgomery’s algorithm (Figure 2) and for the
combined algorithm (Figure 3). These graphs were obtained using
the measurement setup discussed in [19].
The power consumption graph for the combined multiplier
contains traces of two overlapping multiplications and appears
to include more noise. One can anticipate that this will be
harder to exploit with power analysis attacks. The general idea
is that when more calculations are performed at the same time,
the attacker faces a more difficult task. Strong resistance
against side-channel analysis attacks will of course require
additional countermeasures.
V. CONCLUSIONS
This paper presents an FPGA implementation of a new multiplier
for the finite field GF(2^n) using a polynomial basis
representation. Performance data are given, showing a reduction
in the number of cycles by a factor of 2 without increasing the
gate complexity. We also observe some improved resistance
against power analysis.
[Figure 2: plot not reproduced.]
Fig. 2. Power consumption curve for multiplication via Montgomery’s
algorithm.
[Figure 3: plot not reproduced.]
Fig. 3. Power consumption curve for multiplication via combined
algorithm. The peak at the end corresponds to the XOR of the two
parts.
ACKNOWLEDGMENT
Lejla Batina, Nele Mentens and Sıddıka Berna Örs are funded by
research grants of the Katholieke Universiteit Leuven, Belgium.
This work was supported by Concerted Research Action
GOA-MEFISTO-666 of the Flemish Government and by the FWO
“Identification and Cryptography” project (G.0141.03).
REFERENCES
[1] G. B. Agnew, R. C. Mullin, and S. A. Vanstone. An implementation
of elliptic curve cryptosystem over F_{2^155}. IEEE Journal on
Selected Areas in Communications, 11(5):804–813, June 1993.
[2] L. Batina, C. Jansen, G. Muurling, and S. Xu. Almost Montgomery
based multiplier in GF(2^n). In B. Macq and J.-J. Quisquater,
editors, Proceedings of the 23rd Symposium on Information Theory in
the Benelux, pages 61–68, Louvain-la-Neuve, Belgium, May 29-31
2002. Werkgemeenschap voor Informatie- en Communicatietheorie,
Enschede, The Netherlands.
[3] L. Batina, S. B. Örs, B. Preneel, and J. Vandewalle. Hardware
architectures for public key cryptography. Elsevier Science
Integration, the VLSI Journal, 34(1-2):1–64, 2003.
[4] T. Beth and D. Gollmann. Algorithm engineering for public key
algorithm. IEEE Journal on Selected Areas in Communications,
7(4):458–465, May 1989.
[5] I. Blake, G. Seroussi, and N. P. Smart. Elliptic Curves in
Cryptography. London Mathematical Society Lecture Note Series.
Cambridge University Press, 1999.
[6] Ç. K. Koç and T. Acar. Montgomery multiplication in GF(2^k).
Designs, Codes and Cryptography, 14:57–69, 1998.
[7] L. Gao, S. Shrivastava, and G. E. Sobelman. Elliptic curve scalar
multiplier design using FPGAs. In Ç. K. Koç and C. Paar, editors,
Proceedings of the 1st International Workshop on Cryptographic
Hardware and Embedded Systems (CHES), number 1717 in Lecture
Notes in Computer Science, pages 257–305, Worcester, MA,
USA, August 1999. Springer-Verlag.
[8] J. Goodman and A. P. Chandrakasan. An energy-efficient reconfig-
urable public-key cryptography processor. IEEE Journal of Solid-
State Circuits, 36(11):1808–1820, November 2001.
[9] N. Gura, S. C. Shantz, H. Eberle, D. Finchelstein, S. Gupta,
V. Gupta, and D. Stebila. An end-to-end systems approach to
elliptic curve cryptography. In Burt Kaliski Jr., Ç. K. Koç, and
C. Paar, editors, Proceedings of 4th International Workshop on
Cryptographic Hardware and Embedded Systems (CHES), Lecture
Notes in Computer Science, San Francisco Bay (Redwood City),
USA, August 13-15 2002.
[10] N. Koblitz. Elliptic curve cryptosystem. Math. Comp., 48:203–
209, 1987.
[11] N. Koblitz, A. Menezes, and S. Vanstone. The state of elliptic
curve cryptography. Designs, Codes and Cryptography, 19:173–
193, 2000.
[12] P. Kocher, J. Jaffe, and B. Jun. Differential power analysis.
In M. Wiener, editor, Advances in Cryptology: Proceedings of
CRYPTO’99, number 1666 in Lecture Notes in Computer Science,
pages 388–397, Santa Barbara, CA, USA, August 15-19 1999.
Springer-Verlag.
[13] A. J. Menezes. Elliptic Curve Public Key Cryptosystems. Kluwer
Academic Publishers, 1993.
[14] T. S. Messerges, E. A. Dabbish, and R. H. Sloan. Examining smart-
card security under the threat of power analysis attacks. IEEE
Transactions on Computers, 51(5):541–552, May 2002.
[15] V. Miller. Uses of elliptic curves in cryptography. In
H. C. Williams, editor, Advances in Cryptology: Proceedings of
CRYPTO’85, number 218 in Lecture Notes in Computer Science,
pages 417–426. Springer-Verlag, 1985.
[16] P. Montgomery. Modular multiplication without trial division.
Mathematics of Computation, Vol. 44:519–521, 1985.
[17] G. Orlando and C. Paar. A high-performance reconfigurable
elliptic curve processor for GF(2^m). In Ç. K. Koç and C. Paar,
editors, Proceedings of 2nd International Workshop on
Cryptographic Hardware and Embedded Systems (CHES), number 1965
in Lecture Notes in Computer Science, pages 41–56, Worcester,
Massachusetts, USA, August 17-18 2000. Springer-Verlag.
[18] S. B. Örs, N. Mentens, B. Preneel, and J. Vandewalle. An FPGA
implementation of a Montgomery multiplier over GF(2^m). 2004.
To appear in the proceedings of GLSVLSI, Boston.
[19] S. B. Örs, E. Oswald, and B. Preneel. Power-analysis attacks on
an FPGA – first experimental results. In C. Walter, Ç. K. Koç,
and C. Paar, editors, Proceedings of 5th International Workshop
on Cryptographic Hardware and Embedded Systems (CHES),
number 2779 in Lecture Notes in Computer Science, pages 35–50,
Cologne, Germany, September 7-10 2003. Springer-Verlag.
[20] P. Kitsos, G. Theodoridis, and O. Koufopavlou. An efficient,
reconfigurable multiplier architecture for Galois field GF(2^m).
Elsevier Science Microelectronics Journal, 34:975–980, 2003.
[21] M. J. Potgieter. A hardware implementation of the group
operations necessary for implementing an elliptic curve cryptosystem
over a characteristic two finite field. Final report of project
EPR400, Technical University Eindhoven, 2002.
[22] S. Sutikno, R. Effendi, and A. Surya. Design and implementation
of arithmetic processor GF(2^155) for elliptic curve cryptosystems.
In Proceedings of the 1998 IEEE Asia-Pacific Conference on
Circuits and Systems (APCCAS’98), pages 647–650, 1998.
[23] S.-W. Wei. VLSI architectures for computing exponentiations,
multiplicative inverses, and divisions in GF(2^m). IEEE
Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, 44(10):847–855, October 1997.
