93 research outputs found
Accelerating the Scalar Multiplication on Elliptic Curve Cryptosystems over Prime Fields
Elliptic curve cryptography (ECC), independently introduced by Koblitz and Miller in the 80\u27s, has attracted increasing attention in recent years due to its shorter key length requirement in comparison with other public-key cryptosystems such as RSA. Shorter key length means reduced power consumption and computing effort, and less storage requirement, factors that are fundamental in ubiquitous portable devices such as PDAs, cellphones, smartcards, and many others. To that end, a lot of research has been carried out to speed-up and improve ECC implementations, mainly focusing on the most important and time-consuming ECC operation: scalar multiplication.
In this thesis, we focus in optimizing such ECC operation at the point and scalar arithmetic levels, specifically targeting standard curves over prime fields. At the point arithmetic level, we introduce two innovative methodologies to accelerate ECC formulae: the use of new composite operations, which are built on top of basic point doubling and addition operations; and the substitution of field multiplications by squarings and other cheaper operations. These techniques are efficiently exploited, individually or jointly, in several contexts: to accelerate computation of scalar multiplications, and the computation of pre-computed points for window-based scalar multiplications (up to 30% improvement in comparison with previous best method); to speed-up computations of simple side-channel attack (SSCA)-protected implementations using innovative atomic structures (up to 22% improvement in comparison with scalar multiplication using original atomic structures); and to develop parallel formulae for SIMD-based applications, which are able to execute three and four operations simultaneously (up to 72% of improvement in comparison with a sequential scalar multiplication).
At the scalar arithmetic level, we develop new sublinear (in terms of Hamming weight) multibase scalar multiplications based on NAF-like conversion algorithms that are shown to be faster than any previous scalar multiplication method. For instance, proposed multibase scalar multiplications reduce computing times in 10.9% and 25.3% in comparison with traditional NAF for unprotected and SSCA-protected scenarios, respectively. Moreover, our conversion algorithms overcome the problem of converting any integer to multibase representation, solving an open problem that was defined as hard. Thus, our algorithms make the use of multiple bases practical for applications as ECC scalar multiplication for first time
Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings
We propose a novel approach that generalizes interleaved modular multiplication algorithms to the computation of sums of products over large prime fields. This operation has widespread use and is at the core of many cryptographic applications. The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of double-precision operations. Moreover, it can be easily adapted to the different methods that exist to compute modular multiplication, producing algorithms that are significantly more efficient and memory-friendly.
We showcase the performance of the proposed approach in the computation of multiplication over an extension field , and demonstrate its impact with a record-breaking implementation for bilinear pairings: a full optimal ate pairing over the popular BLS12-381 curve is computed in under half a millisecond on a 3.2GHz Intel Coffee Lake processor, which is about faster than the state-of-the-art
FourQNEON: Faster Elliptic Curve Scalar Multiplications on ARM Processors
We present a high-speed, high-security implementation of the recently proposed elliptic curve FourQ (ASIACRYPT 2015) for 32-bit ARM processors with NEON support. Exploiting the versatile and compact arithmetic of this curve, we design a vectorized implementation that achieves high-performance across a large variety of ARM platforms. Our software is fully protected against timing and cache attacks, and showcases the impressive speed of FourQ when compared with other curve-based alternatives. For example, one single variable-base scalar multiplication is computed in about 235,000 Cortex-A8 cycles or 132,000 Cortex-A15 cycles which, compared to the results of the fastest genus 2 Kummer and Curve25519 implementations on the same platforms, offer speedups between 1.3x-1.7x and between 2.1x-2.4x, respectively. In comparison with the NIST standard curve K-283, we achieve speedups above 4x and 5.5x
A Note on Post-Quantum Authenticated Key Exchange from Supersingular Isogenies
In this work, we study several post-quantum authenticated key exchange protocols in the setting of supersingular isogenies. Leveraging the design of the well-studied schemes by Krawczyk (2003), Boyd et al. (2008), Fujioka et al. (2013), Krawczyk and Wee (2015), and others, we show how to use the Supersingular Isogeny Diffie-Hellman (SIDH) and Supersingular Isogeny Key Encapsulation (SIKE) protocols as basic building blocks to construct efficient and flexible authenticated key exchange schemes featuring different functionalities and levels of security.
This note is also intended to be a ``gentle\u27\u27 introduction to supersingular isogeny based cryptography, and its most relevant constructions, for protocol designers and cryptographers
High-Speed Elliptic Curve and Pairing-Based Cryptography
Elliptic Curve Cryptography (ECC), independently proposed by Miller [Mil86] and Koblitz [Kob87] in mid 80’s, is finding momentum to consolidate its status as the public-key system of choice in a wide range of applications and to further expand this position to settings traditionally occupied by RSA and DL-based systems. The non-existence of known subexponential attacks on this cryptosystem directly translates to shorter keylengths for a given security level and, consequently, has led to implementations with better bandwidth usage, reduced power and memory requirements, and higher speeds. Moreover, the dramatic entry of pairing-based cryptosystems defined on elliptic curves at the beginning of the new millennium has opened the possibility of a plethora of innovative applications, solving in some cases longstanding problems in cryptography. Nevertheless, public-key cryptography (PKC) is still relatively expensive in comparison with its symmetric-key counterpart and it remains an open challenge to reduce further the computing cost of the most time-consuming PKC primitives to guarantee their adoption for secure communication in commercial and Internet-based applications. The latter is especially true for pairing computations. Thus, it is of paramount importance to research methods which permit the efficient realization of Elliptic Curve and Pairing-based Cryptography on the several new platforms and applications.
This thesis deals with efficient methods and explicit formulas for computing elliptic curve scalar multiplication and pairings over fields of large prime characteristic with the objective of enabling the realization of software implementations at very high speeds.
To achieve this main goal in the case of elliptic curves, we accomplish the following tasks: identify the elliptic curve settings with the fastest arithmetic; accelerate the precomputation stage in the scalar multiplication; study number representations and scalar multiplication algorithms for speeding up the evaluation stage; identify most efficient field arithmetic algorithms and optimize them; analyze the architecture of the targeted platforms for maximizing the performance of ECC operations; identify most efficient coordinate systems and optimize explicit formulas; and realize implementations on x86-64 processors with an optimal algorithmic selection among all studied cases.
In the case of pairings, the following tasks are accomplished: accelerate tower and curve arithmetic; identify most efficient tower and field arithmetic algorithms and optimize them; identify the curve setting with the fastest arithmetic and optimize it; identify state-of-the-art techniques for the Miller loop and final exponentiation; and realize an implementation on x86-64 processors with optimal algorithmic selection.
The most outstanding contributions that have been achieved with the methodologies above in this thesis can be summarized as follows:
• Two novel precomputation schemes are introduced and shown to achieve the lowest costs in the literature for different curve forms and scalar multiplication primitives. The detailed cost formulas of the schemes are derived for most relevant scenarios.
• A new methodology based on the operation cost per bit to devise highly optimized and compact multibase algorithms is proposed. Derived multibase chains using bases {2,3} and {2,3,5} are shown to achieve the lowest theoretical costs for scalar multiplication on certain curve forms and for scenarios with and without precomputations. In addition, the zero and nonzero density formulas of the original (width-w) multibase NAF method are derived by using Markov chains. The application of “fractional” windows to the multibase method is described together with the derivation of the corresponding density formulas.
• Incomplete reduction and branchless arithmetic techniques are optimally combined for devising high-performance field arithmetic. Efficient algorithms for “small” modular operations using suitably chosen pseudo-Mersenne primes are carefully analyzed and optimized for incomplete reduction.
• Data dependencies between contiguous field operations are discovered to be a source of performance degradation on x86-64 processors. Three techniques for reducing the number of potential pipeline stalls due to these dependencies are proposed: field arithmetic scheduling, merging of point operations and merging of field operations.
• Explicit formulas for two relevant cases, namely Weierstrass and Twisted Edwards curves over and , are carefully optimized employing incomplete reduction, minimal number of operations and reduced number of data dependencies between contiguous field operations.
• Best algorithms for the field, point and scalar arithmetic, studied or proposed in this thesis, are brought together to realize four high-speed implementations on x86-64 processors at the 128-bit security level. Presented results set new speed records for elliptic curve scalar multiplication and introduce up to 34% of cost reduction in comparison with the best previous results in the literature.
• A generalized lazy reduction technique that enables the elimination of up to 32% of modular reductions in the pairing computation is proposed. Further, a methodology that keeps intermediate results under Montgomery reduction boundaries maximizing operations without carry checks is introduced. Optimized formulas for the popular tower are explicitly stated and a detailed operation count that permits to determine the theoretical cost improvement attainable with the proposed method is carried out for the case of an optimal ate pairing on a Barreto-Naehrig (BN) curve at the 128-bit security level.
• Best algorithms for the different stages of the pairing computation, including the proposed techniques and optimizations, are brought together to realize a high-speed implementation at the 128-bit security level. Presented results on x86-64 processors set new speed records for pairings, introducing up to 34% of cost reduction in comparison with the best published result.
From a general viewpoint, the proposed methods and optimized formulas have a practical impact in the performance of cryptographic protocols based on elliptic curves and pairings in a wide range of applications. In particular, the introduced implementations represent a direct and significant improvement that may be exploited in performance-dominated applications such as high-demand Web servers in which millions of secure transactions need to be generated
Fast Multibase Methods and Other Several Optimizations for Elliptic Curve Scalar Multiplication
Recently, the new Multibase Non-Adjacent Form (mbNAF) method was introduced and shown to speed up the execution of the scalar multiplication with an efficient use of multiple bases to represent
the scalar. In this work, we first optimize the previous method using
fractional windows, and then introduce further improvements to achieve additional cost reductions. Moreover, we present new improvements in the point operation formulae. Specifically, we reduce further the cost of composite operations such as quintupling and septupling of a point, which are relevant for the speed up of multibase methods in general. Remarkably, our tests show that, in the case of standard elliptic curves, the refined mbNAF method can be as efficient as Window-w NAF using an optimal fractional window size. Thus, this is the first published method that does not require precomputations to achieve comparable efficiency to the standard window-based NAF method using precomputations. On other highly efficient curves as Jacobi quartics and Edwards curves, our
tests show that the refined mbNAF currently attains the highest performance for both scenarios using precomputations and those without precomputations
Four-Dimensional Gallant-Lambert-Vanstone Scalar Multiplication
The GLV method of Gallant, Lambert and Vanstone~(CRYPTO 2001) computes any multiple of a point of prime order lying on an elliptic curve with a low-degree endomorphism (called GLV curve) over as , with for some explicit constant . Recently, Galbraith, Lin and Scott (EUROCRYPT 2009) extended this method to all curves over which are twists of curves defined over .
We show in this work how to merge the two approaches in order to get, for twists of any GLV curve over , a four-dimensional decomposition together with fast endomorphisms over acting on the group generated by a point of prime order , resulting in a proven decomposition for any scalar given by , with . Remarkably, taking the best , we obtain , independently of the curve, ensuring in theory an almost constant relative speedup. In practice, our experiments reveal that the use of the merged GLV-GLS approach supports a scalar multiplication that runs up to 50\% faster than the original GLV method. We then improve this performance even further by exploiting the Twisted Edwards model and show that curves originally slower may become extremely efficient on this model. In addition, we analyze the performance of the method on a multicore setting and describe how to efficiently protect GLV-based scalar multiplication against several side-channel attacks. Our implementations improve the state-of-the-art performance of point multiplication for a variety of scenarios including side-channel protected and unprotected cases with sequential and multicore execution
Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography
The Number Theoretic Transform (NTT) provides efficient algorithms for cyclic and nega-cyclic convolutions, which have many applications in computer arithmetic, e.g., for multiplying large integers and large degree polynomials. It is commonly used in cryptographic schemes that are based on the hardness of the Ring Learning With Errors (R-LWE) problem to efficiently implement modular polynomial multiplication.
We present a new modular reduction technique that is tailored for the special moduli required by the NTT. Based on this reduction, we speed up the NTT and propose faster, multi-purpose algorithms. We present two implementations of these algorithms: a portable C implementation and a high-speed implementation using assembly with AVX2 instructions. To demonstrate the improved efficiency in an application example, we benchmarked the algorithms in the context of the R-LWE key exchange protocol that has recently been proposed by Alkim, Ducas, Pöppelmann and Schwabe. In this case, our C and assembly implementations compute the full key exchange 1.49 and 1.13 times faster, respectively. These results are achieved with full protection against timing attacks
New Multibase Non-Adjacent Form Scalar Multiplication and its Application to Elliptic Curve Cryptosystems (extended version)
In this paper we present a new method for scalar multiplication that uses a generic multibase representation to reduce the number of required operations. Further, a multibase NAF-like algorithm that efficiently converts numbers to such representation without impacting memory or speed performance is developed and showed to be sublinear in terms of the number of nonzero terms. Additional representation reductions are discussed with the introduction of window-based variants that use an extended set of precomputations. To realize the proposed multibase scalar multiplication with or without precomputations in the setting of Elliptic Curve Cryptosystems (ECC) over prime fields, we also present a methodology to derive fast composite operations such as tripling or quintupling of a point that require less memory than previous point formulae. Point operations are then protected against simple side-channel attacks using a highly efficient atomic structure. Extensive testing is carried out to show that our multibase scalar multiplication is the fastest method to date in the setting of ECC and exhibits a small footprint, which makes it ideal for implementation on constrained devices
Setting Speed Records with the (Fractional) Multibase Non-Adjacent Form Method for Efficient Elliptic Curve Scalar Multiplication
In this paper, we introduce the Fractional Window-w Multibase Non-Adjacent Form (Frac-wmbNAF) method to perform the scalar multiplication. This method generalizes the recently developed Window-w mbNAF (wmbNAF) method by allowing an unrestricted number of precomputed points. We then make a comprehensive analysis of the most recent and relevant methods existent in the literature for the ECC scalar multiplication, including the presented generalization and its original non-window version known as Multibase Non-Adjacent Form (mbNAF). Moreover, we present new improvements in the point operation formulae. Specifically, we reduce further the cost of composite operations such as doubling-addition, tripling, quintupling and septupling of a point, which are relevant for the speed up of methods using multiple bases. Following, we also analyze the precomputation stage in scalar multiplications and present efficient schemes for the different studied scenarios. Our analysis includes the standard elliptic curves using Jacobian coordinates, and also Edwards curves, which are gaining growing attention due to their high performance. We demonstrate with extensive tests that mbNAF is currently the most efficient method without precomputations not only for the standard curves but also for the faster Edwards form. Similarly, Frac-wmbNAF is shown to attain the highest performance among window-based methods for all the studied curve forms
- …
