63 research outputs found

    Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

    Get PDF
    Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

    Optimizations of Isogeny-based Key Exchange

    Get PDF
    Supersingular Isogeny Diffie-Hellman (SIDH) is a key exchange scheme that is believed to be quantum-resistant. It is based on the difficulty of finding a certain isogeny between given elliptic curves. Over the last nine years, optimizations have been proposed that significantly increased the performance of its implementations. Today, SIDH is a promising candidate in the US National Institute for Standards and Technology’s (NIST’s) post-quantum cryptography standardization process. This work is a self-contained introduction to the active research on SIDH from a high-level, algorithmic lens. After an introduction to elliptic curves and SIDH itself, we describe the mathematical and algorithmic building blocks of the fastest known implementations. Regarding elliptic curves, we describe which algorithms, data structures and trade-offs regard- ing elliptic curve arithmetic and isogeny computations exist and quantify their runtime cost in field operations. These findings are then tailored to the situation of SIDH. As a result, we give efficient algorithms for the performance-critical parts of the protocol

    Ultra-Fast Modular Multiplication Implementation for Isogeny-Based Post-Quantum Cryptography

    Get PDF
    Supersingular isogeny key encapsulation (SIKE) protocol delivers promising public and secret key sizes over other post-quantum candidates. However, the huge computations form the bottleneck and limit its practical applications. The modular multiplication operation, which is one of the most computationally demanding operations in the fundamental arithmetics, takes up a large part of the computations in the protocol. In this paper, we propose an improved unconventional-radix finite-field multiplication (IFFM) algorithm which reduces the computational complexity by about 20% compared to previous algorithms. We then devise a new high-speed modular multiplier architecture based on the IFFM. It is shown that the proposed architecture can be extensively pipelined to achieve a very high clock speed due to its complete feedforward scheme, which demonstrates significant advantages over conventional designs. The FPGA implementation results show the proposed multiplier has about 67 times faster throughput than the state-of-the-art designs and more than 12 times better area efficiency than previous works. Therefore, we think that these achievements will greatly contribute to the practicability of this protocol

    Cryptography on Isogeny Graphs

    Get PDF

    Efficient Software Implementation of the SIKE Protocol Using a New Data Representation

    Get PDF
    Thanks to relatively small public and secret keys, the Supersingular Isogeny Key Encapsulation (SIKE) protocol made it into the third evaluation round of the post-quantum standardization project of the National Institute of Standards and Technology (NIST). Even though a large body of research has been devoted to the efficient implementation of SIKE, its latency is still undesirably long for many real-world applications. Most existing implementations of the SIKE protocol use the Montgomery representation for the underlying field arithmetic since the corresponding reduction algorithm is considered the fastest method for performing multiple-precision modular reduction. In this paper, we propose a new data representation for supersingular isogeny-based Elliptic-Curve Cryptography (ECC), of which SIKE is a sub-class. This new representation enables significantly faster implementations of modular reduction than the Montgomery reduction, and also other finite-field arithmetic operations used in ECC can benefit from our data representation. We implemented all arithmetic operations in C using the proposed representation such that they have constant execution time and integrated them to the latest version of the SIKE software library. Using four different parameters sets, we benchmarked our design and the optimized generic implementation on a 2.6 GHz Intel Xeon E5-2690 processor. Our results show that, for the prime of SIKEp751, the proposed reduction algorithm is approximately 2.61 times faster than the currently best implementation of Montgomery reduction, and our representation also enables significantly better timings for other finite-field operations. Due to these improvements, we were able to achieve a speed-up by a factor of about 1.65, 2.03, 1.61, and 1.48 for SIKEp751, SIKEp610, SIKEp503, and SIKEp434, respectively, compared to state-of-the-art generic implementations.IKEp751, SIKEp610, SIKEp503, and SIKEp434, respectively, compared to state-of-the-art generic implementations

    High-Speed Modular Multipliers for Isogeny-Based Post-Quantum Cryptography

    Get PDF
    As one of the post-quantum protocol candidates, the supersingular isogeny key encapsulation (SIKE) protocol delivers promising public and secret key sizes over other candidates. Nevertheless, the considerable computations form the bottleneck and limit its practical applications. The modular multiplication operations occupy a large proportion of the overall computations required by the SIKE protocol. The VLSI implementation of the high-speed modular multiplier remains a big challenge. In this paper, we propose three improved modular multiplication algorithms based on an unconventional radix for this protocol, all of which cost about 20% fewer computations than the prior art. Besides, a multi-precision scheme is also introduced for the proposed algorithms to improve the scalability in hardware implementation, resulting in three new algorithms. We then present very efficient high-speed modular multiplier architectures for the six algorithms. It is shown that these new architectures can be highly optimized and extensively pipelined to obtain high throughput thanks to the adopted overlapping processing scheme. The FPGA implementation results show the proposed multipliers without the multi-precision scheme all achieve about 60 times higher throughput than the state-of-the-art design (the FFM2 multiplier), and those with the multi-precision scheme all acquire almost 10 times higher throughput than this work. Meanwhile, each of the multi-precision based designs has almost the same resource consumptions as the FFM2 does

    Fast Strategies for the Implementation of SIKE Round 3 on ARM Cortex-M4

    Get PDF
    Abstract The Supersingular Isogeny Key Encapsulation mechanism (SIKE) is the only post-quantum key encapsulation mechanism based on supersingular elliptic curves and isogenies between them. Despite the security of the protocol, unlike the rest of the NIST post-quantum algorithms, SIKE requires more number of clock cycles and hence does not provide competitive timing, energy and power consumption results. However, it is more attractive offering smallest public key sizes as well as ciphertext sizes, which taking into account the impact of the communication costs and storage of the keys could become as good fit for resource-constrained devices. In this work, we present the fastest practical implementation of SIKE, targeting the platform Cortex-M4 based on the ARMv7-M architecture. We performed our measurements on NIST recommended device based on STM32F407 microcontroller, for benchmarking the clock cycles, and on the target board Nucleo-F411RE, attached to X-NUCLEO-LPM01A (Power Shield), for measuring the power and energy consumption. The lower level finite field arithmetic and extension field operations play main role determining the efficiency of SIKE. Therefore, we mainly focus on those improvements and apply them to all NIST required security levels. Our SIKEp434 implementations for NIST security level 1 take about 850ms which is about 22.3% faster than the counterparts appeared in previous work. Moreover, our implementations are 21.9%, 19.7% and 19.5% faster for SIKEp503, SIKEp610 and SIKEp751 in comparison to the previously reported work for other NIST recommended security levels. Finally, we benchmark power and energy consumption and report the results for comparison

    Hardware Architectures for Post-Quantum Cryptography

    Get PDF
    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

    Supersingular Isogeny Key Encapsulation (SIKE) Round 2 on ARM Cortex-M4

    Get PDF
    We present the first practical software implementation of Supersingular Isogeny Key Encapsulation (SIKE) round 2, targeting NIST\u27s 1, 2, 3, and 5 security levels on 32-bit ARM Cortex-M4 microcontrollers. The proposed library introduces a new speed record of all SIKE Round 2 protocols with reasonable memory consumption on the low-end target platforms. We achieved this record by adopting several state-of-the-art engineering techniques as well as highly-optimized hand-crafted assembly implementation of finite field arithmetic. In particular, we carefully redesign the previous optimized implementations of finite field arithmetic on the 32-bit ARM Cortex-M4 platform and propose a set of novel techniques which are explicitly suitable for SIKE primes. The benchmark result on STM32F4 Discovery board equipped with 32-bit ARM Cortex-M4 microcontrollers shows that entire key encapsulation and decapsultation over SIKEp434 take about 184 million clock cycles (i.e. 1.09 seconds @168MHz). In contrast to the previous optimized implementation of the isogeny-based key exchange on low-end 32-bit ARM Cortex-M4, our performance evaluation shows feasibility of using SIKE mechanism on the target platform. In comparison to the most of the post-quantum candidates, SIKE requires an excessive number of arithmetic operations, resulting in significantly slower timings. However, its small key size makes this scheme as a promising candidate on low-end microcontrollers in the quantum era by ensuring the lower energy consumption for key transmission than other schemes

    18 Seconds to Key Exchange: Limitations of Supersingular Isogeny Diffie-Hellman on Embedded Devices

    Get PDF
    The quantum secure supersingular isogeny Diffie-Hellman (SIDH) key exchange is a promising candidate in NIST\u27s on-going post-quantum standardization process. The evaluation of various implementation characteristics is part of this standardization process, and includes the assessment of the applicability on constrained devices. When compared to other post-quantum algorithms, SIDH appears to be well-suited for the implementation on those constrained devices due to its small key sizes. On the other hand, SIDH is computationally complex, which presumably results in long computation times. Since there are no published results to test this assumption, we present speed-optimized implementations for two small microcontrollers and set a first benchmark that can be of relevance for the standardization process. We use state-of-the art field arithmetic algorithms and optimize them in assembly. However, an ephemeral key exchange still requires more than 18 seconds on a 32-bit Cortex-M4 and more than 11 minutes on a 16-bit MSP430. Those results show that even with an improvement by a factor of 4, SIDH is in-fact impractical for small embedded devices, regardless of further possible improvements in the implementation. On a positive note, we also analyzed the implementation security of SIDH and found that appropriate DPA countermeasures can be implemented with little overhead
    • …
    corecore