60 research outputs found

    Hardware Architectures for Post-Quantum Cryptography

    Get PDF
    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

    SIKE Round 2 Speed Record on ARM Cortex-M4

    Get PDF
    We present the first practical software implementation of Supersingular Isogeny Key Encapsulation (SIKE) round 2, targeting NISTā€™s 1, 2, and 5 security levels on 32-bit ARM Cortex-M4 microcontrollers. The proposed library introduces a new speed record of SIKE protocol on the target platform. We achieved this record by adopting several state-of-the-art engineering techniques as well as highly-optimized hand-crafted assembly implementation of finite field arithmetic. In particular, we carefully redesign the previous optimized implementations of filed arithmetic on 32-bit ARM Cortex-M4 platform and propose a set of novel techniques which are explicitly suitable for SIKE/SIDH primes. Moreover, the proposed arithmetic implementations are fully scalable to larger bit-length integers and can be adopted over different security levels. The benchmark result on STM32F4 Discovery board equipped with 32-bit ARM Cortex-M4 microcontrollers shows that the entire key encapsulation over p434 takes about 326 million clock cycles (i.e. 1.94 seconds @168MHz). In contrast to the previous optimized implementation of the isogeny-based key exchange on low-power 32-bit ARM Cortex-M4, our performance evaluation shows feasibility of using SIKE mechanism on the target platform. In comparison to the most of the post-quantum candidates, SIKE requires an excessive number of arithmetic operations, resulting in significantly slower timings. However, its small key size makes this scheme as a promising candidate on low-end microcontrollers in the quantum era by ensuring the lower energy consumption for key transmission than other schemes

    Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

    Get PDF
    Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

    Supersingular Isogeny Key Encapsulation (SIKE) Round 2 on ARM Cortex-M4

    Get PDF
    We present the first practical software implementation of Supersingular Isogeny Key Encapsulation (SIKE) round 2, targeting NIST\u27s 1, 2, 3, and 5 security levels on 32-bit ARM Cortex-M4 microcontrollers. The proposed library introduces a new speed record of all SIKE Round 2 protocols with reasonable memory consumption on the low-end target platforms. We achieved this record by adopting several state-of-the-art engineering techniques as well as highly-optimized hand-crafted assembly implementation of finite field arithmetic. In particular, we carefully redesign the previous optimized implementations of finite field arithmetic on the 32-bit ARM Cortex-M4 platform and propose a set of novel techniques which are explicitly suitable for SIKE primes. The benchmark result on STM32F4 Discovery board equipped with 32-bit ARM Cortex-M4 microcontrollers shows that entire key encapsulation and decapsultation over SIKEp434 take about 184 million clock cycles (i.e. 1.09 seconds @168MHz). In contrast to the previous optimized implementation of the isogeny-based key exchange on low-end 32-bit ARM Cortex-M4, our performance evaluation shows feasibility of using SIKE mechanism on the target platform. In comparison to the most of the post-quantum candidates, SIKE requires an excessive number of arithmetic operations, resulting in significantly slower timings. However, its small key size makes this scheme as a promising candidate on low-end microcontrollers in the quantum era by ensuring the lower energy consumption for key transmission than other schemes

    Optimized Supersingular Isogeny Key Encapsulation on ARMv8 Processors

    Get PDF
    In this work, we present highly-optimized constant-time software libraries for Supersingular Isogeny Key Encapsulation (SIKE) protocol on ARMv8 processors. Our optimized hand-crafted assembly libraries provide the most efficient timing results on 64-bit ARM-powered devices. Moreover, the presented libraries can be integrated into any other cryptography primitives targeting the same finite field size. We design a new mixed implementation of field arithmetic on 64-bit ARM processors by exploiting the A64 and Advanced SIMD processing units working in parallel. Using these techniques, we are able to improve the performance of the entire protocol by the factor of 5 times compared to optimized C implementations on 64-bit ARM high-performance cores, providing 83-, 124-, and 159-bit quantum-security levels. Furthermore, we compare the performance of our proposed library with the previous highly-optimized ARMv8 assembly library available in the literature. The implementation results illustrate the overall 10% performance improvement in comparison with previous work, highlighting the benefit of using mixed implementation over relatively-large finite field size

    Faster Isogenies for Quantum-Safe SIKE

    Get PDF
    In the third round of the NIST PQC standardization process, the only isogeny-based candidate, SIKE, suffers from slow performance when compared to other contenders. The large-degree isogeny computation performs a series of isogenous mappings between curves, to account for about 80% of SIKEā€™s latency. Here, we propose, implement, and evaluate a new method for computing large-degree isogenies of an odd power. Our new strategy for this computation avoids expensive recomputation of temporary isogeny results.We modified open-source libraries targeting x86, ARM64, and ARM32 platforms. Across each of these implementations, our new method achieves 10% and 5% speedups in SIKEā€™s key encapsulation and decapsulation operations, respectively. Additionally, these implementations use 3% less stack space at only a 48 byte increase in code size. Given the benefit and simplicity of our approach, we recommend this method for current and emerging SIKE implementations

    Efficient and Side-Channel Resistant Implementations of Next-Generation Cryptography

    Get PDF
    The rapid development of emerging information technologies, such as quantum computing and the Internet of Things (IoT), will have or have already had a huge impact on the world. These technologies can not only improve industrial productivity but they could also bring more convenience to peopleā€™s daily lives. However, these techniques have ā€œside effectsā€ in the world of cryptography ā€“ they pose new difficulties and challenges from theory to practice. Specifically, when quantum computing capability (i.e., logical qubits) reaches a certain level, Shorā€™s algorithm will be able to break almost all public-key cryptosystems currently in use. On the other hand, a great number of devices deployed in IoT environments have very constrained computing and storage resources, so the current widely-used cryptographic algorithms may not run efficiently on those devices. A new generation of cryptography has thus emerged, including Post-Quantum Cryptography (PQC), which remains secure under both classical and quantum attacks, and LightWeight Cryptography (LWC), which is tailored for resource-constrained devices. Research on next-generation cryptography is of importance and utmost urgency, and the US National Institute of Standards and Technology in particular has initiated the standardization process for PQC and LWC in 2016 and in 2018 respectively. Since next-generation cryptography is in a premature state and has developed rapidly in recent years, its theoretical security and practical deployment are not very well explored and are in significant need of evaluation. This thesis aims to look into the engineering aspects of next-generation cryptography, i.e., the problems concerning implementation efficiency (e.g., execution time and memory consumption) and security (e.g., countermeasures against timing attacks and power side-channel attacks). In more detail, we first explore efficient software implementation approaches for lattice-based PQC on constrained devices. Then, we study how to speed up isogeny-based PQC on modern high-performance processors especially by using their powerful vector units. Moreover, we research how to design sophisticated yet low-area instruction set extensions to further accelerate software implementations of LWC and long-integer-arithmetic-based PQC. Finally, to address the threats from potential power side-channel attacks, we present a concept of using special leakage-aware instructions to eliminate overwriting leakage for masked software implementations (of next-generation cryptography)

    Curve448 on 32-bit ARM Cortex-M4

    Get PDF
    Public key cryptography is widely used in key exchange and digital signature protocols. Public key cryptography requires expensive primitive operations, such as ļ¬nite-ļ¬eld and group operations. These ļ¬nite-ļ¬eld and group operations require a number of clock cycles to exe- cute. By carefully optimizing these primitive operations, public key cryp- tography can be performed with reasonably fast execution timing. In this paper, we present the new implementation result of Curve448 on 32-bit ARM Cortex-M4 microcontrollers. We adopted state-of-art implementa- tion methods, and some previous methods were re-designed to fully uti- lize the features of the target microcontrollers. The implementation was also performed with constant timing by utilizing the features of micro- controllers and algorithms. Finally, the scalar multiplication of Curve448 on 32-bit ARM Cortex-M4@168MHz microcontrollers requires 6,285,904 clock cycles. To the best of our knowledge, this is the ļ¬rst optimized im- plementation of Curve448 on 32-bit ARM Cortex-M4 microcontrollers. The result is also compared with other ECC and post-quantum cryptog- raphy (PQC) implementations. The proposed ECC and the-state-of-art PQC results show the practical usage of hybrid post-quantum TLS on the target processor

    Compressed SIKE Round 3 on ARM Cortex-M4

    Get PDF
    In 2016, the National Institute of Standards and Technology (NIST) initiated a standardization process among the post-quantum secure algorithms. Forming part of the alternate group of candidates after Round 2 of the process is the Supersingular Isogeny Key Encapsulation (SIKE) mechanism which attracts with the smallest key sizes offering post-quantum security in scenarios of limited bandwidth and memory resources. Even further reduction of the exchanged information is offered by the compression mechanism, proposed by Azarderakhsh et al., which, however, introduces a significant time overhead and increases the memory requirements of the protocol, making it challenging to integrate it into an embedded system. In this paper, we propose the first compressed SIKE implementation for a resource-constrained device, where we targeted the NIST recommended platform STM32F407VG featuring ARM Cortex-M4 processor. We integrate the isogeny-based implementation strategies described previously in the literature into the compressed version of SIKE. Additionally, we propose a new assembly design for the finite field operations particular for the compressed SIKE, and observe a speedup of up to 16% and up to 25% compared to the last best-reported assembly implementations for p434, p503, and p610
    • ā€¦
    corecore