146 research outputs found

    Efficient Software Implementation of the SIKE Protocol Using a New Data Representation

    Get PDF
    Thanks to relatively small public and secret keys, the Supersingular Isogeny Key Encapsulation (SIKE) protocol made it into the third evaluation round of the post-quantum standardization project of the National Institute of Standards and Technology (NIST). Even though a large body of research has been devoted to the efficient implementation of SIKE, its latency is still undesirably long for many real-world applications. Most existing implementations of the SIKE protocol use the Montgomery representation for the underlying field arithmetic since the corresponding reduction algorithm is considered the fastest method for performing multiple-precision modular reduction. In this paper, we propose a new data representation for supersingular isogeny-based Elliptic-Curve Cryptography (ECC), of which SIKE is a sub-class. This new representation enables significantly faster implementations of modular reduction than the Montgomery reduction, and also other finite-field arithmetic operations used in ECC can benefit from our data representation. We implemented all arithmetic operations in C using the proposed representation such that they have constant execution time and integrated them to the latest version of the SIKE software library. Using four different parameters sets, we benchmarked our design and the optimized generic implementation on a 2.6 GHz Intel Xeon E5-2690 processor. Our results show that, for the prime of SIKEp751, the proposed reduction algorithm is approximately 2.61 times faster than the currently best implementation of Montgomery reduction, and our representation also enables significantly better timings for other finite-field operations. Due to these improvements, we were able to achieve a speed-up by a factor of about 1.65, 2.03, 1.61, and 1.48 for SIKEp751, SIKEp610, SIKEp503, and SIKEp434, respectively, compared to state-of-the-art generic implementations.IKEp751, SIKEp610, SIKEp503, and SIKEp434, respectively, compared to state-of-the-art generic implementations

    Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

    Get PDF
    Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

    High-Speed FPGA Implementation of SIKE Based on An Ultra-Low-Latency Modular Multiplier

    Get PDF
    The supersingular isogeny key encapsulation (SIKE) protocol, as one of the post-quantum protocol candidates, is widely regarded as the best alternative for curve-based cryptography. However, the long latency, caused by the serial large-degree isogeny computation which is dominated by modular multiplications, has made it less competitive than most popular post-quantum candidates. In this paper, we propose a high-speed and low-latency architecture for our recently presented optimized SIKE algorithm. Firstly, we design a new field arithmetic logic unit (FALU) with many algorithmic transformations and architectural optimizations. Especially, for the FALU, an extremely low-latency modular multiplier is devised based on a modified algorithm by fully parallelizing and highly optimizing the small-size multipliers and the reduction submodules. Secondly, we develop a compact control logic and update the instructions based on the benchmark provided in the newest SIKE library, fitting well with our design. Thirdly, an efficient memory access method is proposed by scheduling the input and output of the arithmetic logic unit (ALU) in two identical RAMs, which can significantly reduce the latency. Finally, we code the proposed architectures using the Verilog language and integrate them into the SIKE library. The implementation results on a Xilinx Virtex-7 FPGA show that for SIKEp751, our design only costs 9.3 ms with a frequency of 155.8 MHz, about 2x faster than the state-of-the-art, and achieves the best area efficiency among existing works. Particularly, the modular multiplier merely needs 16 clock cycles, reducing the delay by nearly one order of magnitude with a small factor of increase in hardware resource

    Compressed SIKE Round 3 on ARM Cortex-M4

    Get PDF
    In 2016, the National Institute of Standards and Technology (NIST) initiated a standardization process among the post-quantum secure algorithms. Forming part of the alternate group of candidates after Round 2 of the process is the Supersingular Isogeny Key Encapsulation (SIKE) mechanism which attracts with the smallest key sizes offering post-quantum security in scenarios of limited bandwidth and memory resources. Even further reduction of the exchanged information is offered by the compression mechanism, proposed by Azarderakhsh et al., which, however, introduces a significant time overhead and increases the memory requirements of the protocol, making it challenging to integrate it into an embedded system. In this paper, we propose the first compressed SIKE implementation for a resource-constrained device, where we targeted the NIST recommended platform STM32F407VG featuring ARM Cortex-M4 processor. We integrate the isogeny-based implementation strategies described previously in the literature into the compressed version of SIKE. Additionally, we propose a new assembly design for the finite field operations particular for the compressed SIKE, and observe a speedup of up to 16% and up to 25% compared to the last best-reported assembly implementations for p434, p503, and p610

    Hardware Architectures for Post-Quantum Cryptography

    Get PDF
    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

    Highly Vectorized SIKE for AVX-512

    Get PDF
    It is generally accepted that a large-scale quantum computer would be capable to break any public-key cryptosystem used today, thereby posing a serious threat to the security of the Internet’s public-key infrastructure. The US National Institute of Standards and Technology (NIST) addresses this threat with an open process for the standardization of quantum-safe key establishment and signature schemes, which is now in the final phase of the evaluation of candidates. SIKE (an abbreviation of Supersingular Isogeny Key Encapsulation) is one of the alternate candidates under evaluation and distinguishes itself from other candidates due to relatively short key lengths and relatively high computing costs. In this paper, we analyze how the latest generation of Intel’s Advanced Vector Extensions (AVX), in particular AVX-512IFMA, can be used to minimize the latency (resp. maximize the throughput) of the SIKE key encapsulation mechanism when executed on Ice LakeCPUs based on the Sunny Cove microarchitecture. We present various techniques to parallelize and speed up the base/extension field arithmetic, point arithmetic, and isogeny computations performed by SIKE. All these parallel processing techniques are combined in AVXSIKE, a highly optimized implementation of SIKE using Intel AVX-512IFMA instructions. Our experiments indicate that AVXSIKE instantiated with the SIKEp503 parameter set is approximately 1.5 times faster than the to-date best AVX-512IFMA-based SIKE software from the literature. When executed on an Intel Core i3-1005G1 CPU, AVXSIKE outperforms the x64 assembly implementation of SIKE contained in Microsoft’s SIDHv3.4 library by a factor of about 2.5 for key generation and decapsulation, while the encapsulation is even 3.2 times faster

    Optimized Supersingular Isogeny Key Encapsulation on ARMv8 Processors

    Get PDF
    In this work, we present highly-optimized constant-time software libraries for Supersingular Isogeny Key Encapsulation (SIKE) protocol on ARMv8 processors. Our optimized hand-crafted assembly libraries provide the most efficient timing results on 64-bit ARM-powered devices. Moreover, the presented libraries can be integrated into any other cryptography primitives targeting the same finite field size. We design a new mixed implementation of field arithmetic on 64-bit ARM processors by exploiting the A64 and Advanced SIMD processing units working in parallel. Using these techniques, we are able to improve the performance of the entire protocol by the factor of 5 times compared to optimized C implementations on 64-bit ARM high-performance cores, providing 83-, 124-, and 159-bit quantum-security levels. Furthermore, we compare the performance of our proposed library with the previous highly-optimized ARMv8 assembly library available in the literature. The implementation results illustrate the overall 10% performance improvement in comparison with previous work, highlighting the benefit of using mixed implementation over relatively-large finite field size

    A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA

    Get PDF
    This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack the small isogeny and point operations in hardware, devise a coarse-grained reconfigurable hardware architecture (CGRHA) based on FALU as the co-processor, and apply it to the RISC-V core with customized instructions, effectively avoiding extra time consumption for the data exchange with the software side and meanwhile increasing flexibility. Finally, we code the hardware in SystemVerilog language and the software in C language and run experiments on FPGAs. In the co-processor implementation, the experiment results show that our design for the four SIKE parameters achieves 2.6-4.4x speedup and obtains comparable or better area-time product to or than the state-of-the-art. In the hardware-software co-design experiments, we still have the superiority in speed and only <10\% of extra time is introduced by mutual communication

    Envisioning the Future of Cyber Security in Post-Quantum Era: A Survey on PQ Standardization, Applications, Challenges and Opportunities

    Full text link
    The rise of quantum computers exposes vulnerabilities in current public key cryptographic protocols, necessitating the development of secure post-quantum (PQ) schemes. Hence, we conduct a comprehensive study on various PQ approaches, covering the constructional design, structural vulnerabilities, and offer security assessments, implementation evaluations, and a particular focus on side-channel attacks. We analyze global standardization processes, evaluate their metrics in relation to real-world applications, and primarily focus on standardized PQ schemes, selected additional signature competition candidates, and PQ-secure cutting-edge schemes beyond standardization. Finally, we present visions and potential future directions for a seamless transition to the PQ era

    An Alternative Approach for SIDH Arithmetic

    Get PDF
    In this paper, we present new algorithms for the field arithmetic of supersingular isogeny Diffie-Hellman; one of the fifteen remaining candidates in the NIST post-quantum standardization process. Our approach uses a polynomial representation of the field elements together with mechanisms to keep the coefficients within bounds during the arithmetic operations. We present timings and comparisons for SIKEp503 and suggest a novel 736-bit prime that offers a 1.17×1.17\times speedup compared to SIKEp751 for a similar level of security
    • …
    corecore