27 research outputs found

    Ultra-Fast Modular Multiplication Implementation for Isogeny-Based Post-Quantum Cryptography

    Get PDF
    Supersingular isogeny key encapsulation (SIKE) protocol delivers promising public and secret key sizes over other post-quantum candidates. However, the huge computations form the bottleneck and limit its practical applications. The modular multiplication operation, which is one of the most computationally demanding operations in the fundamental arithmetics, takes up a large part of the computations in the protocol. In this paper, we propose an improved unconventional-radix finite-field multiplication (IFFM) algorithm which reduces the computational complexity by about 20% compared to previous algorithms. We then devise a new high-speed modular multiplier architecture based on the IFFM. It is shown that the proposed architecture can be extensively pipelined to achieve a very high clock speed due to its complete feedforward scheme, which demonstrates significant advantages over conventional designs. The FPGA implementation results show the proposed multiplier has about 67 times faster throughput than the state-of-the-art designs and more than 12 times better area efficiency than previous works. Therefore, we think that these achievements will greatly contribute to the practicability of this protocol

    Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

    Get PDF
    Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

    Efficient Software Implementation of the SIKE Protocol Using a New Data Representation

    Get PDF
    Thanks to relatively small public and secret keys, the Supersingular Isogeny Key Encapsulation (SIKE) protocol made it into the third evaluation round of the post-quantum standardization project of the National Institute of Standards and Technology (NIST). Even though a large body of research has been devoted to the efficient implementation of SIKE, its latency is still undesirably long for many real-world applications. Most existing implementations of the SIKE protocol use the Montgomery representation for the underlying field arithmetic since the corresponding reduction algorithm is considered the fastest method for performing multiple-precision modular reduction. In this paper, we propose a new data representation for supersingular isogeny-based Elliptic-Curve Cryptography (ECC), of which SIKE is a sub-class. This new representation enables significantly faster implementations of modular reduction than the Montgomery reduction, and also other finite-field arithmetic operations used in ECC can benefit from our data representation. We implemented all arithmetic operations in C using the proposed representation such that they have constant execution time and integrated them to the latest version of the SIKE software library. Using four different parameters sets, we benchmarked our design and the optimized generic implementation on a 2.6 GHz Intel Xeon E5-2690 processor. Our results show that, for the prime of SIKEp751, the proposed reduction algorithm is approximately 2.61 times faster than the currently best implementation of Montgomery reduction, and our representation also enables significantly better timings for other finite-field operations. Due to these improvements, we were able to achieve a speed-up by a factor of about 1.65, 2.03, 1.61, and 1.48 for SIKEp751, SIKEp610, SIKEp503, and SIKEp434, respectively, compared to state-of-the-art generic implementations.IKEp751, SIKEp610, SIKEp503, and SIKEp434, respectively, compared to state-of-the-art generic implementations

    High-Speed Modular Multipliers for Isogeny-Based Post-Quantum Cryptography

    Get PDF
    As one of the post-quantum protocol candidates, the supersingular isogeny key encapsulation (SIKE) protocol delivers promising public and secret key sizes over other candidates. Nevertheless, the considerable computations form the bottleneck and limit its practical applications. The modular multiplication operations occupy a large proportion of the overall computations required by the SIKE protocol. The VLSI implementation of the high-speed modular multiplier remains a big challenge. In this paper, we propose three improved modular multiplication algorithms based on an unconventional radix for this protocol, all of which cost about 20% fewer computations than the prior art. Besides, a multi-precision scheme is also introduced for the proposed algorithms to improve the scalability in hardware implementation, resulting in three new algorithms. We then present very efficient high-speed modular multiplier architectures for the six algorithms. It is shown that these new architectures can be highly optimized and extensively pipelined to obtain high throughput thanks to the adopted overlapping processing scheme. The FPGA implementation results show the proposed multipliers without the multi-precision scheme all achieve about 60 times higher throughput than the state-of-the-art design (the FFM2 multiplier), and those with the multi-precision scheme all acquire almost 10 times higher throughput than this work. Meanwhile, each of the multi-precision based designs has almost the same resource consumptions as the FFM2 does

    Efficient and Side-Channel Resistant Implementations of Next-Generation Cryptography

    Get PDF
    The rapid development of emerging information technologies, such as quantum computing and the Internet of Things (IoT), will have or have already had a huge impact on the world. These technologies can not only improve industrial productivity but they could also bring more convenience to people’s daily lives. However, these techniques have “side effects” in the world of cryptography – they pose new difficulties and challenges from theory to practice. Specifically, when quantum computing capability (i.e., logical qubits) reaches a certain level, Shor’s algorithm will be able to break almost all public-key cryptosystems currently in use. On the other hand, a great number of devices deployed in IoT environments have very constrained computing and storage resources, so the current widely-used cryptographic algorithms may not run efficiently on those devices. A new generation of cryptography has thus emerged, including Post-Quantum Cryptography (PQC), which remains secure under both classical and quantum attacks, and LightWeight Cryptography (LWC), which is tailored for resource-constrained devices. Research on next-generation cryptography is of importance and utmost urgency, and the US National Institute of Standards and Technology in particular has initiated the standardization process for PQC and LWC in 2016 and in 2018 respectively. Since next-generation cryptography is in a premature state and has developed rapidly in recent years, its theoretical security and practical deployment are not very well explored and are in significant need of evaluation. This thesis aims to look into the engineering aspects of next-generation cryptography, i.e., the problems concerning implementation efficiency (e.g., execution time and memory consumption) and security (e.g., countermeasures against timing attacks and power side-channel attacks). In more detail, we first explore efficient software implementation approaches for lattice-based PQC on constrained devices. Then, we study how to speed up isogeny-based PQC on modern high-performance processors especially by using their powerful vector units. Moreover, we research how to design sophisticated yet low-area instruction set extensions to further accelerate software implementations of LWC and long-integer-arithmetic-based PQC. Finally, to address the threats from potential power side-channel attacks, we present a concept of using special leakage-aware instructions to eliminate overwriting leakage for masked software implementations (of next-generation cryptography)

    A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol

    Get PDF
    Since its introduction by Jao and De Feo in 2011, the supersingular isogeny Diffie-Hellman (SIDH) key exchange protocol has positioned itself as a promising candidate for post-quantum cryptography. One salient feature of the SIDH protocol is that it requires exceptionally short key sizes. However, the latency associated to SIDH is higher than the ones reported for other post-quantum cryptosystem proposals. Aiming to accelerate the SIDH runtime performance, we present in this work several algorithmic optimizations targeting both elliptic-curve and field arithmetic operations. We introduce in the context of the SIDH protocol a more efficient approach for calculating the elliptic curve operation P + [k]Q. Our strategy achieves a factor 1.4 speedup compared with the popular variable-three-point ladder algorithm regularly used in the SIDH shared secret phase. Moreover, profiting from pre-computation techniques our algorithm yields a factor 1.7 acceleration for the computation of this operation in the SIDH key generation phase. We also present an optimized evaluation of the point tripling formula, and discuss several algorithmic and implementation techniques that lead to faster field arithmetic computations. A software implementation of the above improvements on an Intel Skylake Core i7-6700 processor gives a factor 1.33 speedup against the state-of-the-art software implementation of the SIDH protocol reported by Costello-Longa-Naehrig in CRYPTO 2016

    High-Speed Hardware Architectures and FPGA Benchmarking of CRYSTALS-Kyber, NTRU, and Saber

    Get PDF
    Performance in hardware has typically played a significant role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as latency, number of operations per second, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. This paper presents high-speed hardware architectures for four lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing three NIST PQC finalists: CRYSTALS-Kyber, NTRU (with two distinct variants, NTRU-HPS and NTRU-HRSS), and Saber. We rank these candidates among each other and compare them with all other Round 3 KEMs based on the data from the previously reported work

    Reliable Hardware Architectures for Cyrtographic Block Ciphers LED and HIGHT

    Get PDF
    Cryptographic architectures provide different security properties to sensitive usage models. However, unless reliability of architectures is guaranteed, such security properties can be undermined through natural or malicious faults. In this thesis, two underlying block ciphers which can be used in authenticated encryption algorithms are considered, i.e., LED and HIGHT block ciphers. The former is of the Advanced Encryption Standard (AES) type and has been considered areaefficient, while the latter constitutes a Feistel network structure and is suitable for low-complexity and low-power embedded security applications. In this thesis, we propose efficient error detection architectures including variants of recomputing with encoded operands and signature-based schemes to detect both transient and permanent faults. Authenticated encryption is applied in cryptography to provide confidentiality, integrity, and authenticity simultaneously to the message sent in a communication channel. In this thesis, we show that the proposed schemes are applicable to the case study of Simple Lightweight CFB (SILC) for providing authenticated encryption with associated data (AEAD). The error simulations are performed using Xilinx ISE tool and the results are benchmarked for the Xilinx FPGA family Virtex- 7 to assess the reliability capability and efficiency of the proposed architectures

    Lightweight Architectures for Reliable and Fault Detection Simon and Speck Cryptographic Algorithms on FPGA

    Get PDF
    The widespread use of sensitive and constrained applications necessitates lightweight (lowpower and low-area) algorithms developed for constrained nano-devices. However, nearly all of such algorithms are optimized for platform-based performance and may not be useful for diverse and flexible applications. The National Security Agency (NSA) has proposed two relatively-recent families of lightweight ciphers, i.e., Simon and Speck, designed as efficient ciphers on both hardware and software platforms. This paper proposes concurrent error detection schemes to provide reliable architectures for these two families of lightweight block ciphers. The research work on analyzing the reliability of these algorithms and providing fault diagnosis approaches has not been undertaken to date to the best of our knowledge. The main aim of the proposed reliable architectures is to provide high error coverage while maintaining acceptable area and power consumption overheads. To achieve this, we propose a variant of recomputing with encoded operands. These low-complexity schemes are suited for lowresource applications such as sensitive, constrained implantable and wearable medical devices. We perform fault simulations for the proposed architectures by developing a fault model framework. The architectures are simulated and analyzed on recent field-programmable grate array (FPGA) platforms, and it is shown that the proposed schemes provide high error coverage. The proposed low-complexity concurrent error detection schemes are a step forward towards more reliable architectures for Simon and Speck algorithms in lightweight, secure applications
    corecore