10 research outputs found

    Exploring Parallelism to Improve the Performance of FrodoKEM in Hardware

    Get PDF
    FrodoKEM is a lattice-based key encapsulation mechanism, currently a semi-finalist in NIST’s post-quantum standardisation effort. A condition for these candidates is to use NIST standards for sources of randomness (i.e. seed-expanding), and as such most candidates utilise SHAKE, an XOF defined in the SHA-3 standard. However, for many of the candidates, this module is a significant implementation bottleneck. Trivium is a lightweight, ISO standard stream cipher which performs well in hardware and has been used in previous hardware designs for lattice-based cryptography. This research proposes optimised designs for FrodoKEM, concentrating on high throughput by parallelising the matrix multiplication operations within the cryptographic scheme. This process is eased by the use of Trivium due to its higher throughput and lower area consumption. The parallelisations proposed also complement the addition of first-order masking to the decapsulation module. Overall, we significantly increase the throughput of FrodoKEM; for encapsulation we see a 16 × speed-up, achieving 825 operations per second, and for decapsulation we see a 14 × speed-up, achieving 763 operations per second, compared to the previous state of the art, whilst also maintaining a similar FPGA area footprint of less than 2000 slices.</p

    Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches

    Get PDF
    Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as speed, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. A large number of candidates at the early stages of the standardization process makes the accurate and fair comparison very challenging. Nevertheless, in all major past cryptographic standardization efforts, future winners were identified quite early in the evaluation process and held their lead until the standard was selected. Additionally, identifying some candidates as either inherently slow or costly in hardware helped to eliminate a subset of candidates, saving countless hours of cryptanalysis. Finally, early implementations provided a baseline for future design space explorations, paving a way to more comprehensive and fairer benchmarking at the later stages of a given cryptographic competition. In this paper, we first summarize, compare, and analyze results reported by other groups until mid-May 2020, i.e., until the end of Round 2 of the NIST PQC process. We then outline our own methodology for implementing and benchmarking PQC candidates using both hardware and software/hardware co-design approaches. We apply our hardware approach to 6 lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing 4 NIST PQC submissions. We then apply a software-hardware co-design approach to 12 lattice-based CCA-secure KEMs, representing 8 Round 2 submissions. We hope that, combined with results reported by other groups, our study will provide NIST with helpful information regarding the relative performance of a significant subset of Round 2 PQC candidates, assuming that at least their major operations, and possibly the entire algorithms, are off-loaded to hardware

    SNEIK on Microcontrollers: AVR, ARMv7-M, and RISC-V with Custom Instructions

    Get PDF
    SNEIK is a family of lightweight cryptographic algorithms derived from a single 512-bit permutation. The SNEIGEN ``entropy distribution function\u27\u27 was designed to speed up certain functions in post-quantum and lattice-based public key algorithms. We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an implementation for the open-source RISC-V (RV32I) Instruction Set Architecture (ISA). Our results demonstrate that SNEIK algorithms usually outperform AES and SHA-2/3 on these lightweight targets while having a naturally constant-time design and significantly smaller implementation footprint. The RISC-V architecture is becoming increasingly popular for custom embedded designs that integrate a CPU core with application-specific hardware components. We show that inclusion of two simple custom instructions into the RV32I ISA yields a radical (more than five-fold) speedup of the SNEIK permutation and derived algorithms on that target, allowing us to reach 12.4 cycles/byte SNEIKEN-128 authenticated encryption performance on PQShield\u27s ``Crimson Puppy\u27\u27 RV32I-based SoC. Our performance measurements are for realistic message sizes and have been made using real hardware. We also offer implementation size metrics in terms of RAM, firmware size, and additional FPGA logic for the custom instruction set extensions

    SPQCop: Side-channel protected Post-Quantum Cryptoprocessor

    Get PDF
    The past few decades have seen significant progress in practically realizable quantum technologies. It is well known since the work of Peter Shor that large scale quantum computers will threaten the security of most of the currently used public key cryptographic algorithms. This has spurred the cryptography community to design algorithms which will remain safe even with the emergence of large scale quantum computing systems. An effort in this direction is the currently ongoing post-quantum cryptography (PQC) competition, which has led to the design and analysis of many concrete cryptographic constructions. Among these, Lattice based algorithms have emerged to be promising candidates. Therefore, we focus on the efficient implementation of Ring-LWE based quantum-safe key-exchange algorithms. Further, deployment of hardware implementing such algorithms in critical applications requires security against implementation attacks. In this work, we design a side channel resistant post-quantum cryptoprocessor which supports NewHope-NIST, NewHope-USENIX and HILA5 key-exchange schemes. The implemented cryptoprocessor is highly optimized with minimal overhead due to the countermeasures. It requires about 13,500 LUTs and 8,100 FFs. Due to a significantly pipelined architecture, an operating speed of 406 MHz could be achieved on the latest 16nm FPGAs; resulting in a key-exchange time of only 158uS, 157uS and 148uS for the above mentioned designs respectively. We also present detailed area and performance metrics for different modules required for all the designs. To the best of our knowledge, this work presents the first side-channel leakage resistant post quantum accelerator. Furthermore, this is also the fastest hardware implementation of NewHope-NIST

    SDitH in Hardware

    Get PDF
    This work presents the first hardware realisation of the Syndrome-Decodingin-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH’s hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for MPCitH constructions, after Picnic. This work presents optimised designs to achieve the best area efficiency, which we evaluate using the Time-Area Product (TAP) metric. This work also proposes a novel hardware architecture by dividing the signature generation algorithm into two phases, namely offline and online phases for optimising the overall clock cycle count. The hardware designs for key generation, signature generation, and signature verification are parameterised for all SDitH parameters, including the NIST security levels, both syndrome decoding base fields (GF256 and GF251), and thus conforms to the SDitH specifications. The hardware design further supports secret share splitting, and the hypercube optimisation which can be applied in this and multiple other NIST PQC candidates. The results of this work result in a hardware design with a drastic reducing in clock cycles compared to the optimised AVX2 software implementation, in the range of 2-4x for most operations. Our key generation outperforms software drastically, giving a 11-17x reduction in runtime, despite the significantly faster clock speed. On Artix 7 FPGAs we can perform key generation in 55.1 Kcycles, signature generation in 6.7 Mcycles, and signature verification in 8.6 Mcycles for NIST L1 parameters, which increase for GF251, and for L3 and L5 parameters

    SDitH in Hardware

    Get PDF
    This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH\u27s hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for MPCitH constructions, after Picnic. This work presents optimised designs to achieve the best area efficiency, which we evaluate using the Time-Area Product (TAP) metric. This work also proposes a novel hardware architecture by dividing the signature generation algorithm into two phases, namely offline and online phases for optimising the overall clock cycle count. The hardware designs for key generation, signature generation, and signature verification are parameterised for all SDitH parameters, including the NIST security levels, both syndrome decoding base fields (GF256 and GF251), and thus conforms to the SDitH specifications. The hardware design further supports secret share splitting, and the hypercube optimisation which can be applied in this and multiple other NIST PQC candidates. The results of this work result in a hardware design with a drastic reducing in clock cycles compared to the optimised AVX2 software implementation, in the range of 2-4x for most operations. Our key generation outperforms software drastically, giving a 11-17x reduction in runtime, despite the significantly faster clock speed. On Artix 7 FPGAs we can perform key generation in 55.1 Kcycles, signature generation in 6.7 Mcycles, and signature verification in 8.6 Mcycles for NIST L1 parameters, which increase for GF251, and for L3 and L5 parameters

    NewHope: A Mobile Implementation of a Post-Quantum Cryptographic Key Encapsulation Mechanism

    Get PDF
    NIST anticipates the appearance of large-scale quantum computers by 2036 [34], which will threaten widely used asymmetric algorithms, National Institute of Standards and Technology (NIST) launched a Post-Quantum Cryptography Standardization Project to find quantum-secure alternatives. NewHope post-quantum cryptography (PQC) key encapsulation mechanism (KEM) is the only Round 2 candidate to simultaneously achieve small key values through the use of a security problem with sufficient confidence its security, while mitigating any known vulnerabilities. This research contributes to NIST project’s overall goal by assessing the platform flexibility and resource requirements of NewHope KEMs on an Android mobile device. The resource requirements analyzed are transmission size as well as scheme runtime, central processing unit (CPU), memory, and energy usage. Results from each NewHope KEM instantiations are compared amongst each other, to a baseline application, and to results from previous work. NewHope PQC KEM was demonstrated to have sufficient flexibility for mobile implementation, competitive performance with other PQC KEMs, and to have competitive scheme runtime with current key exchange algorithms
    corecore