Search CORE

5 research outputs found

Optimized Architectures for Elliptic Curve Cryptography over Curve448

Author: Mehran Mozaffari Kermani
Mojtaba Bisheh Niasar
Reza Azarderakhsh
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 26/10/2020
Field of study

Abstract. In this paper, we present different implementations of point multiplication over Curve448. Curve448 has recently been recommended by NIST to provide 224-bit security over elliptic curve cryptography. Although implementing high-security cryptosystems should be considered due to recent improvements in cryptanalysis, hardware implementation of Curve488 has been investigated in a few studies. Hence, in this study, we propose three variable-base-point FPGA-based Curve448 implementations, i.e., lightweight, area-time efficient, and high-performance architectures, which aim to be used for different applications. Synthesized on a Xilinx Zynq 7020 FPGA, our proposed high-performance design increases 12% throughput with executing 1,219 point multiplication per second and increases 40% efficiency in terms of required clock cycles\timesutilized area compared to the best previous work. Furthermore, the proposed lightweight architecture works in 250 MHz and saves 96% of resources with the same performance. Additionally, our area-time efficient design considers a trade-off between time and required resources, which shows a 48% efficiency improvement with 52% fewer resources. Finally, effective side-channel countermeasures are added to our proposed designs, which also outperform previous works

Cryptology ePrint Archive

High-Speed NTT-based Polynomial Multiplication Accelerator for CRYSTALS-Kyber Post-Quantum Cryptography

Author: Mehran Mozaffari-Kermani
Mojtaba Bisheh-Niasar
Reza Azarderakhsh
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 03/05/2021
Field of study

This paper demonstrates an architecture for accelerating the polynomial multiplication using number theoretic transform (NTT). Kyber is one of the finalists in the third round of the NIST post-quantum cryptography standardization process. Simultaneously, the performance of NTT execution is its main challenge, requiring large memory and complex memory access pattern. In this paper, an efficient NTT architecture is presented to improve the respective computation time. We propose several optimization strategies for efficiency improvement targeting different performance requirements for various applications. Our NTT architecture, including four butterfly cores, occupies only 798 LUTs and 715 FFs on a small Artix-7 FPGA, showing more than 44% improvement compared to the best previous work. We also implement a coprocessor architecture for Kyber KEM benefiting from our high-speed NTT core to accomplish three phases of the key exchange in 9, 12, and 19 \mus, respectively, operating at 200 MHz

Cryptology ePrint Archive

Compressed SIKE Round 3 on ARM Cortex-M4

Author: Mehran Mozaffari Kermani
Mila Anastasova
Mojtaba Bisheh-Niasar
Reza Azarderakhsh
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 20/11/2021
Field of study

In 2016, the National Institute of Standards and Technology (NIST) initiated a standardization process among the post-quantum secure algorithms. Forming part of the alternate group of candidates after Round 2 of the process is the Supersingular Isogeny Key Encapsulation (SIKE) mechanism which attracts with the smallest key sizes offering post-quantum security in scenarios of limited bandwidth and memory resources. Even further reduction of the exchanged information is offered by the compression mechanism, proposed by Azarderakhsh et al., which, however, introduces a significant time overhead and increases the memory requirements of the protocol, making it challenging to integrate it into an embedded system. In this paper, we propose the first compressed SIKE implementation for a resource-constrained device, where we targeted the NIST recommended platform STM32F407VG featuring ARM Cortex-M4 processor. We integrate the isogeny-based implementation strategies described previously in the literature into the compressed version of SIKE. Additionally, we propose a new assembly design for the finite field operations particular for the compressed SIKE, and observe a speedup of up to 16% and up to 25% compared to the last best-reported assembly implementations for p434, p503, and p610

Cryptology ePrint Archive

Fast, Small, and Area-Time Efficient Architectures for Key-Exchange on Curve25519

Author: Mehran Mozaffari-Kermani
Mojtaba Bisheh Niasar
Rami El Khatib
Reza Azarderakhsh
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 27/06/2020
Field of study

Abstract--- This paper demonstrates fast and compact implementations of Elliptic Curve Cryptography (ECC) for efficient key agreement over Curve25519. Curve25519 has been recently adopted as a key exchange method for several applications such as connected small devices as well as cloud, and included in the National Institute of Standards and Technology (NIST) recommendations for public key cryptography. This paper presents three different performance level designs including lightweight, area-time efficient, and high-performance architectures. Lightweight hardware implementations are used for several Internet of Things (IoT) applications due to their resources being at premium. Our lightweight architecture utilizes 90% less resources compared to the best previous work while it is still more optimized in term of A\cdot T (area\timestime). For efficient implementation from either time or utilized resources, our area-time efficient architecture can establish almost 7,000 key sessions per second which is 64% faster than the previous works. The area-time efficient architecture uses well scheduled interleaved multiplication combined with a reduction algorithm. Additionally, we offer a fast architecture for high performance applications based on the 4-level Karatsuba method and Carry-Compact Addition (CCA). Our high-performance architecture also outperforms previous work in terms of A\cdot T. The results show 9% and 29% improvement in A\cdot T and A_{d}\cdot T (DSP_count\timestime), respectively. All architectures are variable-base-point implemented on the Xilinx Zynq-7020 FPGA family where performance and implementation metrics are reported and compared. Finally, various side-channel attack countermeasures are embedded in the proposed architectures

Cryptology ePrint Archive

PQC Cloudization: Rapid Prototyping of Scalable NTT/INTT Architecture to Accelerate Kyber

Author: Anjana Parthasarathy
Bharat Pillilli
Blake Pelton
Bryan Kelly
Daniel Lo
Mojtaba Bisheh-Niasar
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 05/07/2023
Field of study

The advent of quantum computers poses a serious challenge to the security of cloud infrastructures and services, as they can potentially break the existing public-key cryptosystems, such as Rivest–Shamir–Adleman (RSA) and Elliptic Curve Cryptography (ECC). Even though the gap between today’s quantum computers and the threats they pose to current public-key cryptography is large, the cloud landscape should act proactively and initiate the transition to the post-quantum era as early as possible. To comply with that, the U.S. government issued a National Security Memorandum in May 2022 that mandated federal agencies to migrate to post-quantum cryptosystems (PQC) by 2035. To ensure the long-term security of cloud computing, it is imperative to develop and deploy PQC resistant to quantum attacks. A promising class of post-quantum cryptosystems is based on lattice problems, which require polynomial arithmetic. In this paper, we propose and implement a scalable number-theoretic transform (NTT) architecture that significantly enhances the performance of polynomial multiplication. Our proposed design exploits multi-levels of parallelism to accelerate the NTT computation on reconfigurable hardware. We use the high-level synthesis (HLS) method to implement our design, which allows us to describe the NTT algorithm in a high-level language and automatically generate optimized hardware code. HLS facilitates rapid prototyping and enables us to explore different design spaces and trade-offs on the hardware platforms. Our experimental results show that our design achieves 11

\times

speedup compared to the state-of-the-art requiring only 14 clock cycles for an NTT computation over a polynomial of degree 256. To demonstrate the applicability of our design, we also present a coprocessor architecture for Kyber, a key encapsulation mechanism (KEM) chosen by the NIST post-quantum standardization process, that utilizes our scalable NTT core

Cryptology ePrint Archive