184 research outputs found

    Hardware-Software Codesign of a Vector Co-processor for Public Key Cryptography

    Get PDF
    International audienceUntil now, most cryptography implementations on parallel architectures have focused on adapting the software to SIMD architectures initially meant for media applications. In this paper, we review some of the most significant contributions in this area. We then propose a vector architecture to efficiently implement long precision modular multiplications. Having such a data level parallel hardware provides a circuit whose decode and schedule units are at least of the same complexity as those of a scalar processor. The excess transistors are mainly found in the data path. Moreover, the vector approach gives a very modular architecture where resources can be easily redefined. We built a functional simulator onto which we performed a quantitative analysis to study how the resizing of those resources affects the performance of the modular multiplication operation. Hence we not only propose a vector architecture for our Public Key cryptographic operations but also show how we can analyze the impact of design choices on performance. The proposed architecture is also flexible in the sense that the software running on it would offer room for the implementation of counter-measures against side-channel or fault attacks

    HLS-based HW/SW co-design of the post-quantum classic McEliece cryptosystem

    Get PDF
    While quantum computers are rapidly becoming more powerful, the current cryptographic infrastructure is imminently threatened. In a preventive manner, the U.S. National Institute of Standards and Technology (NIST) has initiated a process to evaluate quantum-resistant cryptosystems, to form the first post-quantum (PQ) cryptographic standard. Classic McEliece (CM) is one of the most prominent cryptosystems considered for standardization in NIST’s PQ cryptography contest. However, its computational cost poses notable challenges to a big fraction of existing computing devices. This work presents an HLS-based, HW/SW co-design acceleration of the CM Key Encapsulation Mechanism (CM KEM). We demonstrate significant maximum speedups of up to 55.2 ×, 3.3 ×, and 8.7 × in the CM KEM algorithms of key generation, encapsulation, and decapsulation respectively, comparing to a SW-only scalar implementation.This research was supported by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total cost eligible, under the DRAC project [001- P-001723]. It was also supported by the Spanish goverment (grant RTI2018-095094-B-C21 “CONSENT”), by the Spanish Ministry of Science and Innovation (contracts PID2019- 107255GB-C21, PID2019-107255GB-C21) and by the Catalan Government (contracts 2017-SGR-1414, 2017-SGR-705). This work has also received funding from the European Union Horizon 2020 research and innovation programme under grant agreement No. 871467. V. Kostalabros has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under "Ajuts per a la contractació de personal investigador novell" fellowship No. 2019FI B01274. M. Moreto was also partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under "Ramón y Cajal" fellowship No. RYC-2016-21104.Peer ReviewedPostprint (author's final draft

    Education and Research Integration of Emerging Multidisciplinary Medical Devices Security

    Get PDF
    Traditional embedded systems such as secure smart cards and nano-sensor networks have been utilized in various usage models. Nevertheless, emerging secure deeply-embedded systems, e.g., implantable and wearable medical devices, have comparably larger “attack surface”. Specifically, with respect to medical devices, a security breach can be life-threatening (for which adopting traditional solutions might not be practical due to tight constraints of these often-battery-powered systems), and unlike traditional embedded systems, it is not only a matter of financial loss. Unfortunately, although emerging cryptographic engineering research mechanisms for such deeply-embedded systems have started solving this critical, vital problem, university education (at both graduate and undergraduate level) lags comparably. One of the pivotal reasons for such a lag is the multi-disciplinary nature of the emerging security bottlenecks. Based on the aforementioned motivation, in this work, at Rochester Institute of Technology, we present an effective research and education integration strategy to overcome this issue in one of the most critical deeply-embedded systems, i.e., medical devices. Moreover, we present the results of two years of implementation of the presented strategy at graduate-level through fault analysis attacks, a variant of side-channel attacks. We note that the authors also supervise an undergraduate student and the outcome of the presented work has been assessed for that student as well; however, the emphasis is on graduate-level integration. The results of the presented work show the success of the presented methodology while pinpointing the challenges encountered compared to traditional embedded system security research/teaching integration of medical devices security. We would like to emphasize that our integration approaches are general and scalable to other critical infrastructures as well

    Multidisciplinary Approaches and Challenges in Integrating Emerging Medical Devices Security Research and Education

    Get PDF
    Traditional embedded systems such as secure smart cards and nano-sensor networks have been utilized in various usage models. Nevertheless, emerging secure deeply-embedded systems, e.g., implantable and wearable medical devices, have comparably larger “attack surface”. Specifically, with respect to medical devices, a security breach can be life-threatening (for which adopting traditional solutions might not be practical due to tight constraints of these often-battery-powered systems), and unlike traditional embedded systems, it is not only a matter of financial loss. Unfortunately, although emerging cryptographic engineering research mechanisms for such deeply-embedded systems have started solving this critical, vital problem, university education (at both graduate and undergraduate level) lags comparably. One of the pivotal reasons for such a lag is the multi-disciplinary nature of the emerging security bottlenecks. Based on the aforementioned motivation, in this work, at Rochester Institute of Technology, we present an effective research and education integration strategy to overcome this issue in one of the most critical deeply-embedded systems, i.e., medical devices. Moreover, we present the results of two years of implementation of the presented strategy at graduate-level through fault analysis attacks, a variant of side-channel attacks. We note that the authors also supervise an undergraduate student and the outcome of the presented work has been assessed for that student as well; however, the emphasis is on graduate-level integration. The results of the presented work show the success of the presented methodology while pinpointing the challenges encountered compared to traditional embedded system security research/teaching integration of medical devices security. We would like to emphasize that our integration approaches are general and scalable to other critical infrastructures as well

    Fast and Efficient Hardware Implementation of HQC

    Get PDF
    This work presents a hardware design for constant-time implementation of the HQC (Hamming Quasi-Cyclic) code-based key encapsulation mechanism. HQC has been selected for the fourth round of NIST\u27s Post-Quantum Cryptography standardization process and this work presents the first, hand-optimized design of HQC key generation, encapsulation, and decapsulation written in Verilog targeting implementation on FPGAs. The three modules further share a common SHAKE256 hash module to reduce area overhead. All the hardware modules are parametrizable at compile time so that designs for the different security levels can be easily generated. The design currently outperforms the other hardware designs for HQC, and many of the fourth-round Post-Quantum Cryptography standardization process, with one of the best time-area products as well. For the combined HighSpeed design targeting the lowest security level, we show that the HQC design can perform key generation in 0.09ms, encapsulation in 0.13ms, and decapsulation in 0.21ms when synthesized for an Xilinx Artix 7 FPGA. Our work shows that when hardware performance is compared, HQC can be a competitive alternative candidate from the fourth round of the NIST PQC competition

    Post-Quantum Signatures on RISC-V with Hardware Acceleration

    Get PDF
    CRYSTALS-Dilithium and Falcon are digital signature algorithms based on cryptographic lattices, that are considered secure even if large-scale quantum computers will be able to break conventional public-key cryptography. Both schemes have been selected for standardization in the NIST post-quantum competition. In this work, we present a RISC-V HW/SW odesign that aims to combine the advantages of software- and hardware implementations, i.e. flexibility and performance. It shows the use of lexible hardware accelerators, which have been previously used for Public-Key Encryption (PKE) and Key-Encapsulation Mechanism (KEM), for post-quantum signatures. It is optimized for Dilithium as a generic signature cheme but also accelerates applications that require fast verification of Falcon’s compact signatures. We provide a comparison with previous works showing that for Dilithium and Falcon, cycle counts are significantly reduced, such that our design is faster than previous software implementations or other HW/SW codesigns. In addition to that, we present a compact Globalfoundries 22 nm ASIC design that runs at 800MHz. By using hardware acceleration, energy consumption for Dilithium is reduced by up to 92.2%, and up to 67.5% for Falcon’s signature verification

    High-Speed NTT-based Polynomial Multiplication Accelerator for CRYSTALS-Kyber Post-Quantum Cryptography

    Get PDF
    This paper demonstrates an architecture for accelerating the polynomial multiplication using number theoretic transform (NTT). Kyber is one of the finalists in the third round of the NIST post-quantum cryptography standardization process. Simultaneously, the performance of NTT execution is its main challenge, requiring large memory and complex memory access pattern. In this paper, an efficient NTT architecture is presented to improve the respective computation time. We propose several optimization strategies for efficiency improvement targeting different performance requirements for various applications. Our NTT architecture, including four butterfly cores, occupies only 798 LUTs and 715 FFs on a small Artix-7 FPGA, showing more than 44% improvement compared to the best previous work. We also implement a coprocessor architecture for Kyber KEM benefiting from our high-speed NTT core to accomplish three phases of the key exchange in 9, 12, and 19 \mus, respectively, operating at 200 MHz

    Compact domain-specific co-processor for accelerating module lattice-based key encapsulation mechanism

    Get PDF
    We present a domain-specific co-processor to speed up Saber, a post-quantum key encapsulation mechanism competing on the NIST Post-Quantum Cryptography standardization process. Contrary to most lattice-based schemes, Saber doesn’t use NTT-based polynomial multiplication. We follow a hardware-software co-design approach: the execution is performed on an ARM core and only the most computationally expensive operation, i.e., polynomial multiplication, is offloaded to the co-processor to obtain a compact design. We exploit the idea of distributed computing at micro-architectural level together with novel algorithmic optimizations to achieve approximately a 6 times speedup with respect to optimized software at a small area cost

    Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches

    Get PDF
    Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as speed, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. A large number of candidates at the early stages of the standardization process makes the accurate and fair comparison very challenging. Nevertheless, in all major past cryptographic standardization efforts, future winners were identified quite early in the evaluation process and held their lead until the standard was selected. Additionally, identifying some candidates as either inherently slow or costly in hardware helped to eliminate a subset of candidates, saving countless hours of cryptanalysis. Finally, early implementations provided a baseline for future design space explorations, paving a way to more comprehensive and fairer benchmarking at the later stages of a given cryptographic competition. In this paper, we first summarize, compare, and analyze results reported by other groups until mid-May 2020, i.e., until the end of Round 2 of the NIST PQC process. We then outline our own methodology for implementing and benchmarking PQC candidates using both hardware and software/hardware co-design approaches. We apply our hardware approach to 6 lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing 4 NIST PQC submissions. We then apply a software-hardware co-design approach to 12 lattice-based CCA-secure KEMs, representing 8 Round 2 submissions. We hope that, combined with results reported by other groups, our study will provide NIST with helpful information regarding the relative performance of a significant subset of Round 2 PQC candidates, assuming that at least their major operations, and possibly the entire algorithms, are off-loaded to hardware

    Hardware Architectures for Post-Quantum Cryptography

    Get PDF
    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era
    • …
    corecore