36 research outputs found

    Hardware Architectures for Post-Quantum Cryptography

    Get PDF
    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

    Envisioning the Future of Cyber Security in Post-Quantum Era: A Survey on PQ Standardization, Applications, Challenges and Opportunities

    Full text link
    The rise of quantum computers exposes vulnerabilities in current public key cryptographic protocols, necessitating the development of secure post-quantum (PQ) schemes. Hence, we conduct a comprehensive study on various PQ approaches, covering the constructional design, structural vulnerabilities, and offer security assessments, implementation evaluations, and a particular focus on side-channel attacks. We analyze global standardization processes, evaluate their metrics in relation to real-world applications, and primarily focus on standardized PQ schemes, selected additional signature competition candidates, and PQ-secure cutting-edge schemes beyond standardization. Finally, we present visions and potential future directions for a seamless transition to the PQ era

    Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece

    Get PDF
    Classic McEliece is a code-based quantum-resistant public-key scheme characterized with relative high encapsulation/decapsulation speed and small cipher- texts, with an in-depth analysis on its security. However, slow key generation with large public key size make it hard for wider applications. Based on this observation, a high-throughput key generator in hardware, is proposed to accelerate the key generation in Classic McEliece based on algorithm-hardware co-design. Meanwhile the storage overhead caused by large-size keys is also minimized. First, compact large-size GF(2) Gauss elimination is presented by adopting naive processing array, singular matrix detection-based early abort, and memory-friendly scheduling strategy. Second, an optimized constant-time hardware sorter is proposed to support regular memory accesses with less comparators and storage. Third, algorithm-level pipeline is enabled for high-throughput processing, allowing for concurrent key generation based on decoupling between data access and computation

    Hardware Implementations of Scalable and Unified Elliptic Curve Cryptosystem Processors

    Get PDF
    As the amount of information exchanged through the network grows, so does the demand for increased security over the transmission of this information. As the growth of computers increased in the past few decades, more sophisticated methods of cryptography have been developed. One method of transmitting data securely over the network is by using symmetric-key cryptography. However, a drawback of symmetric-key cryptography is the need to exchange the shared key securely. One of the solutions is to use public-key cryptography. One of the modern public-key cryptography algorithms is called Elliptic Curve Cryptography (ECC). The advantage of ECC over some older algorithms is the smaller number of key sizes to provide a similar level of security. As a result, implementations of ECC are much faster and consume fewer resources. In order to achieve better performance, ECC operations are often offloaded onto hardware to alleviate the workload from the servers' processors. The most important and complex operation in ECC schemes is the elliptic curve point multiplication (ECPM). This thesis explores the implementation of hardware accelerators that offload the ECPM operation to hardware. These processors are referred to as ECC processors, or simply ECPs. This thesis targets the efficient hardware implementation of ECPs specifically for the 15 elliptic curves recommended by the National Institute of Standards and Technology (NIST). The main contribution of this thesis is the implementation of highly efficient hardware for scalable and unified finite field arithmetic units that are used in the design of ECPs. In this thesis, scalability refers to the processor's ability to support multiple key sizes without the need to reconfigure the hardware. By doing so, the hardware does not need to be redesigned for the server to handle different levels of security. Unified refers to the ability of the ECP to handle both prime and binary fields. The resultant designs are valuable to the research community and industry, as a single hardware device is able to handle a wide range of ECC operations efficiently and at high speeds. Thus, improving the ability of network servers to handle secure transaction more quickly and improve productivity at lower costs

    Oil and Vinegar: Modern Parameters and Implementations

    Get PDF
    Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, either made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We show that, if parametrized appropriately, Oil and Vinegar still provides competitive performance compared to the new NIST standards by most measures (except for key size). At NIST security level 1, this results in either 128-byte signatures with 44 kB public keys or 96-byte signatures with 67 kB public keys. We revamp the state-of-the-art of Oil and Vinegar implementations for the Intel/AMD AVX2, the Arm Cortex-M4 microprocessor, the Xilinx Artix-7 FPGA, and the Armv8-A microarchitecture with the Neon vector instructions set

    Efficiency and Implementation Security of Code-based Cryptosystems

    Get PDF
    This thesis studies efficiency and security problems of implementations of code-based cryptosystems. These cryptosystems, though not currently used in the field, are of great scientific interest, since no quantum algorithm is known that breaks them essentially faster than any known classical algorithm. This qualifies them as cryptographic schemes for the quantum-computer era, where the currently used cryptographic schemes are rendered insecure. Concerning the efficiency of these schemes, we propose a solution for the handling of the public keys, which are, compared to the currently used schemes, of an enormous size. Here, the focus lies on resource-constrained devices, which are not capable of storing a code-based public key of communication partner in their volatile memory. Furthermore, we show a solution for the decryption without the parity check matrix with a passable speed penalty. This is also of great importance, since this matrix is of a size that is comparable to that of the public key. Thus, the employment of this matrix on memory-constrained devices is not possible or incurs a large cost. Subsequently, we present an analysis of improvements to the generally most time-consuming part of the decryption operation, which is the determination of the roots of the error locator polynomial. We compare a number of known algorithmic variants and new combinations thereof in terms of running time and memory demands. Though the speed of pure software implementations must be seen as one of the strong sides of code-based schemes, the optimisation of their running time on resource-constrained devices and servers is of great relevance. The second essential part of the thesis studies the side channel security of these schemes. A side channel vulnerability is given when an attacker is able to retrieve information about the secrets involved in a cryptographic operation by measuring physical quantities such as the running time or the power consumption during that operation. Specifically, we consider attacks on the decryption operation, which either target the message or the secret key. In most cases, concrete countermeasures are proposed and evaluated. In this context, we show a number of timing vulnerabilities that are linked to the algorithmic variants for the root-finding of the error locator polynomial mentioned above. Furthermore, we show a timing attack against a vulnerability in the Extended Euclidean Algorithm that is used to solve the so-called key equation during the decryption operation, which aims at the recovery of the message. We also present a related practical power analysis attack. Concluding, we present a practical timing attack that targets the secret key, which is based on the combination of three vulnerabilities, located within the syndrome inversion, a further suboperation of the decryption, and the already mentioned solving of the key equation. We compare the attacks that aim at the recovery of the message with the analogous attacks against the RSA cryptosystem and derive a general methodology for the discovery of the underlying vulnerabilities in cryptosystems with specific properties. Furthermore, we present two implementations of the code-based McEliece cryptosystem: a smart card implementation and flexible implementation, which is based on a previous open-source implementation. The previously existing open-source implementation was extended to be platform independent and optimised for resource-constrained devices. In addition, we added all algorithmic variants presented in this thesis, and we present all relevant performance data such as running time, code size and memory consumption for these variants on an embedded platform. Moreover, we implemented all side channel countermeasures developed in this work. Concluding, we present open research questions, which will become relevant once efficient and secure implementations of code-based cryptosystems are evaluated by the industry for an actual application

    Post-quantum cryptosystems for internet-of-things: A survey on lattice-based algorithms

    Get PDF
    The latest quantum computers have the ability to solve incredibly complex classical cryptography equations particularly to decode the secret encrypted keys and making the network vulnerable to hacking. They can solve complex mathematical problems almost instantaneously compared to the billions of years of computation needed by traditional computing machines. Researchers advocate the development of novel strategies to include data encryption in the post-quantum era. Lattices have been widely used in cryptography, somewhat peculiarly, and these algorithms have been used in both; (a) cryptoanalysis by using lattice approximation to break cryptosystems; and (b) cryptography by using computationally hard lattice problems (non-deterministic polynomial time hardness) to construct stable cryptographic functions. Most of the dominant features of lattice-based cryptography (LBC), which holds it ahead in the post-quantum league, include resistance to quantum attack vectors, high concurrent performance, parallelism, security under worst-case intractability assumptions, and solutions to long-standing open problems in cryptography. While these methods offer possible security for classical cryptosytems in theory and experimentation, their implementation in energy-restricted Internet-of-Things (IoT) devices requires careful study of regular lattice-based implantation and its simplification in lightweight lattice-based cryptography (LW-LBC). This streamlined post-quantum algorithm is ideal for levelled IoT device security. The key aim of this survey was to provide the scientific community with comprehensive information on elementary mathematical facts, as well as to address real-time implementation, hardware architecture, open problems, attack vectors, and the significance for the IoT networks

    Design of secure and trustworthy system-on-chip architectures using hardware-based root-of-trust techniques

    Get PDF
    Cyber-security is now a critical concern in a wide range of embedded computing modules, communications systems, and connected devices. These devices are used in medical electronics, automotive systems, power grid systems, robotics, and avionics. The general consensus today is that conventional approaches and software-only schemes are not sufficient to provide desired security protections and trustworthiness. Comprehensive hardware-software security solutions so far have remained elusive. One major challenge is that in current system-on-chip (SoCs) designs, processing elements (PEs) and executable codes with varying levels of trust, are all integrated on the same computing platform to share resources. This interdependency of modules creates a fertile attack ground and represents the Achilles’ heel of heterogeneous SoC architectures. The salient research question addressed in this dissertation is “can one design a secure computer system out of non-secure or untrusted computing IP components and cores?”. In response to this question, we establish a generalized, user/designer-centric set of design principles which intend to advance the construction of secure heterogeneous multi-core computing systems. We develop algorithms, models of computation, and hardware security primitives to integrate secure and non-secure processing elements into the same chip design while aiming for: (a) maintaining individual core’s security; (b) preventing data leakage and corruption; (c) promoting data and resource sharing among the cores; and (d) tolerating malicious behaviors from untrusted processing elements and software applications. The key contributions of this thesis are: 1. The introduction of a new architectural model for integrating processing elements with different security and trust levels, i.e., secure and non-secure cores with trusted and untrusted provenances; 2. A generalized process isolation design methodology for the new architecture model that covers both the software and hardware layers to (i) create hardware-assisted virtual logical zones, and (ii) perform both static and runtime security, privilege level and trust authentication checks; 3. A set of secure protocols and hardware root-of-trust (RoT) primitives to support the process isolation design and to provide the following functionalities: (i) hardware immutable identities – using physical unclonable functions, (ii) core hijacking and impersonation resistance – through a blind signature scheme, (iii) threshold-based data access control – with a robust and adaptive secure secret sharing algorithm, (iv) privacy-preserving authorization verification – by proposing a group anonymous authentication algorithm, and (v) denial of resource or denial of service attack avoidance – by developing an interconnect network routing algorithm and a memory access mechanism according to user-defined security policies. 4. An evaluation of the security of the proposed hardware primitives in the post-quantum era, and possible extensions and algorithmic modifications for their post-quantum resistance. In this dissertation, we advance the practicality of secure-by-construction methodologies in SoC architecture design. The methodology allows for the use of unsecured or untrusted processing elements in the construction of these secure architectures and tries to extend their effectiveness into the post-quantum computing era

    Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches

    Get PDF
    Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as speed, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. A large number of candidates at the early stages of the standardization process makes the accurate and fair comparison very challenging. Nevertheless, in all major past cryptographic standardization efforts, future winners were identified quite early in the evaluation process and held their lead until the standard was selected. Additionally, identifying some candidates as either inherently slow or costly in hardware helped to eliminate a subset of candidates, saving countless hours of cryptanalysis. Finally, early implementations provided a baseline for future design space explorations, paving a way to more comprehensive and fairer benchmarking at the later stages of a given cryptographic competition. In this paper, we first summarize, compare, and analyze results reported by other groups until mid-May 2020, i.e., until the end of Round 2 of the NIST PQC process. We then outline our own methodology for implementing and benchmarking PQC candidates using both hardware and software/hardware co-design approaches. We apply our hardware approach to 6 lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing 4 NIST PQC submissions. We then apply a software-hardware co-design approach to 12 lattice-based CCA-secure KEMs, representing 8 Round 2 submissions. We hope that, combined with results reported by other groups, our study will provide NIST with helpful information regarding the relative performance of a significant subset of Round 2 PQC candidates, assuming that at least their major operations, and possibly the entire algorithms, are off-loaded to hardware
    corecore