12 research outputs found

    Efficient ASIC Architectures for Low Latency Niederreiter Decryption

    Post-quantum cryptography addresses the increasing threat that quantum computing poses to modern communication systems. Among the available quantum-resistant systems, the Niederreiter cryptosystem is positioned as a conservative choice with strong security guarantees. As a code-based cryptosystem, the Niederreiter system enables high-performance operations and is thus ideally suited for applications such as the acceleration of server workloads. However, until now, no ASIC architecture has been available for low-latency computation of Niederreiter operations. Therefore, the present work targets the design, implementation, and optimization of tailored architectures for low-latency Niederreiter decryption. Two architectures utilizing different decoding algorithms are proposed and implemented using a 22 nm FDSOI CMOS technology node. One of these optimized architectures improves the decryption latency by 27% compared to a state-of-the-art reference while requiring only 25% of the area

    Algorithmic Security is Insufficient: A Comprehensive Survey on Implementation Attacks Haunting Post-Quantum Security

    This survey addresses forward-looking, emerging security concerns in the post-quantum era, namely implementation attacks on the 2022 winners of the NIST post-quantum cryptography (PQC) competition. Its visions, insights, and discussions can serve as a step towards scrutinizing the new standards for applications ranging from the Metaverse and Web 3.0 to deeply embedded systems. The rapid advances in quantum computing have brought immense opportunities for scientific discovery and technological progress; however, they pose a major risk to today's security, since advanced quantum computers are believed to break all traditional public-key cryptographic algorithms. This has led to active research on PQC algorithms that are believed to be secure against both classical and powerful quantum computers. However, algorithmic security is unfortunately insufficient, and many cryptographic algorithms are vulnerable to side-channel attacks (SCA), where an attacker passively or actively obtains side-channel data to compromise security properties that are assumed to be safe in theory. In this survey, we explore such imminent threats and their countermeasures with respect to PQC. We present the latest advancements in PQC research, as well as assessments of and visions for the different types of SCAs

    Hardware Architectures for Post-Quantum Cryptography

    The rapid development of quantum computers poses severe threats to many commonly used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. In the search for solutions that are potentially resistant to attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged; it studies cryptosystems deployed on classical computers that are conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and in many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds a hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that a complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that the implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. 
Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of device, the second research direction presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents a practical hardware-based XMSS solution for low-end embedded devices. In the design of the XMSS hardware, a heterogeneous software-hardware co-design approach was adopted, which combines the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today's widely adopted public-key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that they are not fully scalable or parameterized and are hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-hardware co-design of provably secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably secure schemes can be realized efficiently with proper use of software-hardware co-design. 
The final research direction presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardware co-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

    Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece

    Classic McEliece is a code-based quantum-resistant public-key scheme characterized by relatively high encapsulation/decapsulation speed and small ciphertexts, and its security has been analyzed in depth. However, slow key generation and a large public key size hinder its wider adoption. Based on this observation, a high-throughput hardware key generator is proposed to accelerate key generation in Classic McEliece based on algorithm-hardware co-design. Meanwhile, the storage overhead caused by large keys is also minimized. First, compact large-size GF(2) Gauss elimination is presented by adopting a naive processing array, singular matrix detection-based early abort, and a memory-friendly scheduling strategy. Second, an optimized constant-time hardware sorter is proposed to support regular memory accesses with fewer comparators and less storage. Third, an algorithm-level pipeline is enabled for high-throughput processing, allowing for concurrent key generation based on decoupling between data access and computation
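
The GF(2) Gauss elimination with singular-matrix early abort mentioned in this abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: the function name `systemize_gf2` and the row-as-integer encoding are assumptions made for illustration, and no attempt is made at constant-time behavior.

```python
from typing import List, Optional


def systemize_gf2(rows: List[int], n_cols: int) -> Optional[List[int]]:
    """Reduce a binary matrix to systematic form [I | T] over GF(2).

    Hypothetical encoding: each row is a Python int in which bit
    (n_cols - 1 - j) holds column j, so column 0 is the most significant
    bit. Assumes n_cols >= len(rows). Returns None as soon as a pivot is
    missing, i.e. the left square block is singular.
    """
    rows = list(rows)  # work on a copy
    n_rows = len(rows)
    for col in range(n_rows):
        mask = 1 << (n_cols - 1 - col)
        # Find a row at or below the diagonal with a 1 in this column.
        pivot = next((r for r in range(col, n_rows) if rows[r] & mask), None)
        if pivot is None:
            return None  # early abort: singular left block
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Clear this column in every other row; addition in GF(2) is XOR.
        for r in range(n_rows):
            if r != col and rows[r] & mask:
                rows[r] ^= rows[col]
    return rows
```

In a key generator, a `None` result would typically trigger a retry with fresh randomness; that retry loop is outside this sketch.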

    Efficient hardware arithmetic for inverted binary ring-LWE based post-quantum cryptography

    The ring learning-with-errors (RLWE)-based encryption scheme is a lattice-based cryptographic algorithm that constitutes one of the most promising candidates for Post-Quantum Cryptography (PQC) standardization due to its efficient implementation and low computational complexity. Binary Ring-LWE (BRLWE) is a new optimized variant of RLWE, which achieves lower computational complexity and more efficient hardware implementations. In this paper, two efficient architectures based on Linear-Feedback Shift Registers (LFSRs) are presented for the arithmetic used in the Inverted Binary Ring-LWE (InvBRLWE)-based encryption scheme, namely the operation $A \cdot B + C$ over the polynomial ring $\mathbb{Z}_q/(x^n+1)$. The first architecture optimizes the resource usage for the major computation and has a novel input processing setup to speed up the overall processing latency with minimized input loading cycles. The second architecture deploys an innovative serial-in serial-out processing format to further reduce the area usage while maintaining a regular input loading time complexity. Experimental results show that the architectures presented here improve on the complexities obtained by competing schemes found in the literature, e.g., involving 71.23% less area-delay product than recent designs. Both architectures are highly efficient in terms of area-time complexity and can be extended for deployment in different lightweight application environments
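
As a reference point for the operation named in this abstract, the sketch below computes A·B + C with schoolbook multiplication, reading the ring as Z_q[x]/(x^n + 1). It is an illustrative software model only, not the LFSR-based architecture of the paper; the function name `mul_add_negacyclic` and the low-degree-first coefficient order are assumptions.

```python
from typing import List


def mul_add_negacyclic(a: List[int], b: List[int], c: List[int], q: int) -> List[int]:
    """Return A*B + C in Z_q[x]/(x^n + 1), coefficients listed low degree first."""
    n = len(a)
    assert len(b) == n and len(c) == n
    out = [ci % q for ci in c]
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            term = ai * bj
            if k >= n:
                # x^n = -1 in this ring, so wrapped terms change sign.
                k -= n
                term = -term
            out[k] = (out[k] + term) % q
    return out
```

When B has binary coefficients, as in BRLWE-style schemes, each inner product `ai * bj` reduces to a conditional add or subtract of `ai`, which is one reason such variants map well to small hardware.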

    VLSI architectures for public key cryptology

    Efficiency and Implementation Security of Code-based Cryptosystems

    This thesis studies efficiency and security problems of implementations of code-based cryptosystems. These cryptosystems, though not currently used in the field, are of great scientific interest, since no quantum algorithm is known that breaks them essentially faster than any known classical algorithm. This qualifies them as cryptographic schemes for the quantum-computer era, where the currently used cryptographic schemes are rendered insecure. Concerning the efficiency of these schemes, we propose a solution for the handling of the public keys, which are, compared to the currently used schemes, of an enormous size. Here, the focus lies on resource-constrained devices, which are not capable of storing a code-based public key of a communication partner in their volatile memory. Furthermore, we show a solution for the decryption without the parity check matrix with a passable speed penalty. This is also of great importance, since this matrix is of a size that is comparable to that of the public key. Thus, the employment of this matrix on memory-constrained devices is not possible or incurs a large cost. Subsequently, we present an analysis of improvements to the generally most time-consuming part of the decryption operation, which is the determination of the roots of the error locator polynomial. We compare a number of known algorithmic variants and new combinations thereof in terms of running time and memory demands. Though the speed of pure software implementations must be seen as one of the strong sides of code-based schemes, the optimisation of their running time on resource-constrained devices and servers is of great relevance. The second essential part of the thesis studies the side channel security of these schemes. 
A side channel vulnerability is given when an attacker is able to retrieve information about the secrets involved in a cryptographic operation by measuring physical quantities such as the running time or the power consumption during that operation. Specifically, we consider attacks on the decryption operation, which either target the message or the secret key. In most cases, concrete countermeasures are proposed and evaluated. In this context, we show a number of timing vulnerabilities that are linked to the algorithmic variants for the root-finding of the error locator polynomial mentioned above. Furthermore, we show a timing attack against a vulnerability in the Extended Euclidean Algorithm that is used to solve the so-called key equation during the decryption operation, which aims at the recovery of the message. We also present a related practical power analysis attack. Concluding, we present a practical timing attack that targets the secret key, which is based on the combination of three vulnerabilities, located within the syndrome inversion, a further suboperation of the decryption, and the already mentioned solving of the key equation. We compare the attacks that aim at the recovery of the message with the analogous attacks against the RSA cryptosystem and derive a general methodology for the discovery of the underlying vulnerabilities in cryptosystems with specific properties. Furthermore, we present two implementations of the code-based McEliece cryptosystem: a smart card implementation and a flexible implementation, which is based on a previous open-source implementation. The previously existing open-source implementation was extended to be platform independent and optimised for resource-constrained devices. In addition, we added all algorithmic variants presented in this thesis, and we present all relevant performance data such as running time, code size and memory consumption for these variants on an embedded platform. 
Moreover, we implemented all side channel countermeasures developed in this work. Concluding, we present open research questions, which will become relevant once efficient and secure implementations of code-based cryptosystems are evaluated by the industry for an actual application
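
The root-finding step for the error locator polynomial, and why its running time matters, can be illustrated with an exhaustive evaluation over every field element. This is a hedged sketch under stated assumptions, not one of the thesis's variants: the field GF(2^4) with modulus x^4 + x + 1 and the names `gf_mul`, `poly_eval`, and `find_roots` are chosen purely for illustration, and Python itself gives no constant-time guarantees.

```python
from typing import List

M = 4              # field degree, kept tiny for illustration (assumption)
MODULUS = 0b10011  # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^M) elements using a fixed number of iterations."""
    result = 0
    for _ in range(M):
        result ^= a * (b & 1)    # add a when the low bit of b is set
        b >>= 1
        a <<= 1
        a ^= MODULUS * (a >> M)  # reduce when the degree reaches M
    return result


def poly_eval(coeffs: List[int], x: int) -> int:
    """Horner evaluation; coeffs are listed highest degree first."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


def find_roots(coeffs: List[int]) -> List[int]:
    """Evaluate at every field element and never stop early, so the
    number of evaluations does not depend on where the roots lie."""
    roots = []
    for x in range(1 << M):
        if poly_eval(coeffs, x) == 0:
            roots.append(x)
    return roots
```

A variant that returns as soon as enough roots are found would be faster on average, but its running time would then depend on the root positions, which is the kind of dependency a timing attack exploits.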

    Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2^m)

    Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on the EEA (Extended Euclidean Algorithm) is particularly suitable for systolization. Implementations based on the EEA present a high degree of parallelism and pipelinability at bit level, which can be easily optimized to achieve local data flow and to eliminate the global interconnects that represent the most important bottleneck in today's sub-micron design process. The net result is a high clock rate and performance based on efficient systolic architectures. This thesis examines high-performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2^m) in the specific case of cryptographic applications where the field dimension m may be very large (greater than 400) and either m or the defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representations are studied and, most importantly, variants of the EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants is discussed. As a result, a generalized and optimized variant of the EEA is proposed which can compute division, and multiplicative inversion as its subset, with the divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion are provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of the EEA is implemented. Its complexity measures are defined and these are compared against the best known architectures. 
It is shown that, assuming the requirements specified above, this proposed architecture may achieve a higher clock rate with respect to other designs while being more flexible and reliable and having a minimum number of inter-cell interconnects. The main contribution at the system architecture level is the substitution of all counter or adder/subtractor elements with a simpler distributed structure that is free of carry propagation delays. Further, a novel restoring mechanism for result sequences of the EEA is proposed using a double delay element implementation. Finally, using this systolic architecture, a CMD (Combined Multiplier Divider) datapath is designed, which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication, which results in a very small control unit that is negligible with respect to the datapath for all practical values of m. The throughput of this EC processor based on this bit serial systolic architecture is comparable with previously reported designs many times larger than itself
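
For readers unfamiliar with EEA-based inversion over GF(2^m), the following sketch shows the textbook polynomial-basis version in software. It is not the generalized variant or the systolic array proposed in the thesis; the function names and the integer encoding (bit i is the coefficient of x^i) are assumptions, and the modulus is assumed to be irreducible with the input already reduced.

```python
def gf2m_mul(a: int, b: int, modulus: int) -> int:
    """Shift-and-add multiply in GF(2)[x]/(modulus); a must be reduced."""
    m = modulus.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= modulus  # keep the degree of a below m
    return result


def gf2m_inverse(a: int, modulus: int) -> int:
    """Invert a nonzero reduced element using the extended Euclidean algorithm."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    u, v = a, modulus  # running remainders
    g1, g2 = 1, 0      # invariants: g1*a = u and g2*a = v (mod modulus)
    while u != 1:
        shift = u.bit_length() - v.bit_length()
        if shift < 0:
            # Keep u as the operand of higher degree.
            u, v = v, u
            g1, g2 = g2, g1
            shift = -shift
        u ^= v << shift    # cancel the leading term of u
        g1 ^= g2 << shift  # apply the same step to the cofactor
    return g1
```

The number of loop iterations here depends on the operand, which is one reason hardware designs restructure the algorithm into a regular, fixed-length data flow.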

    Hardware Implementations of Scalable and Unified Elliptic Curve Cryptosystem Processors

    As the amount of information exchanged through the network grows, so does the demand for increased security over the transmission of this information. As computing power has grown over the past few decades, more sophisticated methods of cryptography have been developed. One method of transmitting data securely over the network is by using symmetric-key cryptography. However, a drawback of symmetric-key cryptography is the need to exchange the shared key securely. One of the solutions is to use public-key cryptography. One of the modern public-key cryptography algorithms is called Elliptic Curve Cryptography (ECC). The advantage of ECC over some older algorithms is the smaller key size needed to provide a similar level of security. As a result, implementations of ECC are much faster and consume fewer resources. In order to achieve better performance, ECC operations are often offloaded onto hardware to alleviate the workload on the servers' processors. The most important and complex operation in ECC schemes is the elliptic curve point multiplication (ECPM). This thesis explores the implementation of hardware accelerators that offload the ECPM operation to hardware. These processors are referred to as ECC processors, or simply ECPs. This thesis targets the efficient hardware implementation of ECPs specifically for the 15 elliptic curves recommended by the National Institute of Standards and Technology (NIST). The main contribution of this thesis is the implementation of highly efficient hardware for scalable and unified finite field arithmetic units that are used in the design of ECPs. In this thesis, scalability refers to the processor's ability to support multiple key sizes without the need to reconfigure the hardware. As a result, the hardware does not need to be redesigned for the server to handle different levels of security. Unified refers to the ability of the ECP to handle both prime and binary fields. 
The resultant designs are valuable to the research community and industry, as a single hardware device is able to handle a wide range of ECC operations efficiently and at high speeds. This improves the ability of network servers to handle secure transactions more quickly and improves productivity at lower cost

    A survey of timing channels and countermeasures

    A timing channel is a communication channel that can transfer information to a receiver/decoder by modulating the timing behavior of an entity. Examples of this entity include the interpacket delays of a packet stream, the reordering of packets in a packet stream, or the resource access time of a cryptographic module. Advances in information and coding theory and the availability of high-performance computing systems interconnected by high-speed networks have spurred interest in and development of various types of timing channels. With the emergence of complex timing channels, novel detection and prevention techniques are also being developed to counter them. In this article, we provide a detailed survey of timing channels broadly categorized into network timing channels, in which the communicating entities are connected by a network, and in-system timing channels, in which the communicating entities are within a computing system. This survey builds on the last comprehensive survey by Zander et al. [2007] and considers all three canonical applications of timing channels, namely, covert communication, timing side channels, and network flow watermarking. We survey the theoretical foundations, the implementation, and the various detection and prevention techniques that have been reported in the literature. Based on the analysis of the current literature, we discuss potential future research directions both in the design and application of timing channels and in their detection and prevention techniques