11 research outputs found

    Lightweight Hardware Accelerator for Post-Quantum Digital Signature CRYSTALS-Dilithium

    Get PDF
    The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature. By invoking multiple optimizations to leverage parallelism, pre-computation and memory access sharing, we obtain an implementation that could be fit into one of the smallest Zynq FPGA. On Zynq Ultrascale+, our design achieves an improvement of about 36.7%/35.4%/42.3% in Area×Time (LUTs×s) trade-off for KeyGen/Sign/Verify respectively over state-of-the-art implementation. We also evaluate our design as a co-processor on three different hardware platforms and compare the results with software implementation, thus presenting a detailed evaluation of CRYSTALS-Dilithium targeted for embedded applications. Further, on ASIC using TSMC 65nm technology, our design requires 0.227mm2^2 area and can operate at a frequency of 1.176 GHz. As a result, it only requires 53.7μs/96.9μs/57.7μs for KeyGen/Sign/Verify operation for the best-case scenario

    Hardware design of cryptographic accelerators

    Get PDF
    With the rapid growth of the Internet and digital communications, the volume of sensitive electronic transactions being transferred and stored over and on insecure media has increased dramatically in recent years. The growing demand for cryptographic systems to secure this data, across a multitude of platforms, ranging from large servers to small mobile devices and smart cards, has necessitated research into low cost, flexible and secure solutions. As constraints on architectures such as area, speed and power become key factors in choosing a cryptosystem, methods for speeding up the development and evaluation process are necessary. This thesis investigates flexible hardware architectures for the main components of a cryptographic system. Dedicated hardware accelerators can provide significant performance improvements when compared to implementations on general purpose processors. Each of the designs proposed are analysed in terms of speed, area, power, energy and efficiency. Field Programmable Gate Arrays (FPGAs) are chosen as the development platform due to their fast development time and reconfigurable nature. Firstly, a reconfigurable architecture for performing elliptic curve point scalar multiplication on an FPGA is presented. Elliptic curve cryptography is one such method to secure data, offering similar security levels to traditional systems, such as RSA, but with smaller key sizes, translating into lower memory and bandwidth requirements. The architecture is implemented using different underlying algorithms and coordinates for dedicated Double-and-Add algorithms, twisted Edwards algorithms and SPA secure algorithms, and its power consumption and energy on an FPGA measured. Hardware implementation results for these new algorithms are compared against their software counterparts and the best choices for minimum area-time and area-energy circuits are then identified and examined for larger key and field sizes. Secondly, implementation methods for another component of a cryptographic system, namely hash functions, developed in the recently concluded SHA-3 hash competition are presented. Various designs from the three rounds of the NIST run competition are implemented on FPGA along with an interface to allow fair comparison of the different hash functions when operating in a standardised and constrained environment. Different methods of implementation for the designs and their subsequent performance is examined in terms of throughput, area and energy costs using various constraint metrics. Comparing many different implementation methods and algorithms is nontrivial. Another aim of this thesis is the development of generic interfaces used both to reduce implementation and test time and also to enable fair baseline comparisons of different algorithms when operating in a standardised and constrained environment. Finally, a hardware-software co-design cryptographic architecture is presented. This architecture is capable of supporting multiple types of cryptographic algorithms and is described through an application for performing public key cryptography, namely the Elliptic Curve Digital Signature Algorithm (ECDSA). This architecture makes use of the elliptic curve architecture and the hash functions described previously. These components, along with a random number generator, provide hardware acceleration for a Microblaze based cryptographic system. The trade-off in terms of performance for flexibility is discussed using dedicated software, and hardware-software co-design implementations of the elliptic curve point scalar multiplication block. Results are then presented in terms of the overall cryptographic system

    Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

    Get PDF
    With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided

    High-Performance VLSI Architectures for Lattice-Based Cryptography

    Get PDF
    Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade since they have post-quantum-resistant security and the remarkable construction of the algorithm. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, the efficient hardware implementations for these advanced cryptography schemes are demanding to achieve a high-performance implementation. This dissertation aims to investigate the novel and high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented to exploit the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC scheme. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers

    Quantum-Resistant Security for Software Updates on Low-power Networked Embedded Devices

    Get PDF
    As the Internet of Things (IoT) rolls out today to devices whose lifetime may well exceed a decade, conservative threat models should consider attackers with access to quantum computing power. The SUIT standard (specified by the IETF) defines a security architecture for IoT software updates, standardizing the metadata and the cryptographic tools-namely, digital signatures and hash functions-that guarantee the legitimacy of software updates. While the performance of SUIT has previously been evaluated in the pre-quantum context, it has not yet been studied in a post-quantum context. Taking the open-source implementation of SUIT available in RIOT as a case study, we overview post-quantum considerations, and quantum-resistant digital signatures in particular, focusing on lowpower, microcontroller-based IoT devices which have stringent resource constraints in terms of memory, CPU, and energy consumption. We benchmark a selection of proposed post-quantum signature schemes (LMS, Falcon, and Dilithium) and compare them with current pre-quantum signature schemes (Ed25519 and ECDSA). Our benchmarks are carried out on a variety of IoT hardware including ARM Cortex-M, RISC-V, and Espressif (ESP32), which form the bulk of modern 32-bit microcontroller architectures. We interpret our benchmark results in the context of SUIT, and estimate the real-world impact of post-quantum alternatives for a range of typical software update categories. CCS CONCEPTS • Computer systems organization → Embedded systems

    Efficient and Side-Channel Resistant Implementations of Next-Generation Cryptography

    Get PDF
    The rapid development of emerging information technologies, such as quantum computing and the Internet of Things (IoT), will have or have already had a huge impact on the world. These technologies can not only improve industrial productivity but they could also bring more convenience to people’s daily lives. However, these techniques have “side effects” in the world of cryptography – they pose new difficulties and challenges from theory to practice. Specifically, when quantum computing capability (i.e., logical qubits) reaches a certain level, Shor’s algorithm will be able to break almost all public-key cryptosystems currently in use. On the other hand, a great number of devices deployed in IoT environments have very constrained computing and storage resources, so the current widely-used cryptographic algorithms may not run efficiently on those devices. A new generation of cryptography has thus emerged, including Post-Quantum Cryptography (PQC), which remains secure under both classical and quantum attacks, and LightWeight Cryptography (LWC), which is tailored for resource-constrained devices. Research on next-generation cryptography is of importance and utmost urgency, and the US National Institute of Standards and Technology in particular has initiated the standardization process for PQC and LWC in 2016 and in 2018 respectively. Since next-generation cryptography is in a premature state and has developed rapidly in recent years, its theoretical security and practical deployment are not very well explored and are in significant need of evaluation. This thesis aims to look into the engineering aspects of next-generation cryptography, i.e., the problems concerning implementation efficiency (e.g., execution time and memory consumption) and security (e.g., countermeasures against timing attacks and power side-channel attacks). In more detail, we first explore efficient software implementation approaches for lattice-based PQC on constrained devices. Then, we study how to speed up isogeny-based PQC on modern high-performance processors especially by using their powerful vector units. Moreover, we research how to design sophisticated yet low-area instruction set extensions to further accelerate software implementations of LWC and long-integer-arithmetic-based PQC. Finally, to address the threats from potential power side-channel attacks, we present a concept of using special leakage-aware instructions to eliminate overwriting leakage for masked software implementations (of next-generation cryptography)

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Get PDF
    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report).; 2017.Since their introduction, FPGAs can be seen in more and more different fields of applications. The key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field introduces special requirements to the used computational architecture. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units like e.g. CPUs

    Analysis and Design of Symmetric Cryptographic Algorithms

    Get PDF
    This doctoral thesis is dedicated to the analysis and the design of symmetric cryptographic algorithms. In the first part of the dissertation, we deal with fault-based attacks on cryptographic circuits which belong to the field of active implementation attacks and aim to retrieve secret keys stored on such chips. Our main focus lies on the cryptanalytic aspects of those attacks. In particular, we target block ciphers with a lightweight and (often) non-bijective key schedule where the derived subkeys are (almost) independent from each other. An attacker who is able to reconstruct one of the subkeys is thus not necessarily able to directly retrieve other subkeys or even the secret master key by simply reversing the key schedule. We introduce a framework based on differential fault analysis that allows to attack block ciphers with an arbitrary number of independent subkeys and which rely on a substitution-permutation network. These methods are then applied to the lightweight block ciphers LED and PRINCE and we show in both cases how to recover the secret master key requiring only a small number of fault injections. Moreover, we investigate approaches that utilize algebraic instead of differential techniques for the fault analysis and discuss advantages and drawbacks. At the end of the first part of the dissertation, we explore fault-based attacks on the block cipher Bel-T which also has a lightweight key schedule but is not based on a substitution-permutation network but instead on the so-called Lai-Massey scheme. The framework mentioned above is thus not usable against Bel-T. Nevertheless, we also present techniques for the case of Bel-T that enable full recovery of the secret key in a very efficient way using differential fault analysis. In the second part of the thesis, we focus on authenticated encryption schemes. While regular ciphers only protect privacy of processed data, authenticated encryption schemes also secure its authenticity and integrity. Many of these ciphers are additionally able to protect authenticity and integrity of so-called associated data. This type of data is transmitted unencrypted but nevertheless must be protected from being tampered with during transmission. Authenticated encryption is nowadays the standard technique to protect in-transit data. However, most of the currently deployed schemes have deficits and there are many leverage points for improvements. With NORX we introduce a novel authenticated encryption scheme supporting associated data. This algorithm was designed with high security, efficiency in both hardware and software, simplicity, and robustness against side-channel attacks in mind. Next to its specification, we present special features, security goals, implementation details, extensive performance measurements and discuss advantages over currently deployed standards. Finally, we describe our preliminary security analysis where we investigate differential and rotational properties of NORX. Noteworthy are in particular the newly developed techniques for differential cryptanalysis of NORX which exploit the power of SAT- and SMT-solvers and have the potential to be easily adaptable to other encryption schemes as well.Diese Doktorarbeit beschäftigt sich mit der Analyse und dem Entwurf von symmetrischen kryptographischen Algorithmen. Im ersten Teil der Dissertation befassen wir uns mit fehlerbasierten Angriffen auf kryptographische Schaltungen, welche dem Gebiet der aktiven Seitenkanalangriffe zugeordnet werden und auf die Rekonstruktion geheimer Schlüssel abzielen, die auf diesen Chips gespeichert sind. Unser Hauptaugenmerk liegt dabei auf den kryptoanalytischen Aspekten dieser Angriffe. Insbesondere beschäftigen wir uns dabei mit Blockchiffren, die leichtgewichtige und eine (oft) nicht-bijektive Schlüsselexpansion besitzen, bei denen die erzeugten Teilschlüssel voneinander (nahezu) unabhängig sind. Ein Angreifer, dem es gelingt einen Teilschlüssel zu rekonstruieren, ist dadurch nicht in der Lage direkt weitere Teilschlüssel oder sogar den Hauptschlüssel abzuleiten indem er einfach die Schlüsselexpansion umkehrt. Wir stellen Techniken basierend auf differenzieller Fehleranalyse vor, die es ermöglichen Blockchiffren zu analysieren, welche eine beliebige Anzahl unabhängiger Teilschlüssel einsetzen und auf Substitutions-Permutations Netzwerken basieren. Diese Methoden werden im Anschluss auf die leichtgewichtigen Blockchiffren LED und PRINCE angewandt und wir zeigen in beiden Fällen wie der komplette geheime Schlüssel mit einigen wenigen Fehlerinjektionen rekonstruiert werden kann. Darüber hinaus untersuchen wir Methoden, die algebraische statt differenzielle Techniken der Fehleranalyse einsetzen und diskutieren deren Vor- und Nachteile. Am Ende des ersten Teils der Dissertation befassen wir uns mit fehlerbasierten Angriffen auf die Blockchiffre Bel-T, welche ebenfalls eine leichtgewichtige Schlüsselexpansion besitzt jedoch nicht auf einem Substitutions-Permutations Netzwerk sondern auf dem sogenannten Lai-Massey Schema basiert. Die oben genannten Techniken können daher bei Bel-T nicht angewandt werden. Nichtsdestotrotz werden wir auch für den Fall von Bel-T Verfahren vorstellen, die in der Lage sind den vollständigen geheimen Schlüssel sehr effizient mit Hilfe von differenzieller Fehleranalyse zu rekonstruieren. Im zweiten Teil der Doktorarbeit beschäftigen wir uns mit authentifizierenden Verschlüsselungsverfahren. Während gewöhnliche Chiffren nur die Vertraulichkeit der verarbeiteten Daten sicherstellen, gewährleisten authentifizierende Verschlüsselungsverfahren auch deren Authentizität und Integrität. Viele dieser Chiffren sind darüber hinaus in der Lage auch die Authentizität und Integrität von sogenannten assoziierten Daten zu gewährleisten. Daten dieses Typs werden in nicht-verschlüsselter Form übertragen, müssen aber dennoch gegen unbefugte Veränderungen auf dem Transportweg geschützt sein. Authentifizierende Verschlüsselungsverfahren bilden heutzutage die Standardtechnologie um Daten während der Übertragung zu beschützen. Aktuell eingesetzte Verfahren weisen jedoch oftmals Defizite auf und es existieren vielfältige Ansatzpunkte für Verbesserungen. Mit NORX stellen wir ein neuartiges authentifizierendes Verschlüsselungsverfahren vor, welches assoziierte Daten unterstützt. Dieser Algorithmus wurde vor allem im Hinblick auf Einsatzgebiete mit hohen Sicherheitsanforderungen, Effizienz in Hardware und Software, Einfachheit, und Robustheit gegenüber Seitenkanalangriffen entwickelt. Neben der Spezifikation präsentieren wir besondere Eigenschaften, angestrebte Sicherheitsziele, Details zur Implementierung, umfassende Performanz-Messungen und diskutieren Vorteile gegenüber aktuellen Standards. Schließlich stellen wir Ergebnisse unserer vorläufigen Sicherheitsanalyse vor, bei der wir uns vor allem auf differenzielle Merkmale und Rotationseigenschaften von NORX konzentrieren. Erwähnenswert sind dabei vor allem die für die differenzielle Kryptoanalyse von NORX entwickelten Techniken, die auf die Effizienz von SAT- und SMT-Solvern zurückgreifen und das Potential besitzen relativ einfach auch auf andere Verschlüsselungsverfahren übertragen werden zu können

    A Salad of Block Ciphers

    Get PDF
    This book is a survey on the state of the art in block cipher design and analysis. It is work in progress, and it has been for the good part of the last three years -- sadly, for various reasons no significant change has been made during the last twelve months. However, it is also in a self-contained, useable, and relatively polished state, and for this reason I have decided to release this \textit{snapshot} onto the public as a service to the cryptographic community, both in order to obtain feedback, and also as a means to give something back to the community from which I have learned much. At some point I will produce a final version -- whatever being a ``final version\u27\u27 means in the constantly evolving field of block cipher design -- and I will publish it. In the meantime I hope the material contained here will be useful to other people
    corecore