127 research outputs found

    Reconfigurable Architectures for Cryptographic Systems

    No full text
    Field Programmable Gate Arrays (FPGAs) are suitable platforms for implementing cryptographic algorithms in hardware due to their flexibility, good performance and low power consumption. Computer security is becoming increasingly important and security requirements such as key sizes are quickly evolving. This creates the need for customisable hardware designs for cryptographic operations capable of covering a large design space. In this thesis we explore the four design dimensions relevant to cryptography - speed, area, power consumption and security of the crypto-system - by developing parametric designs for public-key generation and encryption as well as side-channel attack countermeasures. There are four contributions. First, we present new architectures for Montgomery multiplication and exponentiation based on variable pipelining and variable serial replication. Our implementations of these architectures are compared to the best implementations in the literature and the design space is explored in terms of speed and area trade-offs. Second, we generalise our Montgomery multiplier design ideas by developing a parametric model to allow rapid optimisation of a general class of algorithms containing loops with dependencies carried from one iteration to the next. By predicting the throughput and the area of the design, our model facilitates and speeds up design space exploration. Third, we develop new architectures for primality testing including the first hardware architecture for the NIST approved Lucas primality test. We explore the area, speed and power consumption trade-offs by comparing our Lucas architectures on CPU, FPGA and ASIC. Finally, we tackle the security issue by presenting two novel power attack countermeasures based on on-chip power monitoring. Our constant power framework uses a closed-loop control system to keep the power consumption of any FPGA implementation constant. Our attack detection framework uses a network of ring-oscillators to detect the insertion of a shunt resistor-based power measurement circuit on a device's power rail. This countermeasure is lightweight and has a relatively low power overhead compared to existing masking and hiding countermeasures

    Remote Attacks on FPGA Hardware

    Get PDF
    Immer mehr Computersysteme sind weltweit miteinander verbunden und über das Internet zugänglich, was auch die Sicherheitsanforderungen an diese erhöht. Eine neuere Technologie, die zunehmend als Rechenbeschleuniger sowohl für eingebettete Systeme als auch in der Cloud verwendet wird, sind Field-Programmable Gate Arrays (FPGAs). Sie sind sehr flexible Mikrochips, die per Software konfiguriert und programmiert werden können, um beliebige digitale Schaltungen zu implementieren. Wie auch andere integrierte Schaltkreise basieren FPGAs auf modernen Halbleitertechnologien, die von Fertigungstoleranzen und verschiedenen Laufzeitschwankungen betroffen sind. Es ist bereits bekannt, dass diese Variationen die Zuverlässigkeit eines Systems beeinflussen, aber ihre Auswirkungen auf die Sicherheit wurden nicht umfassend untersucht. Diese Doktorarbeit befasst sich mit einem Querschnitt dieser Themen: Sicherheitsprobleme die dadurch entstehen wenn FPGAs von mehreren Benutzern benutzt werden, oder über das Internet zugänglich sind, in Kombination mit physikalischen Schwankungen in modernen Halbleitertechnologien. Der erste Beitrag in dieser Arbeit identifiziert transiente Spannungsschwankungen als eine der stärksten Auswirkungen auf die FPGA-Leistung und analysiert experimentell wie sich verschiedene Arbeitslasten des FPGAs darauf auswirken. In der restlichen Arbeit werden dann die Auswirkungen dieser Spannungsschwankungen auf die Sicherheit untersucht. Die Arbeit zeigt, dass verschiedene Angriffe möglich sind, von denen früher angenommen wurde, dass sie physischen Zugriff auf den Chip und die Verwendung spezieller und teurer Test- und Messgeräte erfordern. Dies zeigt, dass bekannte Isolationsmaßnahmen innerhalb FPGAs von böswilligen Benutzern umgangen werden können, um andere Benutzer im selben FPGA oder sogar das gesamte System anzugreifen. Unter Verwendung von Schaltkreisen zur Beeinflussung der Spannung innerhalb eines FPGAs zeigt diese Arbeit aktive Angriffe, die Fehler (Faults) in anderen Teilen des Systems verursachen können. Auf diese Weise sind Denial-of-Service Angriffe möglich, als auch Fault-Angriffe um geheime Schlüsselinformationen aus dem System zu extrahieren. Darüber hinaus werden passive Angriffe gezeigt, die indirekt die Spannungsschwankungen auf dem Chip messen. Diese Messungen reichen aus, um geheime Schlüsselinformationen durch Power Analysis Seitenkanalangriffe zu extrahieren. In einer weiteren Eskalationsstufe können sich diese Angriffe auch auf andere Chips auswirken die an dasselbe Netzteil angeschlossen sind wie der FPGA. Um zu beweisen, dass vergleichbare Angriffe nicht nur innerhalb FPGAs möglich sind, wird gezeigt, dass auch kleine IoT-Geräte anfällig für Angriffe sind welche die gemeinsame Spannungsversorgung innerhalb eines Chips ausnutzen. Insgesamt zeigt diese Arbeit, dass grundlegende physikalische Variationen in integrierten Schaltkreisen die Sicherheit eines gesamten Systems untergraben können, selbst wenn der Angreifer keinen direkten Zugriff auf das Gerät hat. Für FPGAs in ihrer aktuellen Form müssen diese Probleme zuerst gelöst werden, bevor man sie mit mehreren Benutzern oder mit Zugriff von Drittanbietern sicher verwenden kann. In Veröffentlichungen die nicht Teil dieser Arbeit sind wurden bereits einige erste Gegenmaßnahmen untersucht

    Speed reading in the dark : Accelerating functional encryption for quadratic functions with reprogrammable hardware

    Get PDF
    Functional encryption is a new paradigm for encryption where decryption does not give the entire plaintext but only some function of it. Functional encryption has great potential in privacy-enhancing technologies but suffers from excessive computational overheads. We introduce the first hardware accelerator that supports functional encryption for quadratic functions. Our accelerator is implemented on a reprogrammable system-on-chip following the hardware/software codesign methogol-ogy. We benchmark our implementation for two privacy-preserving machine learning applications: (1) classification of handwritten digits from the MNIST database and (2) classification of clothes images from the Fashion MNIST database. In both cases, classification is performed with encrypted images. We show that our implementation offers speedups of over 200 times compared to a published software implementation and permits applications which are unfeasible with software-only solutions.Peer reviewe

    Speed reading in the dark : Accelerating functional encryption for quadratic functions with reprogrammable hardware

    Get PDF
    Functional encryption is a new paradigm for encryption where decryption does not give the entire plaintext but only some function of it. Functional encryption has great potential in privacy-enhancing technologies but suffers from excessive computational overheads. We introduce the first hardware accelerator that supports functional encryption for quadratic functions. Our accelerator is implemented on a reprogrammable system-on-chip following the hardware/software codesign methogol-ogy. We benchmark our implementation for two privacy-preserving machine learning applications: (1) classification of handwritten digits from the MNIST database and (2) classification of clothes images from the Fashion MNIST database. In both cases, classification is performed with encrypted images. We show that our implementation offers speedups of over 200 times compared to a published software implementation and permits applications which are unfeasible with software-only solutions.Peer reviewe

    A Programmable SoC-Based Accelerator for Privacy-Enhancing Technologies and Functional Encryption

    Get PDF
    A multitude of privacy-enhancing technologies (PETs) has been presented recently to solve the privacy problems of contemporary services utilizing cloud computing. Many of them are based on additively homomorphic encryption (AHE) that allows the computation of additions on encrypted data. The main technical obstacles for adaptation of PETs in practical systems are related to performance overheads compared with current privacy-violating alternatives. In this article, we present a hardware/software (HW/SW) codesign for programmable systems-on-chip (SoCs) that is designed for accelerating applications based on the Paillier encryption. Our implementation is a microcode-based multicore architecture that is suitable for accelerating various PETs using AHE with large integer modular arithmetic. We instantiate the implementation in a Xilinx Zynq-7000 programmable SoC and provide performance evaluations in real hardware. We also investigate its efficiency in a high-end Xilinx UltraScale+ programmable SoC. We evaluate the implementation with two target use cases that have relevance in PETs: privacy-preserving computation of squared Euclidean distances over encrypted data and multi-input functional encryption (FE) for inner products. Both of them represent the first hardware acceleration results for such operations, and in particular, the latter one is among the very first published implementation results of FE on any platform.Peer reviewe

    Enhancing an Embedded Processor Core with a Cryptographic Unit for Performance and Security

    Get PDF
    We present a set of low-cost architectural enhancements to accelerate the execution of certain arithmetic operations common in cryptographic applications on an extensible embedded processor core. The proposed enhancements are generic in the sense that they can be beneficially applied in almost any RISC processor. We implemented the enhancements in form of a cryptographic unit (CU) that offers the programmer an extended instruction set. The CU features a 128-bit wide register file and datapath, which enables it to process 128-bit words and perform 128-bit loads/stores. We analyze the speed-up factors for some arithmetic operations and public-key cryptographic algorithms obtained through these enhancements. In addition, we evaluate the hardware overhead (i.e. silicon area) of integrating the CU into an embedded RISC processor. Our experimental results show that the proposed architectural enhancements allow for a significant performance gain for both RSA and ECC at the expense of an acceptable increase in silicon area. We also demonstrate that the proposed enhancements facilitate the protection of cryptographic algorithms against certain types of side-channel attacks and present an AES implementation hardened against cache-based attacks as a case study

    Fast, compact and secure implementation of rsa on dedicated hardware

    Get PDF
    RSA is the most popular Public Key Cryptosystem (PKC) and is heavily used today. PKC comes into play, when two parties, who have previously never met, want to create a secure channel between them. The core operation in RSA is modular multiplication, which requires lots of computational power especially when the operands are longer than 1024-bits. Although today’s powerful PC’s can easily handle one RSA operation in a fraction of a second, small devices such as PDA’s, cell phones, smart cards, etc. have limited computational power, thus there is a need for dedicated hardware which is specially designed to meet the demand of this heavy calculation. Additionally, web servers, which thousands of users can access at the same time, need to perform many PKC operations in a very short time and this can create a performance bottleneck. Special algorithms implemented on dedicated hardware can take advantage of true massive parallelism and high utilization of the data path resulting in high efficiency in terms of both power and execution time while keeping the chip cost low. We will use the “Montgomery Modular Multiplication” algorithm in our implementation, which is considered one of the most efficient multiplication schemes, and has many applications in PKC. In the first part of the thesis, our “2048-bit Radix-4 based Modular Multiplier” design is introduced and compared with the conventional radix-2 modular multipliers of previous works. Our implementation for 2048-bit modular multiplication features up to 82% shorter execution time with 33% increase in the area over the conventional radix-2 designs and can achieve 132 MHz on a Xilinx xc2v6000 FPGA. The proposed multiplier has one of the fastest execution times in terms of latency and performs better than (37% better) our reference radix-2 design in terms of time-area product. The results are similar in the ASIC case where we implement our design for UMC 0.18 μm technology. In the second part, a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit intended for inexpensive FPGAs are presented. The design utilizes hardwired block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The deployed method is based on the Montgomery modular multiplication algorithm and the architecture utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications* in 7.62 μs and 27.0 μs respectively with approximately the same device usage on Xilinx Spartan-3E 500. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine easily fits into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks with an insignificant overhead in area and execution time. The upper limit on the operand precision is dictated only by the available Block-RAM to accommodate the operands within the FPGA. This design is also compared to the first one, considering the relative advantages and disadvantages of each circuit

    Reducing the Depth of Quantum FLT-Based Inversion Circuit

    Get PDF
    Works on quantum computing and cryptanalysis has increased significantly in the past few years. Various constructions of quantum arithmetic circuits, as one of the essential components in the field, has also been proposed. However, there has only been a few studies on finite field inversion despite its essential use in realizing quantum algorithms, such as in Shor's algorithm for Elliptic Curve Discrete Logarith Problem (ECDLP). In this study, we propose to reduce the depth of the existing quantum Fermat's Little Theorem (FLT)-based inversion circuit for binary finite field. In particular, we propose follow a complete waterfall approach to translate the Itoh-Tsujii's variant of FLT to the corresponding quantum circuit and remove the inverse squaring operations employed in the previous work by Banegas et al., lowering the number of CNOT gates (CNOT count), which contributes to reduced overall depth and gate count. Furthermore, compare the cost by firstly constructing our method and previous work's in Qiskit quantum computer simulator and perform the resource analysis. Our approach can serve as an alternative for a time-efficient implementation.Comment: version 0.
    corecore