155 research outputs found

    Remote Attacks on FPGA Hardware

    Get PDF
    Immer mehr Computersysteme sind weltweit miteinander verbunden und über das Internet zugänglich, was auch die Sicherheitsanforderungen an diese erhöht. Eine neuere Technologie, die zunehmend als Rechenbeschleuniger sowohl für eingebettete Systeme als auch in der Cloud verwendet wird, sind Field-Programmable Gate Arrays (FPGAs). Sie sind sehr flexible Mikrochips, die per Software konfiguriert und programmiert werden können, um beliebige digitale Schaltungen zu implementieren. Wie auch andere integrierte Schaltkreise basieren FPGAs auf modernen Halbleitertechnologien, die von Fertigungstoleranzen und verschiedenen Laufzeitschwankungen betroffen sind. Es ist bereits bekannt, dass diese Variationen die Zuverlässigkeit eines Systems beeinflussen, aber ihre Auswirkungen auf die Sicherheit wurden nicht umfassend untersucht. Diese Doktorarbeit befasst sich mit einem Querschnitt dieser Themen: Sicherheitsprobleme die dadurch entstehen wenn FPGAs von mehreren Benutzern benutzt werden, oder über das Internet zugänglich sind, in Kombination mit physikalischen Schwankungen in modernen Halbleitertechnologien. Der erste Beitrag in dieser Arbeit identifiziert transiente Spannungsschwankungen als eine der stärksten Auswirkungen auf die FPGA-Leistung und analysiert experimentell wie sich verschiedene Arbeitslasten des FPGAs darauf auswirken. In der restlichen Arbeit werden dann die Auswirkungen dieser Spannungsschwankungen auf die Sicherheit untersucht. Die Arbeit zeigt, dass verschiedene Angriffe möglich sind, von denen früher angenommen wurde, dass sie physischen Zugriff auf den Chip und die Verwendung spezieller und teurer Test- und Messgeräte erfordern. Dies zeigt, dass bekannte Isolationsmaßnahmen innerhalb FPGAs von böswilligen Benutzern umgangen werden können, um andere Benutzer im selben FPGA oder sogar das gesamte System anzugreifen. Unter Verwendung von Schaltkreisen zur Beeinflussung der Spannung innerhalb eines FPGAs zeigt diese Arbeit aktive Angriffe, die Fehler (Faults) in anderen Teilen des Systems verursachen können. Auf diese Weise sind Denial-of-Service Angriffe möglich, als auch Fault-Angriffe um geheime Schlüsselinformationen aus dem System zu extrahieren. Darüber hinaus werden passive Angriffe gezeigt, die indirekt die Spannungsschwankungen auf dem Chip messen. Diese Messungen reichen aus, um geheime Schlüsselinformationen durch Power Analysis Seitenkanalangriffe zu extrahieren. In einer weiteren Eskalationsstufe können sich diese Angriffe auch auf andere Chips auswirken die an dasselbe Netzteil angeschlossen sind wie der FPGA. Um zu beweisen, dass vergleichbare Angriffe nicht nur innerhalb FPGAs möglich sind, wird gezeigt, dass auch kleine IoT-Geräte anfällig für Angriffe sind welche die gemeinsame Spannungsversorgung innerhalb eines Chips ausnutzen. Insgesamt zeigt diese Arbeit, dass grundlegende physikalische Variationen in integrierten Schaltkreisen die Sicherheit eines gesamten Systems untergraben können, selbst wenn der Angreifer keinen direkten Zugriff auf das Gerät hat. Für FPGAs in ihrer aktuellen Form müssen diese Probleme zuerst gelöst werden, bevor man sie mit mehreren Benutzern oder mit Zugriff von Drittanbietern sicher verwenden kann. In Veröffentlichungen die nicht Teil dieser Arbeit sind wurden bereits einige erste Gegenmaßnahmen untersucht

    Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

    Get PDF
    Asymmetric cryptographic algorithm (e.g., RSA and Elliptic Curve Cryptography) implementations on Graphics Processing Units (GPUs) have been researched for over a decade. The basic idea of most previous contributions is exploiting the highly parallel GPU architecture and porting the integer-based algorithms from general-purpose CPUs to GPUs, to offer high performance. However, the great potential cryptographic computing power of GPUs, especially by the more powerful floating-point instructions, has not been comprehensively investigated in fact. In this paper, we fully exploit the floating-point computing power of GPUs, by various designs, including the floating-point-based Montgomery multiplication/exponentiation algorithm and Chinese Remainder Theorem (CRT) implementation in GPU. And for practical usage of the proposed algorithm, a new method is performed to convert the input/output between octet strings and floating-point numbers, fully utilizing GPUs and further promoting the overall performance by about 5%. The performance of RSA-2048/3072/4096 decryption on NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which achieves 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption precedes the existing fastest integer-based result by 23%

    XDIVINSA: eXtended DIVersifying INStruction Agent to Mitigate Power Side-Channel Leakage

    Get PDF
    Side-channel analysis (SCA) attacks pose a major threat to embedded systems due to their ease of accessibility. Realising SCA resilient cryptographic algorithms on embedded systems under tight intrinsic constraints, such as low area cost, limited computational ability, etc., is extremely challenging and often not possible. We propose a seamless and effective approach to realise a generic countermeasure against SCA attacks. XDIVINSA, an extended diversifying instruction agent, is introduced to realise the countermeasure at the microarchitecture level based on the combining concept of diversified instruction set extension (ISE) and hardware diversification. XDIVINSA is developed as a lightweight co-processor that is tightly coupled with a RISC-V processor. The proposed method can be applied to various algorithms without the need for software developers to undertake substantial design efforts hardening their implementations against SCA. XDIVINSA has been implemented on the SASEBO G-III board which hosts a Kintex-7 XC7K160T FPGA device for SCA mitigation evaluation. Experimental results based on non-specific t-statistic tests show that our solution can achieve leakage mitigation on the power side channel of different cryptographic kernels, i.e., Speck, ChaCha20, AES, and RSA with an acceptable performance overhead compared to existing countermeasures.This work has been supported in part by EPSRC via grant EP/R012288/1, under the RISE (http://www.ukrise.org) programme.Peer ReviewedPostprint (author's final draft

    The Design Space of Ultra-low Energy Asymmetric Cryptography

    Get PDF
    The energy cost of asymmetric cryptography, a vital component of modern secure communications, inhibits its wide spread adoption within the ultra-low energy regimes such as Implantable Medical Devices (IMDs), Wireless Sensor Networks (WSNs), and Radio Frequency Identification tags (RFIDs). In literature, a plethora of hardware and software acceleration techniques exists for improving the performance of asymmetric cryptography. However, very little attention has been focused on the energy efficiency. Therefore, in this dissertation, I explore the design space thoroughly, evaluating proposed hardware acceleration techniques in terms of energy cost and showing how effective they are at reducing the energy per cryptographic operation. To do so, I estimate the energy consumption for six different hardware/software configurations across five levels of security, including both GF(p) and GF(2^m) computation. First, we design and evaluate an efficient baseline architecture for pure software-based cryptography, which is centered around a pipelined RISC processor with 256KB of program ROM and 16KB of RAM. Then, we augment our processor design with simple, yet beneficial instruction set extensions for GF(p) computation and evaluate the improvement in terms of energy per cryptographic operation compared to the baseline microarchitecture. While examining the energy breakdown of the system, it became clear that fetching instructions from program memory was contributing significantly to the overall energy consumption. Thus, we implement a parameterizable instruction cache and simulate various configurations. We determine that for our working set, the energy-optimal instruction cache is 4KB, providing a 25% energy improvement over the baseline architecture for a 192-bit key-size. Next, we introduce a reconfigurable GF(p) accelerator to our microarchitecture and mea sure the energy per operation against the baseline and the ISA extensions. For ISA extensions, we show between 1.32 and 1.45 factor improvement in energy efficiency over baseline, while for full acceleration we demonstrate a 5.17 to 6.34 factor improvement. Continuing towards greater efficiency, we investigate the energy efficiency of different arithmetic by first adding GF(2^m) instruction set extensions to our processor architecture and comparing them to their GF(p) counterpart. Finally, we design a non-configurable 163-bit GF(2^m) accelerator and perform some initial energy estimates, comparing them with our prior work. In the end, we discuss our ongoing research and make suggestions for future work. The work presented here, along with proposed future work, will aid in bringing asymmetric cryptography within reach of ultra-low energy devices

    Fast, compact and secure implementation of rsa on dedicated hardware

    Get PDF
    RSA is the most popular Public Key Cryptosystem (PKC) and is heavily used today. PKC comes into play, when two parties, who have previously never met, want to create a secure channel between them. The core operation in RSA is modular multiplication, which requires lots of computational power especially when the operands are longer than 1024-bits. Although today’s powerful PC’s can easily handle one RSA operation in a fraction of a second, small devices such as PDA’s, cell phones, smart cards, etc. have limited computational power, thus there is a need for dedicated hardware which is specially designed to meet the demand of this heavy calculation. Additionally, web servers, which thousands of users can access at the same time, need to perform many PKC operations in a very short time and this can create a performance bottleneck. Special algorithms implemented on dedicated hardware can take advantage of true massive parallelism and high utilization of the data path resulting in high efficiency in terms of both power and execution time while keeping the chip cost low. We will use the “Montgomery Modular Multiplication” algorithm in our implementation, which is considered one of the most efficient multiplication schemes, and has many applications in PKC. In the first part of the thesis, our “2048-bit Radix-4 based Modular Multiplier” design is introduced and compared with the conventional radix-2 modular multipliers of previous works. Our implementation for 2048-bit modular multiplication features up to 82% shorter execution time with 33% increase in the area over the conventional radix-2 designs and can achieve 132 MHz on a Xilinx xc2v6000 FPGA. The proposed multiplier has one of the fastest execution times in terms of latency and performs better than (37% better) our reference radix-2 design in terms of time-area product. The results are similar in the ASIC case where we implement our design for UMC 0.18 μm technology. In the second part, a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit intended for inexpensive FPGAs are presented. The design utilizes hardwired block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The deployed method is based on the Montgomery modular multiplication algorithm and the architecture utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications* in 7.62 μs and 27.0 μs respectively with approximately the same device usage on Xilinx Spartan-3E 500. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine easily fits into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks with an insignificant overhead in area and execution time. The upper limit on the operand precision is dictated only by the available Block-RAM to accommodate the operands within the FPGA. This design is also compared to the first one, considering the relative advantages and disadvantages of each circuit

    Side-Channel Analysis and Cryptography Engineering : Getting OpenSSL Closer to Constant-Time

    Get PDF
    As side-channel attacks reached general purpose PCs and started to be more practical for attackers to exploit, OpenSSL adopted in 2005 a flagging mechanism to protect against SCA. The opt-in mechanism allows to flag secret values, such as keys, with the BN_FLG_CONSTTIME flag. Whenever a flag is checked and detected, the library changes its execution flow to SCA-secure functions that are slower but safer, protecting these secret values from being leaked. This mechanism favors performance over security, it is error-prone, and is obscure for most library developers, increasing the potential for side-channel vulnerabilities. This dissertation presents an extensive side-channel analysis of OpenSSL and criticizes its fragile flagging mechanism. This analysis reveals several flaws affecting the library resulting in multiple side-channel attacks, improved cache-timing attack techniques, and a new side channel vector. The first part of this dissertation introduces the main topic and the necessary related work, including the microarchitecture, the cache hierarchy, and attack techniques; then it presents a brief troubled history of side-channel attacks and defenses in OpenSSL, setting the stage for the related publications. This dissertation includes seven original publications contributing to the area of side-channel analysis, microarchitecture timing attacks, and applied cryptography. From an SCA perspective, the results identify several vulnerabilities and flaws enabling protocol-level attacks on RSA, DSA, and ECDSA, in addition to full SCA of the SM2 cryptosystem. With respect to microarchitecture timing attacks, the dissertation presents a new side-channel vector due to port contention in the CPU execution units. And finally, on the applied cryptography front, OpenSSL now enjoys a revamped code base securing several cryptosystems against SCA, favoring a secure-by-default protection against side-channel attacks, instead of the insecure opt-in flagging mechanism provided by the fragile BN_FLG_CONSTTIME flag

    EM Side Channel Analysis on Complex SoC architectures

    Get PDF
    The EM side channel analysis is a very effective technique to attack cryptographic systems due to its non invasive nature and capability to launch an attack even with limited resources. The EM leakage from devices can give information about computations on the processor, which can in turn reveal the internal state of the algorithm. For security sensitive algorithms, these EM radiations can be exploited by the adversary to extract secret key dependent operations hence EM side channel must be studied for evaluating the security of these algorithms. Modern embedded devices composed of System-on-Chip architectures are considered hard targets for EM side channel analysis mainly due to their complex architecture. This thesis explores the viability of EM side channel attacks on such targets. There is a comprehensive literature overview of EM side channel analysis followed by a practical side channel attack on a SoC device using well know cryptographic library OpenSSL. The attack successfully extracts the secret key dependent operation which can be used to retrieve the private key in security protocols such as TLS and SSH. The thesis concludes, with practical single trace attacks, that cryptographic implementations can still be broken using EM side channel analysis, and a complex nature of the device have no significant effect when combined with signal processing methods for extracting side channel information, hence the cryptographic software implementations must address these issues

    Performance Comparison of Projective Elliptic-curve Point Multiplication in 64-bit x86 Runtime Environment

    Get PDF
    For over two decades, mathematicians and cryptologists have evaluated and presented the theoretical performance of Elliptic-curve scalar point-multiplication in projective geometry. Because computation in projective domain is composed of a wide array of formulations and computing optimizations, there is not a comprehensive performance comparison of point-multiplication using projective transformation available to verify its realistic efficiency in 64-bit x86 computing platforms. Today, research on explicit mathematical formulations in projective domain continues to excel by seeking higher computational efficiency and ease of realization. An explicit performance evaluation will help implementers choose better implementation methods and improve Elliptic-curve scalar point-multiplication. This paper was founded on the practical solution that obtaining realistic performance figures should be based on more precise computational cost metrics and specific computing platforms. As part of that solution, an empirical performance benchmark comparison between two approaches implementing projective Elliptic-curve scalar point-multiplication will be presented to provide the selection of, and subsequently ways to improve scalar point-multiplication technology executing in a 64-bit x86 runtime environment

    Evaluating Information Leakage by Quantitative and Interpretable Measurements

    Get PDF
    Noninterference, a strong security property for a computation process, informally says that the process output is insensitive to the value of its secret inputs – the secret inputs do not "interfere" with those outputs. This is too strong, however; a degree of interference is necessary in almost all real systems. In this dissertation, we propose a measure of noninterference that is more practical. Based on a model of computations with three types of input (secret, attacker-controlled, and others) and an attacker-observable output, we define a noninterference measure that can assess and explain information leaks in actual codebases. We start with assessing a new defense against cache-based side-channel attacks in a cloud environment, using an experiment-based quantitative measure of leakage against existing attacks. It is not enough to measure leakage through empirical analysis, however, as it fails to identify new interference introduced by a weak defense design. We propose a symbolic execution framework to formally measure interference in simple software procedures, encompassing any interference from a set of secret inputs to observable outputs. Leveraging approximate model counting techniques, we make this framework scalable with parallelization. Unfortunately, this technique does not scale to support analysis of hardware processor designs, in part due to its reliance on symbolic execution to create a logical postcondition of the computation. We thus modified the framework to sidestep symbolic execution when analyzing processor designs. To further tame the complexity due to various sources of interference, we extend our framework to remove, or declassify, certain interference from consideration, so that the framework instead highlights other forms of interference, and to provide human-interpretable rules that explain the conditions under which interference occurs. We demonstrate the practicality of the work through case studies of both software-based leakage and vulnerabilities in the RISC-V BOOM core with different configurations.Doctor of Philosoph
    corecore