601 research outputs found

    Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

    Get PDF
    Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

    A practical evaluation on RSA and ECC-based cipher suites for IoT high-security energy-efficient Fog and mist computing devices

    Get PDF
    [Abstract] The latest Internet of Things (IoT) edge-centric architectures allow for unburdening higher layers from part of their computational and data processing requirements. In the specific case of fog computing systems, they reduce greatly the requirements of cloud-centric systems by processing in fog gateways part of the data generated by end devices, thus providing services that were previously offered by a remote cloud. Thanks to recent advances in System-on-Chip (SoC) energy efficiency, it is currently possible to create IoT end devices with enough computational power to process the data generated by their sensors and actuators while providing complex services, which in recent years derived into the development of the mist computing paradigm. To allow mist computing nodes to provide the previously mentioned benefits and guarantee the same level of security as in other architectures, end-to-end standard security mechanisms need to be implemented. In this paper, a high-security energy-efficient fog and mist computing architecture and a testbed are presented and evaluated. The testbed makes use of Transport Layer Security (TLS) 1.2 Elliptic Curve Cryptography (ECC) and Rivest-Shamir-Adleman (RSA) cipher suites (that comply with the yet to come TLS 1.3 standard requirements), which are evaluated and compared in terms of energy consumption and data throughput for a fog gateway and two mist end devices. The obtained results allow a conclusion that ECC outperforms RSA in both energy consumption and data throughput for all the tested security levels. Moreover, the importance of selecting a proper ECC curve is demonstrated, showing that, for the tested devices, some curves present worse energy consumption and data throughput than other curves that provide a higher security level. As a result, this article not only presents a novel mist computing testbed, but also provides guidelines for future researchers to find out efficient and secure implementations for advanced IoT devices.Xunta de Galicia; ED431C 2016-045Xunta de Galicia; ED341D R2016/012Xunta de Galicia; ED431G/01Agencia Estatal de Investigación de España; TEC2013-47141-C4-1-RAgencia Estatal de Investigación de España; TEC2015-69648-REDCAgencia Estatal de Investigación de España; TEC2016-75067-C4-1-

    Low-Resource and Fast Elliptic Curve Implementations over Binary Edwards Curves

    Get PDF
    Elliptic curve cryptography (ECC) is an ideal choice for low-resource applications because it provides the same level of security with smaller key sizes than other existing public key encryption schemes. For low-resource applications, designing efficient functional units for elliptic curve computations over binary fields results in an effective platform for an embedded co-processor. This thesis investigates co-processor designs for area-constrained devices. Particularly, we discuss an implementation utilizing state of the art binary Edwards curve equations over mixed point addition and doubling. The binary Edwards curve offers the security advantage that it is complete and is, therefore, immune to the exceptional points attack. In conjunction with Montgomery ladder, such a curve is naturally immune to most types of simple power and timing attacks. Finite field operations were performed in the small and efficient Gaussian normal basis. The recently presented formulas for mixed point addition by K. Kim, C. Lee, and C. Negre at Indocrypt 2014 were found to be invalid, but were corrected such that the speed and register usage were maintained. We utilize corrected mixed point addition and doubling formulas to achieve a secure, but still fast implementation of a point multiplication on binary Edwards curves. Our synthesis results over NIST recommended fields for ECC indicate that the proposed co-processor requires about 50% fewer clock cycles for point multiplication and occupies a similar silicon area when compared to the most recent in literature

    Efficient Elliptic Curve Cryptography Software Implementation on Embedded Platforms

    Get PDF

    Cross-core Microarchitectural Attacks and Countermeasures

    Get PDF
    In the last decade, multi-threaded systems and resource sharing have brought a number of technologies that facilitate our daily tasks in a way we never imagined. Among others, cloud computing has emerged to offer us powerful computational resources without having to physically acquire and install them, while smartphones have almost acquired the same importance desktop computers had a decade ago. This has only been possible thanks to the ever evolving performance optimization improvements made to modern microarchitectures that efficiently manage concurrent usage of hardware resources. One of the aforementioned optimizations is the usage of shared Last Level Caches (LLCs) to balance different CPU core loads and to maintain coherency between shared memory blocks utilized by different cores. The latter for instance has enabled concurrent execution of several processes in low RAM devices such as smartphones. Although efficient hardware resource sharing has become the de-facto model for several modern technologies, it also poses a major concern with respect to security. Some of the concurrently executed co-resident processes might in fact be malicious and try to take advantage of hardware proximity. New technologies usually claim to be secure by implementing sandboxing techniques and executing processes in isolated software environments, called Virtual Machines (VMs). However, the design of these isolated environments aims at preventing pure software- based attacks and usually does not consider hardware leakages. In fact, the malicious utilization of hardware resources as covert channels might have severe consequences to the privacy of the customers. Our work demonstrates that malicious customers of such technologies can utilize the LLC as the covert channel to obtain sensitive information from a co-resident victim. We show that the LLC is an attractive resource to be targeted by attackers, as it offers high resolution and, unlike previous microarchitectural attacks, does not require core-colocation. Particularly concerning are the cases in which cryptography is compromised, as it is the main component of every security solution. In this sense, the presented work does not only introduce three attack variants that can be applicable in different scenarios, but also demonstrates the ability to recover cryptographic keys (e.g. AES and RSA) and TLS session messages across VMs, bypassing sandboxing techniques. Finally, two countermeasures to prevent microarchitectural attacks in general and LLC attacks in particular from retrieving fine- grain information are presented. Unlike previously proposed countermeasures, ours do not add permanent overheads in the system but can be utilized as preemptive defenses. The first identifies leakages in cryptographic software that can potentially lead to key extraction, and thus, can be utilized by cryptographic code designers to ensure the sanity of their libraries before deployment. The second detects microarchitectural attacks embedded into innocent-looking binaries, preventing them from being posted in official application repositories that usually have the full trust of the customer

    The Design Space of Ultra-low Energy Asymmetric Cryptography

    Get PDF
    The energy cost of asymmetric cryptography, a vital component of modern secure communications, inhibits its wide spread adoption within the ultra-low energy regimes such as Implantable Medical Devices (IMDs), Wireless Sensor Networks (WSNs), and Radio Frequency Identification tags (RFIDs). In literature, a plethora of hardware and software acceleration techniques exists for improving the performance of asymmetric cryptography. However, very little attention has been focused on the energy efficiency. Therefore, in this dissertation, I explore the design space thoroughly, evaluating proposed hardware acceleration techniques in terms of energy cost and showing how effective they are at reducing the energy per cryptographic operation. To do so, I estimate the energy consumption for six different hardware/software configurations across five levels of security, including both GF(p) and GF(2^m) computation. First, we design and evaluate an efficient baseline architecture for pure software-based cryptography, which is centered around a pipelined RISC processor with 256KB of program ROM and 16KB of RAM. Then, we augment our processor design with simple, yet beneficial instruction set extensions for GF(p) computation and evaluate the improvement in terms of energy per cryptographic operation compared to the baseline microarchitecture. While examining the energy breakdown of the system, it became clear that fetching instructions from program memory was contributing significantly to the overall energy consumption. Thus, we implement a parameterizable instruction cache and simulate various configurations. We determine that for our working set, the energy-optimal instruction cache is 4KB, providing a 25% energy improvement over the baseline architecture for a 192-bit key-size. Next, we introduce a reconfigurable GF(p) accelerator to our microarchitecture and mea sure the energy per operation against the baseline and the ISA extensions. For ISA extensions, we show between 1.32 and 1.45 factor improvement in energy efficiency over baseline, while for full acceleration we demonstrate a 5.17 to 6.34 factor improvement. Continuing towards greater efficiency, we investigate the energy efficiency of different arithmetic by first adding GF(2^m) instruction set extensions to our processor architecture and comparing them to their GF(p) counterpart. Finally, we design a non-configurable 163-bit GF(2^m) accelerator and perform some initial energy estimates, comparing them with our prior work. In the end, we discuss our ongoing research and make suggestions for future work. The work presented here, along with proposed future work, will aid in bringing asymmetric cryptography within reach of ultra-low energy devices

    Reducing the Depth of Quantum FLT-Based Inversion Circuit

    Get PDF
    Works on quantum computing and cryptanalysis has increased significantly in the past few years. Various constructions of quantum arithmetic circuits, as one of the essential components in the field, has also been proposed. However, there has only been a few studies on finite field inversion despite its essential use in realizing quantum algorithms, such as in Shor's algorithm for Elliptic Curve Discrete Logarith Problem (ECDLP). In this study, we propose to reduce the depth of the existing quantum Fermat's Little Theorem (FLT)-based inversion circuit for binary finite field. In particular, we propose follow a complete waterfall approach to translate the Itoh-Tsujii's variant of FLT to the corresponding quantum circuit and remove the inverse squaring operations employed in the previous work by Banegas et al., lowering the number of CNOT gates (CNOT count), which contributes to reduced overall depth and gate count. Furthermore, compare the cost by firstly constructing our method and previous work's in Qiskit quantum computer simulator and perform the resource analysis. Our approach can serve as an alternative for a time-efficient implementation.Comment: version 0.

    Automatic generation of high speed elliptic curve cryptography code

    Get PDF
    Apparently, trust is a rare commodity when power, money or life itself are at stake. History is full of examples. Julius Caesar did not trust his generals, so that: ``If he had anything confidential to say, he wrote it in cipher, that is, by so changing the order of the letters of the alphabet, that not a word could be made out. If anyone wishes to decipher these, and get at their meaning, he must substitute the fourth letter of the alphabet, namely D, for A, and so with the others.'' And so the history of cryptography began moving its first steps. Nowadays, encryption has decayed from being an emperor's prerogative and became a daily life operation. Cryptography is pervasive, ubiquitous and, the best of all, completely transparent to the unaware user. Each time we buy something on the Internet we use it. Each time we search something on Google we use it. Everything without (almost) realizing that it silently protects our privacy and our secrets. Encryption is a very interesting instrument in the "toolbox of security" because it has very few side effects, at least on the user side. A particularly important one is the intrinsic slow down that its use imposes in the communications. High speed cryptography is very important for the Internet, where busy servers proliferate. Being faster is a double advantage: more throughput and less server overhead. In this context, however, the public key algorithms starts with a big handicap. They have very bad performances if compared to their symmetric counterparts. Due to this reason their use is often reduced to the essential operations, most notably key exchanges and digital signatures. The high speed public key cryptography challenge is a very practical topic with serious repercussions in our technocentric world. Using weak algorithms with a reduced key length to increase the performances of a system can lead to catastrophic results. In 1985, Miller and Koblitz independently proposed to use the group of rational points of an elliptic curve over a finite field to create an asymmetric algorithm. Elliptic Curve Cryptography (ECC) is based on a problem known as the ECDLP (Elliptic Curve Discrete Logarithm Problem) and offers several advantages with respect to other more traditional encryption systems such as RSA and DSA. The main benefit is that it requires smaller keys to provide the same security level since breaking the ECDLP is much harder. In addition, a good ECC implementation can be very efficient both in time and memory consumption, thus being a good candidate for performing high speed public key cryptography. Moreover, some elliptic curve based techniques are known to be extremely resilient to quantum computing attacks, such as the SIDH (Supersingular Isogeny Diffie-Hellman). Traditional elliptic curve cryptography implementations are optimized by hand taking into account the mathematical properties of the underlying algebraic structures, the target machine architecture and the compiler facilities. This process is time consuming, requires a high degree of expertise and, ultimately, error prone. This dissertation' ultimate goal is to automatize the whole optimization process of cryptographic code, with a special focus on ECC. The framework presented in this thesis is able to produce high speed cryptographic code by automatically choosing the best algorithms and applying a number of code-improving techniques inspired by the compiler theory. Its central component is a flexible and powerful compiler able to translate an algorithm written in a high level language and produce a highly optimized C code for a particular algebraic structure and hardware platform. The system is generic enough to accommodate a wide array of number theory related algorithms, however this document focuses only on optimizing primitives based on elliptic curves defined over binary fields

    Energy Efficient Hardware Design for Securing the Internet-of-Things

    Full text link
    The Internet of Things (IoT) is a rapidly growing field that holds potential to transform our everyday lives by placing tiny devices and sensors everywhere. The ubiquity and scale of IoT devices require them to be extremely energy efficient. Given the physical exposure to malicious agents, security is a critical challenge within the constrained resources. This dissertation presents energy-efficient hardware designs for IoT security. First, this dissertation presents a lightweight Advanced Encryption Standard (AES) accelerator design. By analyzing the algorithm, a novel method to manipulate two internal steps to eliminate storage registers and replace flip-flops with latches to save area is discovered. The proposed AES accelerator achieves state-of-art area and energy efficiency. Second, the inflexibility and high Non-Recurring Engineering (NRE) costs of Application-Specific-Integrated-Circuits (ASICs) motivate a more flexible solution. This dissertation presents a reconfigurable cryptographic processor, called Recryptor, which achieves performance and energy improvements for a wide range of security algorithms across public key/secret key cryptography and hash functions. The proposed design employs circuit techniques in-memory and near-memory computing and is more resilient to power analysis attack. In addition, a simulator for in-memory computation is proposed. It is of high cost to design and evaluate new-architecture like in-memory computing in Register-transfer level (RTL). A C-based simulator is designed to enable fast design space exploration and large workload simulations. Elliptic curve arithmetic and Galois counter mode are evaluated in this work. Lastly, an error resilient register circuit, called iRazor, is designed to tolerate unpredictable variations in manufacturing process operating temperature and voltage of VLSI systems. When integrated into an ARM processor, this adaptive approach outperforms competing industrial techniques such as frequency binning and canary circuits in performance and energy.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147546/1/zhyiqun_1.pd
    • …
    corecore