
    An FPGA Technologies Area Examination of the SHA-3 Hash Candidate Implementations

    This paper presents an examination of the different FPGA architectures used to implement the various hash function candidates for the currently ongoing NIST-organised SHA-3 competition~\cite{Sha3NIST}. The paper is intended both as a quick reference guide, used in conjunction with the results table~\cite{Sha3zoo}, to aid in finding the “best-fit” FPGA for a particular algorithm, and as a helpful guide explaining the many different terms and measurement units used in the various FPGA packages.

    Comparative Study of Keccak SHA-3 Implementations

    This paper conducts an extensive comparative study of state-of-the-art solutions for implementing the SHA-3 hash function. SHA-3, a pivotal component in modern cryptography, has spawned numerous implementations across diverse platforms and technologies. This research aims to provide valuable insights into selecting and optimizing Keccak SHA-3 implementations. Our study encompasses an in-depth analysis of hardware, software, and software–hardware (hybrid) solutions. We assess the strengths, weaknesses, and performance metrics of each approach. Critical factors, including computational efficiency, scalability, and flexibility, are evaluated across different use cases. We investigate how each implementation performs in terms of speed and resource utilization. This research aims to advance the understanding of cryptographic systems, aiding in the informed design and deployment of efficient cryptographic solutions. By providing a comprehensive overview of SHA-3 implementations, this study offers a clear understanding of the available options and equips professionals and researchers with the necessary insights to make informed decisions in their cryptographic endeavors.
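
    To make the speed comparison concrete, the snippet below is a minimal, illustrative micro-benchmark (not taken from the paper) that measures software SHA3-256 throughput with Python's standard hashlib; hardware and hybrid implementations would be characterised with the same bytes-per-second figure so that results can sit in a single comparison table. The message size and chunking are arbitrary choices.

        import hashlib, time

        def sha3_256_throughput(total_mb=64, chunk_kb=64):
            """Hash total_mb of zero bytes in chunk_kb pieces and return MB/s."""
            chunk = b"\x00" * (chunk_kb * 1024)
            iterations = (total_mb * 1024) // chunk_kb
            h = hashlib.sha3_256()
            start = time.perf_counter()
            for _ in range(iterations):
                h.update(chunk)
            h.digest()
            return total_mb / (time.perf_counter() - start)

        print(f"SHA3-256 software throughput: {sha3_256_throughput():.1f} MB/s")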

    Hardware design of cryptographic accelerators

    With the rapid growth of the Internet and digital communications, the volume of sensitive electronic transactions being transferred and stored over and on insecure media has increased dramatically in recent years. The growing demand for cryptographic systems to secure this data, across a multitude of platforms ranging from large servers to small mobile devices and smart cards, has necessitated research into low-cost, flexible and secure solutions. As constraints on architectures such as area, speed and power become key factors in choosing a cryptosystem, methods for speeding up the development and evaluation process are necessary. This thesis investigates flexible hardware architectures for the main components of a cryptographic system. Dedicated hardware accelerators can provide significant performance improvements when compared to implementations on general-purpose processors. Each of the proposed designs is analysed in terms of speed, area, power, energy and efficiency. Field Programmable Gate Arrays (FPGAs) are chosen as the development platform due to their fast development time and reconfigurable nature. Firstly, a reconfigurable architecture for performing elliptic curve point scalar multiplication on an FPGA is presented. Elliptic curve cryptography is one such method to secure data, offering similar security levels to traditional systems, such as RSA, but with smaller key sizes, translating into lower memory and bandwidth requirements. The architecture is implemented using different underlying algorithms and coordinates for dedicated Double-and-Add algorithms, twisted Edwards algorithms and SPA-secure algorithms, and its power consumption and energy are measured on an FPGA. Hardware implementation results for these new algorithms are compared against their software counterparts, and the best choices for minimum area-time and area-energy circuits are then identified and examined for larger key and field sizes. Secondly, implementation methods for another component of a cryptographic system, namely hash functions, developed in the recently concluded SHA-3 hash competition are presented. Various designs from the three rounds of the NIST-run competition are implemented on FPGA, along with an interface to allow fair comparison of the different hash functions when operating in a standardised and constrained environment. Different methods of implementation for the designs, and their subsequent performance, are examined in terms of throughput, area and energy costs using various constraint metrics. Comparing many different implementation methods and algorithms is nontrivial, so another aim of this thesis is the development of generic interfaces used both to reduce implementation and test time and to enable fair baseline comparisons of different algorithms when operating in a standardised and constrained environment. Finally, a hardware-software co-design cryptographic architecture is presented. This architecture is capable of supporting multiple types of cryptographic algorithms and is described through an application for performing public key cryptography, namely the Elliptic Curve Digital Signature Algorithm (ECDSA). It makes use of the elliptic curve architecture and the hash functions described previously. These components, along with a random number generator, provide hardware acceleration for a MicroBlaze-based cryptographic system. The trade-off between performance and flexibility is discussed using dedicated software and hardware-software co-design implementations of the elliptic curve point scalar multiplication block. Results are then presented in terms of the overall cryptographic system.
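
    As a point of reference for the scalar multiplication block discussed above, the following is a minimal software sketch of the textbook double-and-add algorithm in affine coordinates over a toy curve. It is illustrative only; the parameters here are not those of the thesis, whose dedicated circuits use larger fields, alternative coordinate systems and SPA-secure schedules.

        # Double-and-add scalar multiplication on y^2 = x^3 + a*x + b over F_p,
        # with None as the point at infinity. Requires Python 3.8+ for pow(x, -1, p).
        def ec_add(P, Q, a, p):
            if P is None:
                return Q
            if Q is None:
                return P
            (x1, y1), (x2, y2) = P, Q
            if x1 == x2 and (y1 + y2) % p == 0:
                return None                                       # P + (-P) = identity
            if P == Q:
                lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
            else:
                lam = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
            x3 = (lam * lam - x1 - x2) % p
            return (x3, (lam * (x1 - x3) - y1) % p)

        def scalar_mult(k, P, a, p):
            R = None                                              # running result
            for bit in bin(k)[2:]:                                # MSB-first bits of k
                R = ec_add(R, R, a, p)                            # always double
                if bit == "1":
                    R = ec_add(R, P, a, p)                        # conditionally add
            return R

        # Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1).
        for k in (2, 3, 9):
            print(k, "* P =", scalar_mult(k, (5, 1), 2, 17))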

    Application-Specific Memory Subsystems

    The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application. Although general-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed, when this is not the case a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future. In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort. We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM). Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware.
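
    A toy stand-in for the trace-driven evaluation step is sketched below. It is not the dissertation's superoptimizer, which searches far richer subsystem structures with a stochastic method; it only shows the core idea that each candidate configuration is scored by replaying a recorded address trace, and the configuration with the fewest misses wins. The trace and parameter ranges are made up for illustration.

        from itertools import product

        def misses_direct_mapped(trace, cache_bytes, line_bytes):
            """Count misses for a direct-mapped cache replayed over an address trace."""
            lines = cache_bytes // line_bytes
            tags = [None] * lines
            misses = 0
            for addr in trace:
                block = addr // line_bytes
                idx = block % lines
                if tags[idx] != block:
                    misses += 1
                    tags[idx] = block
            return misses

        def best_config(trace, cache_sizes, line_sizes):
            """Exhaustively score every (cache size, line size) pair; return the best."""
            return min(product(cache_sizes, line_sizes),
                       key=lambda c: misses_direct_mapped(trace, *c))

        # Synthetic trace: two passes over a 4 KiB array with a 16-byte stride.
        trace = [i * 16 for i in range(256)] * 2
        print("best (cache_bytes, line_bytes):", best_config(trace, (1024, 4096), (16, 64)))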

    Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

    Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms, including Diffie-Hellman key agreement and key encapsulation, in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions, and a further 6.0-6.1x improvement with the addition of the coprocessor. Modifying the coprocessor to use a wider datapath further increased performance by 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security.
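
    Reading the reported factors as successive, multiplicative improvements (our interpretation of the abstract, not a figure from the paper), the cumulative gain over the portable C baseline works out as follows.

        # Each stage's speedup is quoted relative to the preceding configuration, so
        # under that reading the end-to-end factor is the product of the three ranges.
        low  = 6.3 * 6.0 * 2.6    # ISA extensions * coprocessor * wider datapath
        high = 7.5 * 6.1 * 2.9
        print(f"cumulative speedup over portable C: ~{low:.0f}x to ~{high:.0f}x")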

    Envisioning the Future of Cyber Security in Post-Quantum Era: A Survey on PQ Standardization, Applications, Challenges and Opportunities

    The rise of quantum computers exposes vulnerabilities in current public key cryptographic protocols, necessitating the development of secure post-quantum (PQ) schemes. We therefore conduct a comprehensive study of various PQ approaches, covering their constructional design and structural vulnerabilities, and offering security assessments, implementation evaluations, and a particular focus on side-channel attacks. We analyze global standardization processes, evaluate their metrics in relation to real-world applications, and focus primarily on standardized PQ schemes, selected additional signature competition candidates, and PQ-secure cutting-edge schemes beyond standardization. Finally, we present visions and potential future directions for a seamless transition to the PQ era.

    Utilizing Magnetic Tunnel Junction Devices in Digital Systems

    The research described in this dissertation is motivated by the desire to effectively utilize magnetic tunnel junctions (MTJs) in digital systems. We explore two aspects of this: (1) a read circuit useful for global clocking and magnetologic, and (2) hardware virtualization that utilizes the deeply pipelined nature of magnetologic. In the first aspect, a read circuit is used to sense the state of an MTJ (low or high resistance) and produce a logic output that represents this state. With global clocking, an external magnetic field combined with on-chip MTJs is used as an alternative mechanism for distributing the clock signal across the chip. With magnetologic, logic is evaluated with MTJs that must be sensed by a read circuit and used to drive downstream logic. For these two uses, we develop a resistance-to-voltage (R2V) read circuit to sense MTJ resistance and produce a logic voltage output. We design and fabricate a prototype test chip in a 3-metal, 2-poly 0.5 µm process for testing the R2V read circuit and experimentally validating its correctness. Using a clocked low/high resistor pair, we show that the read circuit can correctly detect the input resistance and produce the desired square-wave output. The read circuit is measured to operate correctly up to 48 MHz. The input node is relatively insensitive to node capacitance and can handle up to tens of pF without changing the bandwidth of the circuit. In the second aspect, hardware virtualization is a technique by which deeply pipelined circuits that have feedback can be utilized. MTJs have the potential to act as state in a magnetologic circuit, which may result in a deep pipeline. Streams of computation are then context-switched into the hardware logic, allowing them to share hardware resources and more fully utilize the pipeline stages of the logic. While applicable to magnetologic using MTJs, virtualization is also applicable to traditional logic technologies like CMOS. Our investigation targets MTJs, FPGAs, and ASICs. We develop M/D/1 and M/G/1 queueing models of the performance of virtualized hardware with secondary memory using a fixed, hierarchical, round-robin schedule that predict average throughput, latency, and queue occupancy in the system. We develop three C-slow applications and calibrate them to a clock and resource model for FPGA and ASIC technologies. Finally, using the M/G/1 model, we predict throughput, latency, and resource usage for MTJ, FPGA, and ASIC technologies. We show three design scenarios illustrating ways in which to use the model.
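
    For readers unfamiliar with the queueing models mentioned above, the sketch below shows the kind of prediction an M/G/1 model makes, using the standard Pollaczek-Khinchine formula; the arrival rate and service-time moments are placeholder values, not parameters taken from the dissertation.

        def mg1_metrics(lam, es, es2):
            """M/G/1 means via Pollaczek-Khinchine: lam = arrival rate, es = E[S], es2 = E[S^2]."""
            rho = lam * es                      # utilization; the queue is stable only if rho < 1
            if rho >= 1:
                raise ValueError("unstable queue: utilization >= 1")
            wq = lam * es2 / (2 * (1 - rho))    # mean wait in queue
            w = wq + es                         # mean time in system (latency)
            return {"throughput": lam,          # departures match arrivals in steady state
                    "latency": w,
                    "occupancy": lam * w,       # mean jobs in system, by Little's law
                    "utilization": rho}

        # Placeholder: deterministic 10-cycle service (E[S^2] = E[S]^2), 0.08 arrivals/cycle.
        print(mg1_metrics(lam=0.08, es=10.0, es2=100.0))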

    Implementing IPsec using the Five-layer security framework and FPGAs.


    Security Applications of Formal Language Theory

    We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development. Our work provides a foundation for verifying critical implementation components with considerably less burden to developers than is offered by the current state of the art. It additionally offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools and techniques. This report is divided into two parts. In Part I we address the formalisms and their applications; in Part II we discuss the general implications and recommendations for protocol and software design that follow from our formal analysis.
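
    The input-validation discipline described above, accepting input only if a parser for the intended grammar derives it in full, can be illustrated with a toy recogniser of our own (it is not an artifact of the report). The hypothetical grammar is msg := field (";" field)* with field := KEY "=" NUM.

        def accepts(message):
            """Recursive-descent recogniser: accept only strings the grammar fully derives."""
            pos = 0

            def take(pred):
                nonlocal pos
                start = pos
                while pos < len(message) and pred(message[pos]):
                    pos += 1
                return pos > start

            def field():
                nonlocal pos
                if not take(str.isalpha):                  # KEY
                    return False
                if pos < len(message) and message[pos] == "=":
                    pos += 1
                else:
                    return False
                return take(str.isdigit)                   # NUM

            if not field():
                return False
            while pos < len(message) and message[pos] == ";":
                pos += 1
                if not field():
                    return False
            return pos == len(message)                     # reject any trailing input

        for s in ("retries=3;timeout=90", "retries=3;;timeout=90", "retries=3 OR 1=1"):
            print(repr(s), "->", "accept" if accepts(s) else "reject")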