109 research outputs found

    A Compact FPGA Implementation of the SHA-3 Candidate ECHO

    Get PDF
    We propose a compact architecture of the SHA-3 candidate ECHO for the Virtex-5 FPGA family. Our architecture is built around a 8-bit datapath. We show that a careful organization of the chaining variable and the message block in the register file allows one to design a compact control unit based on a 4-bit counter, an 8-bit counter, and a simple Finite State Machine. A fully autonomous implementation of ECHO on a Xilinx Virtex-5 FPGA requires 127127 slices and a single memory block to store the internal state, and achieves a throughput of 7272Mbps

    A Low-Area Unified Hardware Architecture for the AES and the Cryptographic Hash Function ECHO

    Get PDF
    We propose a compact coprocessor for the AES (encryption, decryption, and key expansion) and the cryptographic hash function ECHO on Virtex-55 and Virtex-66 FPGAs. Our architecture is built around a 88-bit datapath. The Arithmetic and Logic Unit performs a single instruction that allows for implementing AES encryption, AES decryption, AES key expansion, and ECHO at all levels of security. Thanks to a careful organization of AES and ECHO internal states in the register file, we manage to generate all read and write addresses by means of a modulo-1616 counter and a modulo-256256 counter. A fully autonomous implementation of ECHO and AES on a Virtex-55 FPGA requires 193193 slices and a single 3636k memory block, and achieves competitive throughputs. Assuming that the security guarantees of ECHO are at least as good as the ones of the SHA-33 finalists BLAKE and Keccak, our results show that ECHO is a better candidate for low-area cryptographic coprocessors. Furthermore, the design strategy described in this work can be applied to combine the AES and the SHA-33 finalist {G}røstl

    Compact Implementations of BLAKE-32 and BLAKE-64 on FPGA

    Get PDF
    We propose compact architectures of the SHA-33 candidates BLAKE-32 and BLAKE-64 for several FPGA families. We harness the intrinsic parallelism of the algorithm to interleave the computation of four instances of the GiG_i function. This approach allows us to design an Arithmetic and Logic Unit with four pipeline stages and to achieve high clock frequencies. With careful scheduling, we completely avoid pipeline bubbles. For the time being, the designs presented in this work are the most compact ones for any of the SHA-3 candidates. We show for instance that a fully autonomous implementation of BLAKE-32 on a Xilinx Virtex-5 device requires 56 slices and two memory blocks

    A Low-Area Unified Hardware Architecture for the AES and the Cryptographic Hash Function Grøstl

    Get PDF
    This article describes the design of an 8-bit coprocessor for the AES (encryption, decryption, and key expansion) and the cryptographic hash function Grøstl on several Xilinx FPGAs. Our Arithmetic and Logic Unit performs a single instruction that allows for implementing AES encryption, AES decryption, AES key expansion, and Grøstl at all levels of security. Thanks to a careful organization of AES and Grøstl internal states in the register file, we manage to generate all read and write addresses by means of a modulo-128 counter and a modulo-256 counter. A fully autonomous implementation of Grøstl and AES on a Virtex-6 FPGA requires 169 slices and a single 36k memory block, and achieves a competitive throughput. Assuming that the security guarantees of Grøstl are at least as good as the ones of the other SHA-3 finalists, our results show that Grøstl is the best candidate for low-area cryptographic coprocessors

    On FPGA-based implementations of Gr\{o}stl

    Get PDF
    The National Institute of Standards and Technology (NIST) has started a competition for a new secure hash standard. To make a significant comparison between the submitted candidates, third party implementations of all proposed hash functions are needed. This is one of the reasons why the SHA-3 candidate Gr\{o}stl has been chosen for a FPGA-based implementation. Mainly our work is motivated by actual and future developments of the automotive market (e.g. car-2-car communication systems), which will increase the necessity for a suitable cryptographic infrastructure in modern vehicles (cf. AUTOSAR project) even further. One core component of such an infrastructure is a secure cryptographic hash function, which is used for a lot of applications like challenge-response authentication systems or digital signature schemes. Another motivation to evaluate Gr\{o}stl is its resemblance to AES. The automotive market demands, like any mass market, low budget and therefore compact implementations, hence our evaluation of Gr\{o}stl focuses on area optimizations. It is shown that, while Gr\{o}stl is inherently quite large compared to AES, it is still possible to implement the Gr\{o}stl algorithm on small and low budget FPGAs like the second smallest available Spartan-3, while maintaining a reasonable high throughput

    Performance analysis of a scalable hardware FPGA Skein implementation

    Get PDF
    Hashing functions are a key cryptographic primitive used in many everyday applications, such as authentication, ensuring data integrity, as well as digital signatures. The current hashing standard is defined by the National Institute of Standards and Technology (NIST) as the Secure Hash Standard (SHS), and includes SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512 . SHS\u27s level of security is waning as technology and analysis techniques continue to develop over time. As a result, after the 2005 Cryptographic Hash Workshop, NIST called for the creation of a new cryptographic hash algorithm to replace SHS. The new candidate algorithms were submitted on October 31st, 2008, and of them fourteen have advanced to round two of the competition. The competition is expected to produce a final replacement for the SHS standard by 2012. Multi-core processors, and parallel programming are the dominant force in computing, and some of the new hashing algorithms are attempting to take advantage of these resources by offering parallel tree-hashing variants to the algorithms. Tree-hashing allows multiple parts of the data on the same level of a tree to be operated on simultaneously, resulting in the potential to reduce the execution time complexity for hashing from O(n) to O(log n). Designs for tree-hashing require that the scalability and parallelism of the algorithms be researched on all platforms, including multi-core processors (CPUs), graphics processors (GPUs), as well as custom hardware (ASICs and FPGAs). Skein, the hashing function that this work has focused on, offers a tree-hashing mode with different options for the maximum tree height, and leaf node size, as well as the node fan-out. This research focuses on creating and analyzing the performance of scalable hardware designs for Skein\u27s tree hashing mode. Different ideas and approaches on how to modify sequential hashing cores, and create scalable control logic in order to provide for high-speed and low-area parallel hashing hardware are presented and analyzed. Equations were created to help understand the expected performance and potential bottlenecks of Skein in FPGAs. The equations are intended to assist the decision making process during the design phase, as well as potentially provide insight into design considerations for other tree hashing schemes in FPGAs. The results are also compared to current sequential designs of Skein, providing a complete analysis of the performance of Skein in an FPGA

    Compact Hardware Implementations of ChaCha, BLAKE, Threefish, and Skein on FPGA

    Get PDF
    The cryptographic hash functions BLAKE and Skein are built from the ChaCha stream cipher and the tweakable Threefish block cipher, respectively. Interestingly enough, they are based on the same arithmetic operations, and the same design philosophy allows one to design lightweight coprocessors for hashing and encryption. The key element of our approach is to take advantage of the parallelism of the algorithms to deeply pipeline our Arithmetic an Logic Units, and to avoid data dependencies by interleaving independent tasks. We show for instance that a fully autonomous implementation of BLAKE and ChaCha on a Xilinx Virtex-6 device occupies 144 slices and three memory blocks, and achieves competitive throughputs. In order to offer the same features, a coprocessor implementing Skein and Threefish requires a substantial higher slice count

    Power Analysis Attacks on Keccak

    Get PDF
    Side Channel Attacks (SCA) exploit weaknesses in implementations of cryptographic functions resulting from unintended inputs and outputs such as operation timing, electromagnetic radiation, thermal/acoustic emanations, and power consumption to break cryptographic systems with no known weaknesses in the algorithm’s mathematical structure. Power Analysis Attack (PAA) is a type of SCA that exploits the relationship between the power consumption and secret key (secret part of input to some cryptographic process) information during the cryptographic device normal operation. PAA can be further divided into three categories: Simple Power Analysis (SPA), Differential Power Analysis (DPA) and Correlation Power Analysis (CPA). PAA was first introduced in 1998 and mostly focused on symmetric-key block cipher Data Encryption Standard (DES). Most recently this technique has been applied to cryptographic hash functions. Keccak is built on sponge construction, and it provides a new Message Authentication Code (MAC) function called MAC-Keccak. The focus of this thesis is to apply the power analysis attacks that use CPA technique to extract the key from the MAC-Keccak. So far there are attacks of physical hardware implementations of MAC-Keccak using FPGA development board, but there has been no side channel vulnerability assessment of the hardware implementations using simulated power consumption waveforms. Compared to physical power extraction, circuit simulation significantly reduces the complexity of mounting a power attack, provides quicker feedback during the implementation/study of a cryptographic device, and that ultimately reduces the cost of testing and experimentation. An attack framework was developed and applied to the Keccak high speed core hardware design from the SHA-3 competition, using gate-level circuit simulation. The framework is written in a modular fashion to be flexible to attack both simulated and physical power traces of AES, MAC-Keccak, and future crypto systems. The Keccak hardware design is synthesized with the Synopsys 130-nm CMOS standard cell library. Simulated instantaneous power consumption waveforms are generated with Synopsys PrimeTime PX. 1-bit, 2-bit, 4-bit, 8-bit, and 16-bit CPA selection function key guess size attacks are performed on the waveforms to compare/analyze the optimization and computation effort/performance of successful key extraction on MAC-Keccak using 40 byte key size that fits the whole bottom plane of the 3D Keccak state. The research shows the larger the selection function key guess size used, the better the signal-noise-ratio (SNR), therefore requiring fewer numbers of traces needed to be applied to retrieve the key but suffer from higher computation effort time. Compared to larger selection function key guess size, smaller key guess size has lower SNR that requires higher number of applied traces for successful key extraction and utilizes less computational effort time. The research also explores and analyzes the attempted method of attacking the second plane of the 3D Keccak state where the key expands beyond 40 bytes using the successful approach against the bottom plane

    Unfolding Method for Shabal on Virtex-5 FPGAs: Concrete Results.pdf

    Get PDF
    Recent cryptanalysis on SHA-1 family has led the NIST to call for a public competition named SHA-3 Contest. Efficient implementations on various platforms are a criterion for ranking performance of all the candidates in this competition. It appears that most of the hardware architectures proposed for SHA-3 candidates are basic. In this paper, we focus on an optimized implementation of the Shabal candidate. We improve the state-of-the-art using the unfolding method. This transformation leads to unroll a part of the Shabal core. More precisely, our design can produce a throughput over 3 Gbps on Virtex-5 FPGAs, with a reasonable area usage
    • …
    corecore