79 research outputs found

    Side-Channel Analysis Protection and Low-Latency in Action - case study of PRINCE and Midori

    Get PDF
    During the last years, the industry sector showed particular interest in solutions which allow to encrypt and decrypt data within one clock cycle. Known as low-latency cryptography, such ciphers are desirable for pervasive applications with real-time security requirements. On the other hand, pervasive applications are very likely in control of the end user, and may operate in a hostile environment. Hence, in such scenarios it is necessary to provide security against side-channel analysis (SCA) attacks while still keeping the low-latency feature. Since the single-clock-cycle concept requires an implementation in a fully-unrolled fashion, the application of masking schemes - as the most widely studied countermeasure - is not straightforward. The contribution of this work is to present and discuss about the difficulties and challenges that hardware engineers face when integrating SCA countermeasures into low-latency constructions. In addition to several design architectures, practical evaluations, and discussions about the problems and potential solutions with respect to the case study PRINCE (also compared with Midori), the final message of this paper is a couple of suggestions for future low-latency designs to - hopefully - ease the integration of SCA countermeasures

    Efficient Error detection Architectures for Low-Energy Block Ciphers with the Case Study of Midori Benchmarked on FPGA

    Get PDF
    Achieving secure, high performance implementations for constrained applications such as implantable and wearable medical devices is a priority in efficient block ciphers. However, security of these algorithms is not guaranteed in presence of malicious and natural faults. Recently, a new lightweight block cipher, Midori, has been proposed which optimizes the energy consumption besides having low latency and hardware complexity. This algorithm is proposed in two energy-efficient varients, i.e., Midori64 and Midori128, with block sizes equal to 64 and 128 bits. In this thesis, fault diagnosis schemes for variants of Midori are proposed. To the best of the our knowledge, there has been no fault diagnosis scheme presented in the literature for Midori to date. The fault diagnosis schemes are provided for the nonlinear S-box layer and for the round structures with both 64-bit and 128-bit Midori symmetric key ciphers. The proposed schemes are benchmarked on field-programmable gate array (FPGA) and their error coverage is assessed with fault-injection simulations. These proposed error detection architectures make the implementations of this new low-energy lightweight block cipher more reliable

    Low-Latency and Low-Randomness Second-Order Masked Cubic Functions

    Get PDF
    Masking schemes are the most popular countermeasure to mitigate Side-Channel Analysis (SCA) attacks. Compared to software, their hardware implementations require certain considerations with respect to physical defaults, such as glitches. To counter this extended leakage effect, the technique known as Threshold Implementation (TI) has proven to be a reliable solution. However, its efficiency, namely the number of shares, is tied to the algebraic degree of the target function. As a result, the application of TI may lead to unaffordable implementation costs. This dependency is relaxed by the successor schemes where the minimum number of d + 1 shares suffice for dth-order protection independent of the function’s algebraic degree. By this, although the number of input shares is reduced, the implementation costs are not necessarily low due to their high demand for fresh randomness. It becomes even more challenging when a joint low-latency and low-randomness cost is desired. In this work, we provide a methodology to realize the second-order glitch-extended probing-secure implementation of cubic functions with three shares while allowing to reuse fresh randomness. This enables us to construct low-latency second-order secure implementations of several popular lightweight block ciphers, including Skinny, Midori, and Prince, with a very limited number of fresh masks. Notably, compared to state-of-the-art equivalent implementations, our designs lower the latency in terms of the number of clock cycles while keeping randomness costs low

    CRAFT: Lightweight Tweakable Block Cipher with Efficient Protection Against DFA Attacks

    Get PDF
    Traditionally, countermeasures against physical attacks are integrated into the implementation of cryptographic primitives after the algorithms have been designed for achieving a certain level of cryptanalytic security. This picture has been changed by the introduction of PICARO, ZORRO, and FIDES, where efficient protection against Side-Channel Analysis (SCA) attacks has been considered in their design. In this work we present the tweakable block cipher CRAFT: the efficient protection of its implementations against Differential Fault Analysis (DFA) attacks has been one of the main design criteria, while we provide strong bounds for its security in the related-tweak model. Considering the area footprint of round-based hardware implementations, CRAFT outperforms the other lightweight ciphers with the same state and key size. This holds not only for unprotected implementations but also when fault-detection facilities, side-channel protection, and their combination are integrated into the implementation. In addition to supporting a 64-bit tweak, CRAFT has the additional property that the circuit realizing the encryption can support the decryption functionality as well with very little area overhead

    CRAFT: Lightweight Tweakable Block Cipher with Efficient Protection Against DFA Attacks

    Get PDF
    Traditionally, countermeasures against physical attacks are integrated into the implementation of cryptographic primitives after the algorithms have been designed for achieving a certain level of cryptanalytic security. This picture has been changed by the introduction of PICARO, ZORRO, and FIDES, where efficient protection against Side-Channel Analysis (SCA) attacks has been considered in their design. In this work we present the tweakable block cipher CRAFT: the efficient protection of its implementations against Differential Fault Analysis (DFA) attacks has been one of the main design criteria, while we provide strong bounds for its security in the related-tweak model. Considering the area footprint of round-based hardware implementations, CRAFT outperforms the other lightweight ciphers with the same state and key size. This holds not only for unprotected implementations but also when fault-detection facilities, side-channel protection, and their combination are integrated into the implementation. In addition to supporting a 64-bit tweak, CRAFT has the additional property that the circuit realizing the encryption can support the decryption functionality as well with very little area overhead

    Optimized Threshold Implementations: Securing Cryptographic Accelerators for Low-Energy and Low-Latency Applications

    Get PDF
    Threshold implementations have emerged as one of the most popular masking countermeasures for hardware implementations of cryptographic primitives. In the original version of TI, the number of input shares was dependent on both security order dd and algebraic degree of a function tt, namely td+1td + 1. At CRYPTO 2015, a new method was presented yielding to a dd-th order secure implementation using d+1d+1 input shares. In this work, we first provide a construction for d+1d+1 TI sharing which achieves the minimal number of output shares for any nn-input Boolean function of degree t=n1t=n-1. Furthermore, we present a heuristic for minimizing the number of output shares for higher order td+1td + 1 TI. Finally, we demonstrate the applicability of our results on d+1d+1 and td+1td+1 TI versions, for first- and second-order secure, low-latency and low-energy implementations of the PRINCE block cipher

    Design and analysis of cryptographic algorithms

    Get PDF

    Riding the Waves Towards Generic Single-Cycle Masking in Hardware

    Get PDF
    Research on the design of masked cryptographic hardware circuits in the past has mostly focused on reducing area and randomness requirements. However, many embedded devices like smart cards and IoT nodes also need to meet certain performance criteria, which is why the latency of masked hardware circuits also represents an important metric for many practical applications. The root cause of latency in masked hardware circuits is the need for additional register stages that synchronize the propagation of shares. Otherwise, glitches would violate the basic assumptions of the used masking scheme. This issue can be addressed to some extent, e.g., by using lightweight cryptographic algorithms with low-degree Sboxes, however, many applications still require the usage of schemes with higher-degree S-boxes like AES. Several recent works have already proposed solutions that help reduce this latency yet they either come with noticeably increased area/randomness requirements, limitations on masking orders, or specific assumptions on the general architecture of the crypto core. In this work, we introduce a generic and efficient method for designing single-cycle glitch-resistant (higher-order) masked hardware of cryptographic S-boxes. We refer to this technique as (generic) Self-Synchronized Masking (“SESYM”). The main idea of our approach is to replace register stages with a partial dual-rail encoding of masked signals that ensures synchronization within the circuit. More concretely, we show that WDDL gates and Muller C-elements can be used in combination with standard masking schemes to design single-cycle S-box circuits that, especially in case of higher-degree S-boxes, have noticeably lower requirements in terms of area and online randomness. We apply our method to DOM-based S-boxes of Ascon and AES and compare the resulting circuits to existing latency optimized circuits based on TI, GLM, and LMDPL. The latency of all three designs is reduced to single-cycle operation and are dth-order secure. Compared to GLM-masked Ascon, our approach comes with a 6.4 times reduction in online randomness for all protection orders. Compared to 1st-order LMDPL-masked AES, our approach achieves comparable results, while it is more generic, amongst others, by also supporting higher-order designs. We also underline the practical protection of our constructions against power analysis attacks via empirical and formal verification approaches

    LLTI: Low-Latency Threshold Implementations

    Get PDF
    With the enormous increase in portable cryptographic devices, physical attacks are becoming similarly popular. One of the most common physical attacks is Side-Channel Analysis (SCA), extremely dangerous due to its non-invasive nature. Threshold Implementations (TI) was proposed as the first countermeasure to provide provable security in masked hardware implementations. While most works on hardware masking are focused on optimizing the area requirements, with the newer and smaller technologies area is taking a backseat, and low-latency is gaining importance. In this work, we revisit the scheme proposed by Arribas et al. in TCHES 2018 to secure unrolled implementations. We formalize and expand this methodology, to devise a masking scheme, derived from TI, designed to secure hardware implementations optimized for latency named Low-Latency Threshold Implementations (LLTI). By applying the distributive property and leveraging a divide-and-conquer strategy, we split a non-linear operation in layers which are masked separately. The result is a more efficient scheme than the former TI for any operation of algebraic degree greater than two, achieving great optimizations both in terms of speed and area. We compare the performance of first-order LLTI with first-order TI in securing a cubic gate and a degree-7 AND gate without using any registers in between. We achieve a 137% increase in maximum frequency and a 60% reduction in area for the cubic gate, and 3131 times reduction in area in the case of a degree-7 AND gate compared to TI. To further illustrate the power of our scheme we take a low-latency PRINCE implementation from the literature and, by simply changing the secure S-box with the LLTI version, we achieve a 46% max. frequency improvement and a 38% area reduction. Moreover, we apply LLTI to a secure a low-latency AES implementation and compare it with the TI version, achieving a 6.9 times max. freq. increase and a 47.2% area reduction

    The QARMA Block Cipher Family. Almost MDS Matrices Over Rings With Zero Divisors, Nearly Symmetric Even-Mansour Constructions With Non-Involutory Central Rounds, and Search Heuristics for Low-Latency S-Boxes

    Get PDF
    This paper introduces QARMA, a new family of lightweight tweakable block ciphers targeted at applications such as memory encryption, the generation of very short tags for hardware-assisted prevention of software exploitation, and the construction of keyed hash functions. QARMA is inspired by reflection ciphers such as PRINCE, to which it adds a tweaking input, and MANTIS. However, QARMA differs from previous reflector constructions in that it is a three-round Even-Mansour scheme instead of a FX-construction, and its middle permutation is non-involutory and keyed. We introduce and analyse a family of Almost MDS matrices defined over a ring with zero divisors that allows us to encode rotations in its operation while maintaining the minimal latency associated to {0, 1}-matrices. The purpose of all these design choices is to harden the cipher against various classes of attacks. We also describe new S-Box search heuristics aimed at minimising the critical path. QARMA exists in 64- and 128-bit block sizes, where block and tweak size are equal, and keys are twice as long as the blocks. We argue that QARMA provides sufficient security margins within the constraints determined by the mentioned applications, while still achieving best-in-class latency. Implementation results on a state-of-the art manufacturing process are reported. Finally, we propose a technique to extend the length of the tweak by using, for instance, a universal hash function, which can also be used to strengthen the security of QARMA
    corecore