138 research outputs found

    Riding the Waves Towards Generic Single-Cycle Masking in Hardware

    Research on the design of masked cryptographic hardware circuits in the past has mostly focused on reducing area and randomness requirements. However, many embedded devices like smart cards and IoT nodes also need to meet certain performance criteria, which is why the latency of masked hardware circuits also represents an important metric for many practical applications. The root cause of latency in masked hardware circuits is the need for additional register stages that synchronize the propagation of shares. Otherwise, glitches would violate the basic assumptions of the used masking scheme. This issue can be addressed to some extent, e.g., by using lightweight cryptographic algorithms with low-degree S-boxes. However, many applications still require the usage of schemes with higher-degree S-boxes like AES. Several recent works have already proposed solutions that help reduce this latency, yet they either come with noticeably increased area/randomness requirements, limitations on masking orders, or specific assumptions on the general architecture of the crypto core. In this work, we introduce a generic and efficient method for designing single-cycle glitch-resistant (higher-order) masked hardware of cryptographic S-boxes. We refer to this technique as (generic) Self-Synchronized Masking (“SESYM”). The main idea of our approach is to replace register stages with a partial dual-rail encoding of masked signals that ensures synchronization within the circuit. More concretely, we show that WDDL gates and Muller C-elements can be used in combination with standard masking schemes to design single-cycle S-box circuits that, especially in the case of higher-degree S-boxes, have noticeably lower requirements in terms of area and online randomness. We apply our method to DOM-based S-boxes of Ascon and AES and compare the resulting circuits to existing latency-optimized circuits based on TI, GLM, and LMDPL. All three designs achieve single-cycle latency and are dth-order secure. Compared to GLM-masked Ascon, our approach comes with a 6.4 times reduction in online randomness for all protection orders. Compared to 1st-order LMDPL-masked AES, our approach achieves comparable results, while being more generic, among other things by also supporting higher-order designs. We also underline the practical protection of our constructions against power analysis attacks via empirical and formal verification approaches.
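The two building blocks named in this abstract can be modelled behaviourally in a few lines. The sketch below is an illustration only, under the usual textbook definitions: a Muller C-element changes its output only when both inputs agree, and a WDDL AND gate works on dual-rail pairs (true rail, false rail) that are precharged to (0, 0). The names `CElement`, `wddl_and`, `encode` and `PRECHARGE` are hypothetical, and how the paper actually combines these cells with DOM is not reproduced here.

```python
class CElement:
    """Behavioural model of a Muller C-element (assumed textbook definition)."""

    def __init__(self, state=0):
        self.state = state

    def update(self, a, b):
        # The output follows the inputs only when they agree; otherwise it holds.
        if a == b:
            self.state = a
        return self.state


PRECHARGE = (0, 0)  # both rails low: no valid value on the wire yet


def encode(bit):
    """Dual-rail encoding: (true rail, false rail)."""
    return (bit, bit ^ 1)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, OR on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


if __name__ == "__main__":
    c = CElement()
    # The C-element waits until both inputs have arrived before switching.
    print([c.update(1, 0), c.update(1, 1), c.update(0, 1), c.update(0, 0)])
    print(wddl_and(encode(1), encode(1)), wddl_and(PRECHARGE, PRECHARGE))
```

In this model a precharged input keeps the AND output precharged unless the other input is already a valid 0, in which case the false rail rises early. The model makes that data-dependent timing visible, but it says nothing about real gate delays.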

    Changing of the Guards: a simple and efficient method for achieving uniformity in threshold sharing

    Since they were first proposed as a countermeasure against differential power analysis (DPA) in 2006, threshold schemes have attracted a lot of attention from the community concentrating on cryptographic implementations. What makes threshold schemes so attractive from an academic point of view is that they come with an information-theoretic proof of resistance against a specific subset of side-channel attacks: first-order DPA. From an industrial point of view they are attractive because a careful threshold implementation forces adversaries to resort to higher-order DPA, with all its problems such as noise amplification. A threshold scheme that offers the mentioned provable security must exhibit three properties: correctness, incompleteness and uniformity. A threshold scheme becomes more expensive with the number of shares that must be implemented, and the required number of shares is lower-bounded by the algebraic degree of the function being shared plus 1. Defining a correct and incomplete sharing of a function of degree d in d+1 shares is straightforward. However, up to now there is no generic method to achieve uniformity, and finding uniform sharings of degree-d functions with d+1 shares is an active research area. In this paper we present a simple and relatively cheap method to find a correct, incomplete and uniform (d+1)-share threshold scheme for any S-box layer consisting of degree-d invertible S-boxes. The uniformity is not implemented in the sharings of the individual S-boxes but rather at the S-box layer level by the use of feed-forward and some expansion of shares. When applied to the Keccak-p nonlinear step Chi, its cost is very small.
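The three properties can be checked exhaustively on a small example. The sketch below is not taken from the paper: it uses the textbook 3-share direct sharing of a 2-input AND (degree 2, so d+1 = 3 shares) and assumes the usual definitions, namely that output share i must not depend on input share i (incompleteness) and that every valid output sharing must occur equally often for each unshared input (uniformity). The function names are hypothetical.

```python
from collections import Counter
from itertools import product

TRIPLES = list(product((0, 1), repeat=3))  # all 3-share vectors of one bit


def unshare(s):
    return s[0] ^ s[1] ^ s[2]


def shared_and(x, y):
    """Direct 3-share sharing of z = x AND y; share i avoids input share i."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)
    z2 = (x3 & y3) ^ (x3 & y1) ^ (x1 & y3)
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)
    return (z1, z2, z3)


def is_correct():
    return all(unshare(shared_and(x, y)) == (unshare(x) & unshare(y))
               for x in TRIPLES for y in TRIPLES)


def is_incomplete():
    # Flipping input share i of x or y must leave output share i unchanged.
    for x, y in product(TRIPLES, repeat=2):
        base = shared_and(x, y)
        for i in range(3):
            fx = tuple(b ^ (j == i) for j, b in enumerate(x))
            fy = tuple(b ^ (j == i) for j, b in enumerate(y))
            if shared_and(fx, y)[i] != base[i] or shared_and(x, fy)[i] != base[i]:
                return False
    return True


def is_uniform():
    # For each unshared (a, b): 16 input sharings, 4 valid output sharings,
    # so a uniform sharing hits each valid output exactly 4 times.
    for a, b in product((0, 1), repeat=2):
        counts = Counter(shared_and(x, y)
                         for x in TRIPLES if unshare(x) == a
                         for y in TRIPLES if unshare(y) == b)
        if len(counts) != 4 or set(counts.values()) != {4}:
            return False
    return True


if __name__ == "__main__":
    print(is_correct(), is_incomplete(), is_uniform())
```

This sharing passes the first two checks and fails the third, which illustrates the gap the abstract describes: correctness and incompleteness come easily with d+1 shares, uniformity does not.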

    Efficient Low-Latency Masking of Ascon without Fresh Randomness

    In this work, we present the first low-latency, second-order masked hardware implementation of Ascon that requires no fresh randomness using only d+1 shares. Our results significantly outperform any publicly known second-order masked implementations of AES and Ascon in terms of combined area, latency and randomness requirements. Ascon is a family of lightweight authenticated encryption and hashing schemes selected by NIST for standardization. Ascon is tailored for small form factors. It requires less power and energy while attaining the same or even better performance than current NIST standards. We achieve the reduction of latency by rearranging the linear layers of the Ascon permutation in a round-based implementation. We provide an improved technique to achieve implementations without the need for fresh randomness. It is based on the concept of changing of the guards extended to the second-order case. Together with the reduction of latency, we need to consider a large set of additional conditions, which we propose to solve using a SAT solver. We have formally verified both our first- and second-order implementations of Ascon using CocoAlma for the first two rounds. Additionally, we have performed a leakage assessment using t-tests on all 12 rounds of the initial permutation. Finally, we provide a comparison of our second-order masked Ascon implementation with other results.

    Tornado: Automatic Generation of Probing-Secure Masked Bitsliced Implementations

    Cryptographic implementations deployed in real-world devices often aim at (provable) security against the powerful class of side-channel attacks while keeping reasonable performance. Last year at Asiacrypt, a new formal verification tool named tightPROVE was put forward to exactly determine whether a masked implementation is secure in the well-deployed probing security model for any given security order t. Also recently, a compiler named Usuba was proposed to automatically generate bitsliced implementations of cryptographic primitives. This paper goes one step further in both security and performance with a new automatic tool named Tornado. In a nutshell, from the high-level description of a cryptographic primitive, Tornado produces a functionally equivalent bitsliced masked implementation at any desired order, proven secure not only in the probing model but also in the so-called register probing model, which much better fits the reality of software implementations. This framework is obtained by integrating Usuba with tightPROVE+, which extends tightPROVE with the ability to verify the security of implementations in the register probing model and to fix them by inserting refresh gadgets at carefully chosen locations. We demonstrate Tornado on the lightweight cryptographic primitives selected for the second round of the NIST competition that were claimed to be masking-friendly. It reports the performance of the resulting masked implementations for several masking orders and proves their security in the register probing model.

    Triplex: an Efficient and One-Pass Leakage-Resistant Mode of Operation

    This paper introduces and analyzes Triplex, a leakage-resistant mode of operation based on Tweakable Block Ciphers (TBCs) with 2n-bit tweaks. Triplex enjoys beyond-birthday ciphertext integrity in the presence of encryption and decryption leakage in a liberal model where all intermediate computations are leaked in full and only two TBC calls operating on a long-term secret are protected with implementation-level countermeasures. It provides beyond-birthday confidentiality guarantees without leakage, and standard confidentiality guarantees with leakage for a single-pass mode embedding a re-keying process for the bulk of its computations (i.e., birthday confidentiality with encryption leakage under a bounded leakage assumption). Triplex improves leakage-resistant modes of operation relying on TBCs with n-bit tweaks when instantiated with large-tweak TBCs like Deoxys-TBC (a CAESAR competition laureate) or Skinny (used by the Romulus finalist of the NIST lightweight crypto competition). Its security guarantees are maintained in the multi-user setting.

    Sophisticated security verification on routing repaired balanced cell-based dual-rail logic against side channel analysis

    Conventional dual-rail precharge logic is difficult to implement because the dual-rail structure requires strict compensation between the counterpart rails. As a lightweight and high-speed dual-rail style, balanced cell-based dual-rail logic (BCDL) uses synchronised compound gates with a global precharge signal to provide high resistance against differential power or electromagnetic analyses. BCDL can be realised from generic field programmable gate array (FPGA) design flows with constraints. However, routing remains a concern because of the limited flexibility of routing control, which results in bias between complementary nets in security-sensitive parts. In this article, based on a routing repair technique, novel verifications of the routing effect are presented. An 8 bit simplified Advanced Encryption Standard (AES) co-processor is implemented on block random access memory (RAM)-based BCDL in Xilinx Virtex-5 FPGAs. Since imbalanced routings are the major defect in BCDL, the authors can rule out other influences and fairly quantify the security variations. A series of asymptotic correlation electromagnetic (EM) analyses are launched against a group of circuits with consecutive routing schemes to verify the routing impact on side channel analyses. After repairing the non-identical routings, mutual information analyses are executed to further validate the concrete security increase obtained from identical routing pairs in BCDL.

    Implémentations Sécurisées de Chiffrement par Bloc contre les Attaques Physiques

    Since their introduction at the end of the 1990s, side-channel attacks have been considered a major threat to cryptographic implementations. Higher-order masking is one of the most popular existing protection strategies against such attacks. It consists in splitting each internal variable of the cryptographic computation into several random variables. However, this type of protection entails a considerable efficiency loss, often making it impractical for industrial solutions. The goal of this thesis is to reduce the gap between theoretical solutions, proven secure, and efficient implementations that can be deployed on embedded systems. More precisely, I analyze the protection of block ciphers such as the AES encryption scheme, where the main issue is to protect the s-boxes with minimal overhead. First, I look for optimal mathematical representations in order to evaluate the s-boxes while minimizing the number of multiplications (an important parameter for masking schemes, but also for homomorphic encryption). For this purpose, I define a generic method to decompose any s-box over any finite field with a low multiplicative complexity. These representations can then be efficiently evaluated with higher-order masking. The flexibility of the decomposition technique further allows developers to easily adapt it to their needs. Secondly, I propose a formal method for measuring the security of circuits evaluating masking schemes. This technique makes it possible to determine exactly whether an attack on a protected circuit is feasible or not. Unlike other tools, its computation time is not exponential in the circuit size, making it possible to obtain a security proof regardless of the masking order used. Furthermore, this method can strictly reduce the use of randomness-costly tools required for reinforcing the security of masking operations. Finally, I present some implementation results with optimizations at both the algorithmic and programming levels. In particular, I employ a bitslice implementation strategy for evaluating the s-boxes in parallel. This strategy leads to speed records for implementations protected at high orders. The different codes are developed and optimized in ARM assembly, one of the most popular programming languages in embedded systems such as smart cards and mobile phones. These implementations are also available online for public use.
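To make concrete why the number of multiplications drives the cost of higher-order masking, here is a minimal functional sketch of Boolean masking with an ISW-style (Ishai, Sahai, Wagner) AND gadget on single bits. It is a standard construction swapped in for illustration, not the thesis's decomposition method or its ARM code, and a Python model like this offers no side-channel protection by itself. The helper names `share`, `unshare` and `isw_and` are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def share(bit, n):
    """Split a bit into n random shares whose XOR equals the bit."""
    shares = [secrets.randbits(1) for _ in range(n - 1)]
    shares.append(reduce(xor, shares, bit))
    return shares


def unshare(shares):
    return reduce(xor, shares, 0)


def isw_and(a, b):
    """ISW-style masked AND of two n-share bits.

    Linear operations (XOR) act share by share for free; this gadget needs
    n*(n-1)/2 fresh random bits and n*n single-bit ANDs.
    """
    n = len(a)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(1)  # fresh randomness per share pair
            r[j][i] = (r[i][j] ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return [reduce(xor, (r[i][j] for j in range(n) if j != i), a[i] & b[i])
            for i in range(n)]


if __name__ == "__main__":
    x, y = share(1, 4), share(1, 4)
    print(unshare(isw_and(x, y)))  # unmasked result of 1 AND 1
```

The quadratic cost in the number of shares applies to every nonlinear multiplication, which is why a representation of the s-box with few multiplications pays off at high masking orders.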