11 research outputs found

    Leakage Assessment Methodology - a clear roadmap for side-channel evaluations

    Get PDF
    Evoked by the increasing need to integrate side-channel countermeasures into security-enabled commercial devices, evaluation labs are seeking a standard approach that enables a fast, reliable and robust evaluation of the side-channel vulnerability of the given products. To this end, standardization bodies such as NIST intend to establish a leakage assessment methodology fulfilling these demands. One of such proposals is the Welch\u27s t-test, which is being put forward by Cryptography Research Inc., and is able to relax the dependency between the evaluations and the device\u27s underlying architecture. In this work, we deeply study the theoretical background of the test\u27s different flavors, and present a roadmap which can be followed by the evaluation labs to efficiently and correctly conduct the tests. More precisely, we express a stable, robust and efficient way to perform the tests at higher orders. Further, we extend the test to multivariate settings, and provide details on how to efficiently and rapidly carry out such a multivariate higher-order test. Including a suggested methodology to collect the traces for these tests, we point out practical case studies where different types of t-tests can exhibit the leakage of supposedly secure designs

    Hardware architecture implemented on FPGA for protecting cryptographic keys against side-channel attacks

    Get PDF
    This paper presents a new hardware architecture designed for protecting the key of cryptographic algorithms against attacks by side-channel analysis (SCA). Unlike previous approaches already published, the fortress of the proposed architecture is based on revealing a false key. Such a false key is obtained when the leakage information, related to either the power consumption or the electromagnetic radiation (EM) emitted by the hardware device, is analysed by means of a classical statistical method. In fact, the trace of power consumption (or the EM) does not reveal any significant sign of protection in its behaviour or shape. Experimental results were obtained by using a Virtex 5 FPGA, on which a 128-bit version of the standard AES encryption algorithm was implemented. The architecture could easily be extrapolated to an ASIC device based on standard cell libraries. The system is capable of concealing the real key when various attacks are performed on the AES algorithm, using two statistical methods which are based on correlation, the Welch’s t-test and the difference of means.Peer ReviewedPostprint (author's final draft

    Higher-Order Threshold Implementation of the AES S-Box

    Get PDF
    In this paper we present a threshold implementation of the Advanced Encryption Standard’s S-box which is secure against first- and second-order power analysis attacks. This security guarantee holds even in the presence of glitches, and includes resistance against bivariate attacks. The design requires an area of 7849 Gate Equivalents and 126 bits of randomness per S-box execution. The implementation is tested on an FPGA platform and its security claim is supported by practical leakage detection tests

    WiP: Applicability of ISO Standard Side-Channel Leakage Tests to NIST Post-Quantum Cryptography

    Get PDF
    FIPS 140-3 is the main standard defining security requirements for cryptographic modules in U.S. and Canada; commercially viable hardware modules generally need to be compliant with it. The scope of FIPS 140-3 will also expand to the new NIST Post-Quantum Cryptography (PQC) standards when migration from older RSA and Elliptic Curve cryptography begins. FIPS 140-3 mandates the testing of the effectiveness of ``non-invasive attack mitigations\u27\u27, or side-channel attack countermeasures. At higher security levels 3 and 4, the FIPS 140-3 side-channel testing methods and metrics are expected to be those of ISO 17825, which is based on the older Test Vector Leakage Assessment (TVLA) methodology. We discuss how to apply ISO 17825 to hardware modules that implement lattice-based PQC standards for public-key cryptography -- Key Encapsulation Mechanisms (KEMs) and Digital Signatures. We find that simple ``random key\u27\u27 vs. ``fixed key\u27\u27 tests are unsatisfactory due to the close linkage between public and private components of PQC keypairs. While the general statistical testing approach and requirements can remain consistent with older public-key algorithms, a non-trivial challenge in creating ISO 17825 testing procedures for PQC is the careful design of test vector inputs so that only relevant Critical Security Parameter (CSP) leakage is captured in power, electromagnetic, and timing measurements

    How Does Strict Parallelism Affect Security? A Case Study on the Side-Channel Attacks against GPU-based Bitsliced AES Implementation

    Get PDF
    Parallel cryptographic implementations are generally considered to be more advantageous than their non-parallel counterparts in mitigating side-channel attacks because of their higher noise-level. So far as we know, the side-channel security of GPU-based cryptographic implementations have been studied in recent years, and those implementations then turn out to be susceptible to some side-channel attacks. Unfortunately, the target parallel implementations in their work do not achieve strict parallelism because of the occurrence of cached memory accesses or the use of conditional branches, so how strict parallelism affects the side-channel security of cryptographic implementations is still an open problem. In this work, we make a case study of the side-channel security of a GPU-based bitsliced AES implementation in terms of bit-level parallelism and thread-level parallelism in order to show the way that works to reduce the side-channel security of strict parallel implementations. We present GPU-based bitsliced AES implementation as the study case because (1) it achieves strict parallelism so as to be resistant to cache-based attacks and timing attacks; and (2) it achieves both bit-level parallelism and thread-level parallelism (a.k.a. task-level parallelism), which enables us to research from multiple perspectives. More specifically, we first set up our testbed and collect electro-magnetic (EM) traces with some special techniques. Then, the measured traces are analyzed in two granularity. In bit-level parallelism, we give a non-profiled leakage detection test before mounting attacks with our proposed bit-level fusion techniques like multi-bits feature-level fusion attacks (MBFFA) and multi-bits decision-level fusion attacks (MBDFA). In thread-level parallelism, a profiled leakage detection test is employed to extract some special information from multi-threads leakages, and with the help of those information our proposed multi-threads hybrid fusion attack (MTHFA) method takes effect. Last, we propose a simple metric to quantify the side-channel security of parallel cryptographic implementations. Our research shows that the secret key of our target implementation can be recovered with less cost than expected, which suggests that the side-channel security of parallel cryptographic implementations should be reevaluated before application

    Vectorizing Higher-Order Masking

    Get PDF
    International audienceThe cost of higher-order masking as a countermeasure against side-channel attacks is often considered too high for practical scenarios, as protected implementations become very slow. At Eurocrypt 2017, the bounded moment leakage model was proposed to study the (theoretical) security of parallel implementations of masking schemes [5]. Work at CHES 2017 then brought this to practice by considering an implementation of AES with 32 shares [26], bitsliced inside 32-bit registers of ARM Cortex-M processors. In this paper we show how the NEON vector instructions of larger ARM Cortex-A processors can be exploited to build much faster masked implementations of AES. Specifically, we present AES with 4 and 8 shares, which in theory provide security against 3rd and 7th-order attacks, respectively. The software is publicly available and optimized for the ARM Cortex-A8. We use refreshing and multiplication algorithms that are proven to be secure in the bounded moment leakage model and to be strongly non-interfering. Additionally, we perform a concrete side-channel evaluation on a BeagleBone Black, using a combination of test vector leakage assessment (TVLA), leakage certification tools and information-theoretic bounds

    A Thorough Evaluation of RAMBAM

    Get PDF
    The application of masking, widely regarded as the most robust and reliable countermeasure against Side-Channel Analysis (SCA) attacks, has been the subject of extensive research across a range of cryptographic algorithms, especially AES. However, the implementation cost associated with applying such a countermeasure can be significant and even in some scenarios infeasible due to considerations such as area and latency overheads, as well as the need for fresh randomness to ensure the security properties of the resulting design. Most of these overheads originate from the ability to maintain security in the presence of physical defaults such as glitches and transitions. Among several schemes with a trade-off between such overheads, RAMBAM, presented at CHES 2022, offers an ultra-low latency in terms of the number of clock cycles. It is dedicated to the AES and utilizes redundant representations of the finite field elements to enhance protection against both passive and active physical attacks. In this paper, we have a deeper look at this technique and provide a comprehensive analysis. The original authors reported that the number of required traces to mount a successful attack increases exponentially with the size of the redundant representation. We however examine their scheme from theoretical point of view. More specifically, we investigate the relationship between RAMBAM and the well-established Boolean masking and, based on this, prove the insecurity of RAMBAM. Through the examples and use cases, we assess the leakage of the scheme in practice and use verification tools to demonstrate that RAMBAM does not necessarily offer adequate protection against SCA attacks neither in theory nor in practice. Confirmed by real-world experiments, we additionally highlight that -- if no dedicated facility is incorporated -- the RAMBAM designs are susceptible to fault-injection attacks despite providing some degree of protection against a sophisticated attack vector, i.e., SIFA

    Low-Latency Hardware Private Circuits

    Get PDF
    Over the last years, the rise of the IoT, and the connection of mobile - and hence physically accessible - devices, immensely enhanced the demand for fast and secure hardware implementations of cryptographic algorithms which offer thorough protection against SCA attacks. Among a variety of proposed countermeasures against SCA, masking has transpired to be a promising candidate, attracting significant attention in both, academia and industry. Here, abstract adversary models have been derived, aiming to accurately model real-world attack scenarios, while being sufficiently simple to enable formally proving the SCA resilience of masked implementations on an algorithmic level. In the context of hardware implementations, the robust probing model has become highly relevant for proving SCA resilience due to its capability to model physical defaults like glitches and data transitions. As constructing a correct and secure masked variant of large and complex circuits is a challenging task, a new line of research has recently emerged, aiming to design small, masked subcircuits - realizing for instance a simple AND gate - which still guarantee security when composed to a larger circuit. Although several designs realizing such composable subcircuits - commonly referred to as gadgets - have been proposed, negligible research was conducted in order to find trade-offs between different overhead metrics, like randomness requirement, latency, and area consumption. In this work, we present HPC3, a hardware gadget which is trivially composable under the notion of PINI in the glitch-extended robust probing model. HPC3 realizes a two-input AND gate in one clock cycle which is generalized for any arbitrary security order. Existing state-of-the-art PINI gadgets either require a latency of two clock cycles or are limited to first-order security. In short, HPC3 enables the designer to trade double the randomness for half the latency compared to existing gadgets, providing high flexibility and enabling the designer to gain significantly more speed in real-time applications

    Custom Instruction Support for Modular Defense against Side-channel and Fault Attacks

    Get PDF
    International audienceThe design of software countermeasures against active and passive adversaries is a challenging problem that has been addressed by many authors in recent years. The proposed solutions adopt a theoretical foundation (such as a leakage model) but often do not offer concrete reference implementations to validate the foundation. Contributing to the experimental dimension of this body of work, we propose a customized processor called SKIVA that supports experiments with the design of countermeasures against a broad range of implementation attacks. Based on bitslice programming and recent advances in the literature, SKIVA offers a flexible and modular combination of countermeasures against power-based and timing-based side-channel leakage and fault injection. Multiple configurations of side-channel protection and fault protection enable the programmer to select the desired number of shares and the desired redundancy level for each slice. Recurring and security-sensitive operations are supported in hardware through custom instruction-set extensions. The new instructions support bitslicing, secret-share generation, redundant logic computation, and fault detection. We demonstrate and analyze multiple versions of AES from a side-channel analysis and a fault-injection perspective, in addition to providing a detailed performance evaluation of the protected designs. To our knowledge, this is the first validated end-to-end implementation of a modular bitslice-oriented countermeasure
    corecore