22 research outputs found

    Pushing the speed limit of constant-time discrete Gaussian sampling. A case study on Falcon.

    Get PDF
    Sampling from discrete Gaussian distribution has applications in lattice-based post-quantum cryptography. Several efficient solutions have been proposed in recent years. However, making a Gaussian sampler secure against timing attacks turned out to be a challenging research problem. In this work, we observed an important property of the input random bit strings that generate samples in Knuth-Yao sampling. We delineate a generic step-by-step method to instantiate a discrete Gaussian sampler of arbitrary standard deviation and precision by efficiently minimizing the Boolean expressions by exploiting this prop- erty. Discrete Gaussian samplers generated in this method can be up to 37% faster than the state of the art method. Finally, we show that the signing algorithm of post-quantum signature scheme Falcon using our constant-time sampler is at most 33% slower than the fastest non-constant time sampler

    Constant-time discrete Gaussian sampling

    Get PDF
    © 2018 IEEE. Sampling from a discrete Gaussian distribution is an indispensable part of lattice-based cryptography. Several recent works have shown that the timing leakage from a non-constant-time implementation of the discrete Gaussian sampling algorithm could be exploited to recover the secret. In this paper, we propose a constant-time implementation of the Knuth-Yao random walk algorithm for performing constant-time discrete Gaussian sampling. Since the random walk is dictated by a set of input random bits, we can express the generated sample as a function of the input random bits. Hence, our constant-time implementation expresses the unique mapping of the input random-bits to the output sample-bits as a Boolean expression of the random-bits. We use bit-slicing to generate multiple samples in batches and thus increase the throughput of our constant-time sampling manifold. Our experiments on an Intel i7-Broadwell processor show that our method can be as much as 2.4 times faster than the constant-time implementation of cumulative distribution table based sampling and consumes exponentially less memory than the Knuth-Yao algorithm with shuffling for a similar level of security

    Lightweight Design Choices for LED-like Block Ciphers

    Get PDF
    Serial matrices are a preferred choice for building diffusion layers of lightweight block ciphers as one just needs to implement the last row of such a matrix. In this work we analyze a new class of serial matrices which are the lightest possible 4×44 \times 4 serial matrix that can be used to build diffusion layers. With this new matrix we show that block ciphers like LED can be implemented with a reduced area in hardware designs, though it has to be cycled for more iterations. Further, we suggest the usage of an alternative S-box to the standard S-box used in LED with similar cryptographic robustness, albeit having lesser area footprint. Finally, we combine these ideas in an end-end FPGA based prototype of LED. We show that with these optimizations, there is a reduction of 1616% in area footprint of one round implementation of LED

    Finding Three-Subset Division Property for Ciphers with Complex Linear Layers (Full Version)

    Get PDF
    Conventional bit-based division property (CBDP) and bit- based division property using three subsets (BDPT) introduced by Todo et al. at FSE 2016 are the most effective techniques for finding integral characteristics of symmetric ciphers. At ASIACRYPT 2019, Wang et al. proposed the idea of modeling the propagation of BDPT, and recently Liu et al. described a model set method that characterized the BDPT propagation. However, the linear layers of the block ciphers which are analyzed using the above two methods of BDPT propagation are restricted to simple bit permutation. Thus the feasibility of the MILP method of BDPT propagation to analyze ciphers with complex linear layers is not settled. In this paper, we focus on constructing an automatic search algorithm that can accurately characterize BDPT propagation for ciphers with complex linear layers. We first introduce BDPT propagation rule for the binary diffusion layer and model that propagation in MILP efficiently. The solutions to these inequalities are exact BDPT trails of the binary diffusion layer. Next, we propose a new algorithm that models Key-Xor operation in BDPT based on MILP technique. Based on these ideas, we construct an automatic search algorithm that accurately characterizes the BDPT propagation and we prove the correctness of our search algorithm. We demonstrate our model for the block ciphers with non-binary diffusion layers by decomposing the non-binary linear layer trivially by the COPY and XOR operations. Therefore, we apply our method to search integral distinguishers based on BDPT of SIMON, SIMON(102), PRINCE, MANTIS, PRIDE, and KLEIN block ciphers. For PRINCE and MANTIS, we find (2 + 2) and (3 + 3) round integral distinguishers respectively which are longest to date. We also improve the previous best integral distinguishers of PRIDE and KLEIN. For SIMON, SIMON(102), the integral distinguishers found by our method are consistent with the existing longest distinguishers

    The QARMA Block Cipher Family. Almost MDS Matrices Over Rings With Zero Divisors, Nearly Symmetric Even-Mansour Constructions With Non-Involutory Central Rounds, and Search Heuristics for Low-Latency S-Boxes

    Get PDF
    This paper introduces QARMA, a new family of lightweight tweakable block ciphers targeted at applications such as memory encryption, the generation of very short tags for hardware-assisted prevention of software exploitation, and the construction of keyed hash functions. QARMA is inspired by reflection ciphers such as PRINCE, to which it adds a tweaking input, and MANTIS. However, QARMA differs from previous reflector constructions in that it is a three-round Even-Mansour scheme instead of a FX-construction, and its middle permutation is non-involutory and keyed. We introduce and analyse a family of Almost MDS matrices defined over a ring with zero divisors that allows us to encode rotations in its operation while maintaining the minimal latency associated to {0, 1}-matrices. The purpose of all these design choices is to harden the cipher against various classes of attacks. We also describe new S-Box search heuristics aimed at minimising the critical path. QARMA exists in 64- and 128-bit block sizes, where block and tweak size are equal, and keys are twice as long as the blocks. We argue that QARMA provides sufficient security margins within the constraints determined by the mentioned applications, while still achieving best-in-class latency. Implementation results on a state-of-the art manufacturing process are reported. Finally, we propose a technique to extend the length of the tweak by using, for instance, a universal hash function, which can also be used to strengthen the security of QARMA

    Truncated Differential Attacks: New Insights and 10-round Attacks on QARMA

    Get PDF
    Truncated differential attacks were introduced by Knudsen in 1994 [1]. They are a well-known family that has arguably received less attention than some other variants of differential attacks. This paper gives some new insight on truncated differential attacks and provides the best-known attacks on both variants of the lightweight cipher QARMA, in the single tweak model, reaching for the first time 10 rounds while contradicting the security claims of this reduced version. These attacks use some new truncated distinguishers as well as some evolved key-recovery techniques

    A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol

    Get PDF
    Since its introduction by Jao and De Feo in 2011, the supersingular isogeny Diffie-Hellman (SIDH) key exchange protocol has positioned itself as a promising candidate for post-quantum cryptography. One salient feature of the SIDH protocol is that it requires exceptionally short key sizes. However, the latency associated to SIDH is higher than the ones reported for other post-quantum cryptosystem proposals. Aiming to accelerate the SIDH runtime performance, we present in this work several algorithmic optimizations targeting both elliptic-curve and field arithmetic operations. We introduce in the context of the SIDH protocol a more efficient approach for calculating the elliptic curve operation P + [k]Q. Our strategy achieves a factor 1.4 speedup compared with the popular variable-three-point ladder algorithm regularly used in the SIDH shared secret phase. Moreover, profiting from pre-computation techniques our algorithm yields a factor 1.7 acceleration for the computation of this operation in the SIDH key generation phase. We also present an optimized evaluation of the point tripling formula, and discuss several algorithmic and implementation techniques that lead to faster field arithmetic computations. A software implementation of the above improvements on an Intel Skylake Core i7-6700 processor gives a factor 1.33 speedup against the state-of-the-art software implementation of the SIDH protocol reported by Costello-Longa-Naehrig in CRYPTO 2016

    Compact-LWE: Enabling Practically Lightweight Public Key Encryption for Leveled IoT Device Authentication

    Get PDF
    Leveled authentication allows resource-constrained IoT devices to be authenticated at different strength levels according to the particular types of communication. To achieve efficient leveled authentication, we propose a lightweight public key encryption scheme that can produce very short ciphertexts without sacrificing its security. The security of our scheme is based on the Learning With Secretly Scaled Errors in Dense Lattice (referred to as Compact-LWE) problem. We prove the hardness of Compact-LWE by reducing Learning With Errors (LWE) to Compact-LWE. However, unlike LWE, even if the closest vector problem (CVP) in lattices can be solved, Compact-LWE is still hard, due to the high density of lattices constructed from Compact-LWE samples and the relatively longer error vectors. By using a lattice-based attack tool, we verify that the attacks, which are successful on LWE instantly, cannot succeed on Compact-LWE, even for a small dimension parameter like n=13n=13, hence allowing small dimensions for short ciphertexts. On the Contiki operating system for IoT, we have implemented our scheme, with which a leveled Needham-Schroeder-Lowe public key authentication protocol is implemented. On a small IoT device with 8MHZ MSP430 16-bit processor and 10KB RAM, our experiment shows that our scheme can complete 50 encryptions and 500 decryptions per second at a security level above 128 bits, with a public key of 2368 bits, generating 176-bit ciphertexts for 16-bit messages. With two small IoT devices communicating over IEEE 802.15.4 and 6LoWPAN, the total time of completing an authentication varies from 640ms (the 1st authentication level) to 8373ms (the 16th authentication level), in which the execution of our encryption scheme takes only a very small faction from 46ms to 445ms
    corecore